Credit scoring in retail banking




in Data Analytics

City University





Abstract- Credit scoring techniques are used to determine the
creditworthiness of a borrower, that is, to decide on the basis of a credit
score whether or not to grant a loan. The higher the score, the safer it is
for a bank to lend to the borrower. The aim of this paper is to develop a
retail credit scoring model using techniques such as logistic regression,
clustering, and the propensity scoring method, and to investigate how the
incorporation of more variables improves the accuracy of the model, which
five factors most influence risk, how important the credit and demographic
variables are, and which cut-off probability to choose, since the cut-off
affects the number of correct and incorrect classifications and thus the
confusion matrix.

Keywords- Credit Scoring, Logistic Regression, Retail banking, Credit Risk


1. Introduction

The aim of commercial banks is to give credit to borrowers. Credit risk
arises when a borrower defaults on repayment of a loan, which can happen for
various reasons such as insolvency of the borrower or willful default (when
the borrower intentionally does not pay). History shows that ineffective
credit risk management can lead a bank or financial institution to
bankruptcy. It is therefore imperative for banks and financial institutions
to observe and accumulate information about potential borrowers, and to
review the performance of accepted borrowers over time, in order to maintain
solvency; the quality of loans is very important for both survival and
profitability. This is where credit scoring comes in: it reduces the cost and
time of decision making and thus improves the profitability of banks. The
need for a formal credit scoring process first arose in the 1960s, when the
credit card business boomed and automated decision making became vital for
business growth (Trainer, 2015). Credit scoring is a method of reducing the
probability of default among customers, which can maximize the profitability
of a bank or financial institution by minimizing the ensuing risk to it.
Techniques such as regression analysis, logistic regression, support vector
machines, decision trees, and neural networks are widely used in building
credit scoring models (Pointon, 2011). Anderson (2007) broke credit scoring
into two parts. Credit means buy now, pay later, and scoring refers to the
use of a numerical tool to rank-order cases according to their real or
perceived quality, in order to discriminate between them and to ensure
objective and consistent decisions. Credit scoring is thus simply the use of
statistical models to transform data into numerical form for better decision
making. A credit score indicates how risky a borrower is: the higher the
score, the safer it is for a bank to lend to the borrower. The aim of this
paper is to build an appropriate retail credit scoring model to predict the
creditworthiness of borrowers and to identify the most important variables in
the decision-making process.



2. Literature Review


Credit scoring is mostly applied to non-retail loans, as the data tend to be
more readily available. The amounts lent to the non-retail sector also tend
to be higher than those lent to the retail sector (Kocenda and Vojtek, 2009).
In retail lending, various socio-demographic variables together with credit
bureau variables of customers are used to make predictions about the client
portfolio. On this basis, credit scoring is developed to estimate the
probability of default on retail loans. Blazy and Weill (2006) stated that
riskier loans should be collateralized or else not be financed.

According to the Basel II capital accord (Basel Committee on Banking
Supervision, 2015), any loan that is not repaid within 90 days is considered
a non-performing loan. The initial decision to sanction a loan is normally
based on a judgmental approach that simply analyzes the details on the
borrower's application form (Pope). This approach rests on the so-called
5 C's principle: Character, Capital, Capacity, Collateral, and Condition.

With the help of a statistical model, credit scoring converts data based on
these traditional 5 C's of credit into numerical form in order to make the
credit decision, that is, to determine whether future customers will default
or not. Because a credit scoring model tends to reduce the time and cost
spent by the loan officer on loan assessment, and to decrease the default
ratio, it is considered far better than the traditional approach to loan
assessment (Caire and Kossmann, 2003). Hand and Henley (1997) applied and
reviewed various statistical techniques for building credit scoring models,
such as logistic regression, neural networks, and recursive partitioning.
They concluded that, apart from classifying customers as good or bad based on
their initial application characteristics, credit scoring poses various other
statistical challenges, such as loan review functions (knowing when to
approach customers for repayment of their loans), fraud, and questions such
as when and how to act on a delinquent loan.

Hand (2005) examined predictive models in which scorecards are used to
assign customers to classes, so that the course of action taken depends on
whether a customer's predicted score is above or below a given threshold.
Common statistical measures such as the Gini coefficient and the
Kolmogorov-Smirnov statistic may not use relevant information about the
magnitude of scores, which can lead to misclassification and thus to a
degradation in decision quality. Anderson (2007) proposed that the term
credit scoring be divided into two parts, credit and scoring. Credit is
taking money now and paying it back later. Scoring is the assignment of
numbers to determine the customer's creditworthiness, that is, whether the
customer is worthy of a loan or not. The higher the credit score, the better
suited the customer is to receive a loan.
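
The two measures named above, the Gini coefficient and the Kolmogorov-Smirnov
statistic, can be illustrated with a minimal sketch. The scores below are
invented for illustration and are not taken from any dataset used in this
paper; the function names are likewise hypothetical. The Gini coefficient is
computed here as 2 * AUC - 1, with the AUC obtained by comparing every
(good, bad) pair of customers.

```python
def ks_statistic(scores_good, scores_bad):
    """Largest gap between the empirical score distributions of the two classes."""
    thresholds = sorted(set(scores_good) | set(scores_bad))
    best = 0.0
    for t in thresholds:
        cdf_good = sum(s <= t for s in scores_good) / len(scores_good)
        cdf_bad = sum(s <= t for s in scores_bad) / len(scores_bad)
        best = max(best, abs(cdf_bad - cdf_good))
    return best


def gini_coefficient(scores_good, scores_bad):
    """Gini = 2 * AUC - 1, with the AUC counted over all (good, bad) pairs."""
    wins = 0.0
    for g in scores_good:
        for b in scores_bad:
            if g > b:
                wins += 1.0   # good customer ranked above the bad one
            elif g == b:
                wins += 0.5   # a tie counts as half a win
    auc = wins / (len(scores_good) * len(scores_bad))
    return 2.0 * auc - 1.0


# Hypothetical scores (higher = more creditworthy), made up for illustration.
good = [720, 680, 650, 700, 610]
bad = [580, 640, 600, 660]

ks = ks_statistic(good, bad)
gini = gini_coefficient(good, bad)
print(f"KS = {ks:.2f}, Gini = {gini:.2f}")
```

Both functions depend only on the ordering of the scores: any strictly
increasing rescaling of the scores leaves the results unchanged. This is one
way to see the point about magnitude information not being used.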

Dinh and Kleimeier (2007) proposed a credit scoring model for Vietnamese
retail loans and concluded that credit risk modeling helps banks reduce the
time and cost spent on loan assessment and thus increases the profitability
of the business. Hasan (2016) built a retail credit scoring model on scarce
data to find the probability of default on retail loans and concluded that a
model can be constructed even with scarce data, helping decision makers
expedite the credit appraisal process. Kocenda and Vojtek (2009) concluded
that taking socio-demographic factors into account is imperative in the
process of granting credit, and that such factors should therefore not be
excluded from the specification of a credit scoring model.

Hand and Zhou (2009) studied two behavioral classifications (settle
immediately versus not settle immediately, and make some repayment versus
make no repayment) and used a rule to predict the class to which each
customer belongs. The aim was to construct a rule that allows objects to be
assigned to one of two classes (here 0 and 1). The rule is constructed from
past data for a sample of objects. Two fundamental aspects of classification
rules were considered when performance was evaluated. The first was the score
distribution of the two classes, 0 and 1. The second was the choice of
classification threshold (t), such that objects with scores greater than t
are predicted to belong to class 1, and to class 0 otherwise.
Misclassification arises when an object with a score above t belongs to
class 0 or an object with a score below t belongs to class 1. Performance
evaluation is vital for choosing a rule appropriately and thus for obtaining
accurate predictions of customers' future behavior.

Bekhet and Eletter (2014) proposed two credit scoring models (logistic
regression and radial basis function) using data mining techniques to support
loan decisions for Jordanian banks. They argued that loan application
assessment would improve the effectiveness of credit decisions, help control
loan office tasks, and save time and cost of analysis. They concluded that
the logistic regression model was slightly better than the radial basis
function model in terms of overall accuracy rate, but that the radial basis
function model was better at identifying customers who might default.
Karwański and Grzybowska (2015) observed that models are meant not only for
finding the probability of default but also for identifying risk drivers, and
used the propensity scoring method to detect risk drivers with logistic
regression, random forest, and gradient boosting. Abdou et al. (2016)
compared the performance of different techniques (logistic regression,
classification and regression trees, and the cascade correlation neural
network, CCNN), using ROC curves and Gini coefficients as evaluation criteria
and the Kolmogorov-Smirnov test to check robustness. They found that CCNN
was superior to the other techniques. Variables such as previous occupation,
the functioning of the borrower's account, guarantees, other loans, and
monthly expenses were also identified as key variables for the forecasting
and decision-making processes of a credit policy.
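
The threshold rule described in the discussion of Hand and Zhou (2009)
above, and the way the choice of t shifts the kinds of misclassification, can
be sketched as follows. The scores and true classes are invented for
illustration, and the helper names are hypothetical; this is not the
procedure used in the cited paper.

```python
def classify(scores, t):
    """Assign class 1 when the score exceeds the threshold t, else class 0."""
    return [1 if s > t else 0 for s in scores]


def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN), treating class 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


# Hypothetical scores and true classes, made up for illustration.
scores = [0.10, 0.35, 0.40, 0.55, 0.65, 0.80, 0.90]
actual = [0, 0, 1, 0, 1, 1, 1]

for t in (0.3, 0.5, 0.7):
    tp, fp, tn, fn = confusion_counts(actual, classify(scores, t))
    print(f"t={t}: TP={tp} FP={fp} TN={tn} FN={fn}")
```

On this toy data, raising t trades false positives (class 0 objects scored
above t) for false negatives (class 1 objects scored at or below t), which is
why the threshold has to be chosen with the cost of each error type in mind.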


3. Research


My research is on the consumer credit risk of credit cards. Based on the
research papers on credit risk management reviewed above, techniques such as
logistic regression can be used to build a scoring model, that is, a
predictive model that determines credit card risk on the dataset. Different
types of model validation can be performed to check the accuracy of the
model, such as the gain/lift curve, the concordance/discordance ratio, and
the ROC curve. A gain chart is used to determine how much better one can do
with a predictive model than without it. It works somewhat like propensity
score matching, in that the observations are divided into 10 equal groups and
the cumulative number of actual events is taken for each group; for an
accurate model, the curve obtained with the predictive model should lie above
the one obtained without it. The confusion matrix contains four quantities:
true positives, false positives, true negatives, and false negatives. The ROC
curve, further, plots the true positive rate against the false positive rate.
The greater the number of true positives and true negatives, the better the
model. Once the model is satisfactory, it will help in combating the risk
associated with new data and thus improve the quality of loans.
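
As a minimal sketch of the workflow described above, the code below fits a
one-variable logistic regression by gradient descent and then builds a gains
table over 10 equal groups. The single predictor, its values, and the default
flags are all hypothetical and invented for illustration; the learning rate
and number of epochs are arbitrary choices, and the paper's dataset and
variables are not reproduced here.

```python
import math


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y          # gradient of the log-loss w.r.t. b0
            g1 += (p - y) * x    # gradient of the log-loss w.r.t. b1
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def gains_table(probs, ys, groups=10):
    """Rows of (group, cumulative share of events with model, share without)."""
    # Rank observations from highest to lowest predicted probability.
    ranked = sorted(zip(probs, ys), key=lambda pair: pair[0], reverse=True)
    total_events = sum(ys)
    n = len(ranked)
    rows, captured = [], 0
    for g in range(1, groups + 1):
        start, end = (g - 1) * n // groups, g * n // groups
        captured += sum(y for _, y in ranked[start:end])
        rows.append((g, captured / total_events, g / groups))
    return rows


# Hypothetical predictor (e.g. a utilisation ratio) and default flags (1 = event).
xs = [i / 20 for i in range(1, 21)]
ys = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
table = gains_table(probs, ys)
for group, with_model, without_model in table:
    print(f"group {group:2d}: with model {with_model:.3f}, without {without_model:.3f}")
```

In each row, the second column is the cumulative share of actual events
captured when observations are ranked by the model, and the third is the
share expected from picking observations at random. The gap between the two
is what the gain chart displays.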

Clustering can also be applied: the dataset is segmented into homogeneous
clusters, and credit scoring is then carried out on each homogeneous segment.
Although this can lead to additional cost for development, implementation,
maintenance, etc., it also improves performance.
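
The segmentation step can be sketched with a plain k-means routine. This is
an assumption made for illustration, since the text does not name a specific
clustering algorithm. The two features, their values, and the default flags
are invented, and the per-segment default rate below merely stands in for the
separate scoring model that would be built on each segment.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (a simplification)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def segment_default_rates(labels, defaults, k):
    """Share of defaulting borrowers in each segment (None for an empty one)."""
    rates = {}
    for c in range(k):
        flags = [d for lab, d in zip(labels, defaults) if lab == c]
        rates[c] = sum(flags) / len(flags) if flags else None
    return rates


# Hypothetical borrowers: (scaled income, scaled age) and default flags.
points = [(1.0, 1.2), (5.0, 5.5), (1.2, 0.8), (0.8, 1.0), (5.2, 4.8), (4.9, 5.1)]
defaults = [1, 0, 1, 0, 0, 0]

labels, centroids = kmeans(points, k=2)
rates = segment_default_rates(labels, defaults, k=2)
print(labels, rates)
```

Starting the centroids at the first k points keeps the sketch deterministic;
a real application would normally use a more robust initialization and scale
the features before clustering.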



4. Conclusions

In this paper, a retail credit scoring model will be built on a dataset of
150 data points with 11 variables, divided into two parts: borrower credit
bureau variables and borrower demographic variables. Based on this predictive
model, we will be able to assess the creditworthiness of a borrower, that is,
whether the borrower should be given a loan or not. Various techniques such
as logistic regression, clustering, and the propensity scoring method will be
used. Further, different types of model validation to check the accuracy of the
model like gain/lift curve, concordance/discordance ratio, ROC curve can be