Credit scoring in retail banking
Predicting creditworthiness of borrowers

Prashant Dimri
Masters in Data Analytics
Dublin City University
Dublin
[email protected]

Abstract- Credit scoring techniques are used to determine the creditworthiness of a borrower, that is, to decide whether or not to grant a loan on the basis of a credit score. The higher the score, the safer it is for a bank to lend to the borrower. The aim of this paper is to develop a retail credit scoring model using techniques such as logistic regression, clustering, and the propensity scoring method, and to investigate how the incorporation of more variables improves the accuracy of the model, which five factors most influence risk, how important the credit and demographic variables are, and which cut-off probability to choose, since the cut-off affects the number of correctly and incorrectly classified events and therefore the confusion matrix.

Keywords- Credit Scoring, Logistic Regression, Retail banking, Credit Risk

1. Introduction

The aim of commercial banks is to give credit to borrowers. Credit risk arises when a borrower defaults on the repayment of a loan, which can happen for various reasons such as the insolvency of the borrower or willful default (when the borrower intentionally does not pay). History shows that ineffective credit risk management can lead a bank or financial institution to bankruptcy. It is therefore imperative for banks and financial institutions to observe and accumulate information about potential borrowers, and to review the performance of accepted borrowers over time, in order to maintain solvency; the quality of loans is thus very important for both survival and profitability. This is where credit scoring comes in: it reduces the cost and time of decision making and so improves the profitability of banks. The need for a formal credit scoring process first arose in the 1960s, when the credit card business boomed and automated decision making became vital for business growth (Trainer, 2015).

Credit scoring is a method of assessing and mitigating the risk of default among customers, which can maximize the profitability of a bank or financial institution by minimizing the ensuing risk to it. Techniques such as regression analysis, logistic regression, support vector machines, decision trees, and neural networks are widely used in building credit scoring models (Pointon, 2011). Anderson (2007) broke credit scoring into two parts. Credit means buy now, pay later, and scoring refers to the use of a numerical tool to rank-order cases according to their real or perceived quality, in order to discriminate between them and to ensure objective and consistent decisions. Credit scoring is therefore simply the use of statistical models to transform data into numerical form for better decision making. A credit score indicates how risky a borrower is: the higher the score, the safer it is for a bank to lend to the borrower. The aim of this paper is to build an appropriate retail credit scoring model to predict the creditworthiness of borrowers, and so to identify the most important variables in the decision-making process.

2. Literature Review

Most credit scoring is done on non-retail loans, as the data tend to be more readily available. Also, the amounts lent to the non-retail sector tend to be higher than those lent to the retail sector (Kocenda and Vojtek, 2009).

In retail lending, various socio-demographic variables, along with the credit bureau variables of customers, are used to make predictions about the client's portfolio. In this way, credit scoring is developed to estimate the probability of default on retail loans. Blazy and Weill (2006) stated that riskier loans should be collateralized or else not financed. According to the Basel II capital accord (Basel Committee on Banking Supervision, 2015), any loan that remains unpaid for more than 90 days past its due date is considered a non-performing loan. The initial decision to sanction a loan is normally based on a judgmental approach that simply analyzes the details on the borrower's application form (Pope). It rests on the so-called 5 C's principle: Character, Capital, Capacity, Collateral, and Condition. With the help of a statistical model, credit scoring converts data based on these traditional 5 C's of credit into numerical form in order to make the credit decision, that is, to determine whether future customers will default or not. Because a credit scoring model tends to reduce the time and cost spent by the loan officer on loan assessment, and also to decrease the default ratio, it is considered far better than the traditional approach to loan assessment (Caire and Kossmann, 2003).

Hand and Henley (1997) reviewed and applied various statistical techniques, such as logistic regression, neural networks, and recursive partitioning, for building credit scoring models. They concluded that, apart from classifying customers as good or bad on the basis of their initial application characteristics, credit scoring also poses various other statistical challenges, such as loan review functions (knowing when to approach customers for repayment of their loans), fraud, and questions of when and how to act on a delinquent loan. Hand (2005) examined predictive models in which scorecards are used to assign customers to classes, so that the appropriate course of action is taken depending on whether a customer's predicted score lies above or below a given threshold. He noted that common statistical measures such as the Gini coefficient and the Kolmogorov-Smirnov statistic may not use relevant information about the magnitude of scores, which can lead to misclassification and hence to a degradation in decision quality. Anderson (2007) proposed that the term credit scoring be divided into two parts, credit and scoring. Credit is taking money now and paying it back later.

Scoring is the assignment of numbers to determine the customer's creditworthiness, that is, whether the customer is worthy of a loan or not. The higher the credit score, the better placed the customer is to receive a loan. Dinh and Kleimeier (2007) proposed a credit scoring model for Vietnamese retail loans and concluded that credit risk modeling helps banks to reduce the time and cost spent on loan assessment and thus increases the profitability of the business. Hasan (2016) built a retail credit scoring model on scarce data to find the probability of default on retail loans, and concluded that a model can be constructed even with scarce data, helping decision makers to expedite the credit appraisal process. Kocenda and Vojtek (2009) concluded that taking socio-demographic factors into account is imperative when granting credit, and that such factors should therefore not be excluded from credit scoring model specification. Hand and Zhou (2009) studied two behavioral classifications (settle immediately versus not settle immediately, and make some repayment versus make no repayment), with a rule used to predict the class to which each customer belongs. The aim was to construct a rule that allows objects to be assigned to one of two classes (here 0 and 1).

The rule is constructed from past data for a sample of objects. Two fundamental aspects of classification rules were considered when performance was evaluated. The first was the score distribution for the two classes, 0 and 1. The second was the choice of classification threshold t, such that objects with scores greater than t are predicted to belong to class 1, and to class 0 otherwise. Misclassification arises when an object with a score above t belongs to class 0, or an object with a score below t belongs to class 1. Evaluating performance is vital for choosing a rule appropriately and thus obtaining accurate predictions of the future behavior of customers. Bekhet and Eletter (2014) proposed two credit scoring models (logistic regression and radial basis function) utilizing data mining techniques to support loan decisions for Jordanian banks.

They argued that loan application assessment would enhance the effectiveness of credit decisions, control loan office tasks, and save time and cost of analysis, and concluded that the logistic regression model was slightly better than the radial basis function model in terms of overall accuracy rate, whereas the radial basis function model performed well at identifying customers who might default. Karwański and Grzybowska (2015) observed that models are meant not only for finding the probability of default but also for identifying risk drivers, and used the propensity scoring method to detect risk drivers with logistic regression, random forests, and gradient boosting. Abdou et al. (2016) compared the performance of logistic regression, classification and regression trees, and a cascade correlation neural network (CCNN), using ROC curves and Gini coefficients as evaluation criteria and the Kolmogorov-Smirnov test to check robustness. They found that the CCNN was superior to the other techniques. Variables such as previous occupation, the functioning of the borrower's account, guarantees, other loans, and monthly expenses were also identified as key variables for the forecasting and decision-making processes of a credit policy.
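The Gini coefficient and Kolmogorov-Smirnov statistic mentioned in this review can be illustrated with a minimal sketch. The scores and labels below are invented toy values (1 marks a defaulter, and a higher score is assumed to mean higher predicted risk); the function names are hypothetical and are not taken from any of the cited papers.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes."""
    goods = [s for s, y in zip(scores, labels) if y == 0]
    bads = [s for s, y in zip(scores, labels) if y == 1]
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_good = sum(s <= t for s in goods) / len(goods)
        cdf_bad = sum(s <= t for s in bads) / len(bads)
        gap = max(gap, abs(cdf_good - cdf_bad))
    return gap


def gini_coefficient(scores, labels):
    """Gini = 2 * AUC - 1, with AUC from the rank-sum formula (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_bad = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    auc = (rank_sum_bad - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
    return 2 * auc - 1


# Toy data (assumption): predicted risk scores and observed outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

ks = ks_statistic(scores, labels)
gini = gini_coefficient(scores, labels)

# A monotone transformation changes the magnitudes but not the ordering.
squared = [s ** 2 for s in scores]
ks_squared = ks_statistic(squared, labels)
gini_squared = gini_coefficient(squared, labels)
```

Both measures depend only on how the scores are ordered, so squaring the scores leaves them unchanged. This is one way to read the remark attributed to Hand (2005) that such measures may ignore the magnitude of scores.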

3. Research Plan

My research is on the consumer credit risk of credit cards. Based on a review of various papers on credit risk management, techniques such as logistic regression can be used to build a scoring model, that is, a predictive model that determines credit card risk on the dataset. Different types of model validation can then be carried out to check the accuracy of the model, such as the gain/lift curve, the concordance/discordance ratio, and the ROC curve. A gain chart is used to determine how much better one can do with a predictive model than without one. It is somewhat like propensity score matching in that the observations, ranked by predicted probability, are divided into 10 equal groups and the cumulative number of actual events is taken across the groups; for a model with good accuracy, the cumulative share of events captured by the model should lie above the share that would be captured without it. There is also the confusion matrix, which has four entries: true positives, false positives, true negatives, and false negatives.
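The gain chart described above can be sketched as follows. This is only an illustration under stated assumptions: the 20 scores and outcomes are invented toy values, the function name is hypothetical, and the baseline is taken to be random selection, where each of the 10 groups captures one tenth of the events.

```python
def cumulative_gains(scores, labels, n_groups=10):
    """Share of all events captured in the top k groups, for k = 1..n_groups."""
    # Rank observations from highest to lowest predicted probability.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_events = sum(ranked)
    n = len(ranked)
    gains = []
    for k in range(1, n_groups + 1):
        end = (k * n) // n_groups  # last index belonging to the top k groups
        gains.append(sum(ranked[:end]) / total_events)
    return gains


# Toy data (assumption): 20 borrowers, 5 of whom defaulted (label 1).
scores = [1.0 - 0.05 * i for i in range(20)]
labels = [1, 1, 1, 0, 1, 0, 0, 1] + [0] * 12

gains = cumulative_gains(scores, labels)
baseline = [k / 10 for k in range(1, 11)]  # cumulative share without a model
lift_first_group = gains[0] / baseline[0]
```

On this toy data the first group captures 40% of the events against 10% for the baseline, so the model curve lies above the baseline until both reach 100%.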

Further, there is the ROC curve, which plots the true positive rate against the false positive rate across cut-off values. The greater the number of true positives and true negatives, the better the model is. Once the model is satisfactory, it will help in combating the risk associated with new data and thus improve the quality of loans.

Clustering can also be applied: the dataset is segmented into homogeneous clusters, and credit scoring is then carried out on each homogeneous segment. Although this can lead to additional cost for development, implementation, maintenance, etc., it also improves performance.
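The link between the confusion matrix and the ROC curve discussed in this section can be sketched in a few lines. The probabilities and outcomes are invented toy values, the function names are hypothetical, and the sketch assumes that a borrower is predicted to default when the predicted probability is at or above the cut-off.

```python
def confusion_matrix(probs, labels, cutoff):
    """Return (tp, fp, tn, fn) when predicting default for probs >= cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_curve(probs, labels):
    """(false positive rate, true positive rate) at each distinct cut-off."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(probs), reverse=True):
        tp, fp, tn, fn = confusion_matrix(probs, labels, cutoff)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (assumption): predicted default probabilities and observed outcomes.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

at_half = confusion_matrix(probs, labels, cutoff=0.5)
at_lower = confusion_matrix(probs, labels, cutoff=0.35)
auc = area_under_curve(roc_curve(probs, labels))
```

Lowering the cut-off from 0.5 to 0.35 on this toy data catches one more defaulter at the cost of one more false alarm. This is the trade-off that the choice of cut-off probability controls.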

4. Conclusions

In this paper, a retail credit scoring model will be built on a dataset of 150 data points with 11 variables, divided into two parts: borrower credit bureau variables and borrower demographic variables. Based on this predictive model, we will be able to assess the creditworthiness of a borrower, that is, whether the borrower should be given a loan or not. Various techniques such as logistic regression, clustering, and the propensity scoring model will be used.
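As a rough indication of what the logistic regression step might look like, the sketch below fits a one-variable model by plain batch gradient descent. It is not the model proposed in this paper: the single predictor (a hypothetical credit utilisation ratio), the toy outcomes, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=1.0, steps=5000):
    """Fit P(default) = sigmoid(b + w * x) by batch gradient descent on the log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Residual = predicted probability minus observed outcome.
        residuals = [sigmoid(b + w * x) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
        b -= learning_rate * sum(residuals) / n
    return w, b


# Toy data (assumption): utilisation ratio and whether the borrower defaulted.
utilisation = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(utilisation, defaulted)
p_low = sigmoid(b + w * 0.1)   # predicted default probability at low utilisation
p_high = sigmoid(b + w * 0.9)  # predicted default probability at high utilisation
```

The fitted probabilities can then be compared with a cut-off probability to decide whether a loan should be granted. With several predictors the same update applies to each coefficient in turn.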

Further, different types of model validation, such as the gain/lift curve, the concordance/discordance ratio, and the ROC curve, can be carried out to check the accuracy of the model.
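The concordance/discordance ratio mentioned above can be sketched as follows, assuming the usual pairwise definition: every defaulter is paired with every non-defaulter, and a pair is concordant when the defaulter received the higher predicted probability. The data are invented toy values and the function name is hypothetical.

```python
def concordance(probs, labels):
    """Return the shares of concordant, discordant, and tied (event, non-event) pairs."""
    events = [p for p, y in zip(probs, labels) if y == 1]
    non_events = [p for p, y in zip(probs, labels) if y == 0]
    concordant = discordant = tied = 0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1
            elif e < n:
                discordant += 1
            else:
                tied += 1
    pairs = len(events) * len(non_events)
    return concordant / pairs, discordant / pairs, tied / pairs


# Toy data (assumption): predicted default probabilities and observed outcomes.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

share_concordant, share_discordant, share_tied = concordance(probs, labels)
```

Here 15 of the 16 pairs are concordant, so a high concordance share with a low discordance share would indicate that the model ranks defaulters above non-defaulters.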