Machine learning is changing lending by turning raw data into valuable insights and helping lenders make better credit risk decisions.
What is Machine Learning?
Taking a step back, machine learning (ML) is a heavily hyped term today. Put simply, it is a set of mathematical techniques used to automate learning from data. But what does that actually mean? It means feeding data into an algorithm to create models that are optimised to solve a specific problem.
That said, ML will probably not revolutionise consumer lending by itself, but implementing it can take your modelling and decision-making to the next level, helping you boost your business by lowering non-performing loans (NPLs) and increasing acceptance rates.
Are There Reasons for not Implementing ML?
Credit underwriting is all about taking calculated risks and having the right process in place to make accurate risk-based decisions. Instantor recently produced a report based on a survey of industry stakeholders from over 20 European countries.
We were happy to see that 70% of those participating in our survey are already seeing the value of ML and implementing these methods. This gives them a competitive advantage as they are seeing a big lift in the accuracy of predictions.
However, the quality of data matters. When the wrong data points are fed into an ML model, the model will not produce the intended results.
Why Isn’t Everyone Using ML?
Of the financial institutions not currently using ML, 44% have not yet understood its operational impact. This points to a gap in knowledge and understanding of the potential gains of using ML.
Also, the reality is that developing models reliable enough to make underwriting decisions requires investment and, as with any budget allocation, you might have to fight for it internally.
Where do People Fail?
In any ML project, 80% of the work relates to data issues: cleaning, testing data quality, and feature engineering. This is just to get to the point of having a well-structured and informative input, which is what reliable models need. What you put in is what you will get out.
The key here is to have well-structured and informative input data!
Going back to our report, this was identified as a significant reason why so many ML projects are unsuccessful. It’s not because the method is wrong but because the pre-processing and structure of the input are inadequate.
Interpreting the Data
Looking back, banks had a monopoly on the raw data. Now they are starting to build up functions to understand and interpret this data. The opportunities created by PSD2 make it possible for everyone to tap into new data sources, which in turn leads to more customer-centric products and offers.
As previously mentioned, the foundation for any successful ML project is high-quality input. This can come from a number of sources, for example transactional data.
Transactional data is essential for accurately predicting the probability of default, as your spending patterns and income say a lot about you as a person. For example, it has been shown that if you often make small purchases at the supermarket, you are more likely to default, since this can indicate that you have a hard time planning ahead.
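As a rough illustration of how the supermarket example above could become a model input, here is a minimal Python sketch. The `Transaction` fields, the `"supermarket"` category label and the threshold of 10 are all assumptions made for the example, not part of any real data format.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    date: str       # ISO date, e.g. "2024-03-01"
    amount: float   # negative = money out, positive = money in
    category: str   # hypothetical label such as "supermarket" or "salary"


def small_purchase_share(transactions, category="supermarket", threshold=10.0):
    """Fraction of purchases in `category` whose absolute value is below `threshold`."""
    purchases = [t for t in transactions if t.category == category and t.amount < 0]
    if not purchases:
        return 0.0
    small = [t for t in purchases if abs(t.amount) < threshold]
    return len(small) / len(purchases)


history = [
    Transaction("2024-03-01", -4.50, "supermarket"),
    Transaction("2024-03-02", -6.20, "supermarket"),
    Transaction("2024-03-03", -82.10, "supermarket"),
    Transaction("2024-03-05", -3.90, "supermarket"),
    Transaction("2024-03-25", 2100.00, "salary"),
]

print(small_purchase_share(history))  # 0.75
```

A value like this would be one feature among many. Whether it actually carries predictive power has to be tested on your own portfolio.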
Transactional Data is Helpful for a Number of Reasons
Firstly, transactional data can be a completely new source of information. It contains information that cannot be obtained (or more importantly – verified) using traditional channels.
Secondly, transactional data can be refined to illustrate behaviour trends in an applicant and identify changes in behaviour that cannot be easily detected by other data sources. Perhaps a certain person had a poor credit rating a few years ago but in the last few months has shown more stable financial behaviour, which may indicate that this person should be accepted for a loan.
Finally, since cleaning data is one of the most expensive procedures, using transactional data that comes prepackaged and structured saves time and increases efficiency. Structured data is what is needed in machine learning models for automating the underwriting flows.
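To make the second point above more concrete, a behaviour trend can be approximated by comparing recent months with earlier ones. The sketch below is only one possible way to do it, and it assumes transactions arrive as `(date, amount)` pairs with ISO dates. The function names and the three-month window are illustrative choices.

```python
from collections import defaultdict
from statistics import mean


def monthly_net_flow(transactions):
    """Sum signed amounts per calendar month; returns {"YYYY-MM": net amount}."""
    totals = defaultdict(float)
    for date, amount in transactions:
        totals[date[:7]] += amount
    return dict(sorted(totals.items()))


def recent_vs_earlier(transactions, recent_months=3):
    """Mean net flow of the last `recent_months` minus the mean of all earlier months.

    Returns None when there is not enough history to compare.
    """
    flows = list(monthly_net_flow(transactions).values())
    if len(flows) <= recent_months:
        return None
    return mean(flows[-recent_months:]) - mean(flows[:-recent_months])


# Six months of illustrative data: overspending early on, saving more recently.
statement = [
    ("2024-01-25", 2000.0), ("2024-01-28", -2200.0),
    ("2024-02-25", 2000.0), ("2024-02-28", -2100.0),
    ("2024-03-25", 2000.0), ("2024-03-28", -2150.0),
    ("2024-04-25", 2000.0), ("2024-04-28", -1900.0),
    ("2024-05-25", 2000.0), ("2024-05-28", -1800.0),
    ("2024-06-25", 2000.0), ("2024-06-28", -1850.0),
]

print(recent_vs_earlier(statement))  # 300.0
```

A positive value suggests the applicant's finances have improved recently, which a static credit rating from a few years ago would not reveal.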
For many, the focus has been on the digitalisation of the credit underwriting process and on big data. There are many opportunities to use data to improve credit risk processes, e.g. fraud detection tools, faster lending decision-making tools and income verification tools. In doing this you are maximising the possibility of lending. However, you still need to either invest in a data science department or alternatively collaborate with a partner.
The Underwriting Process
Looking at it from an underwriting process perspective and in terms of machine learning, you’ll see a rising number of industry challengers who are starting to complement credit bureau data with transactional data.
For applicants who have not been approved using traditional sources of data, transactional data is an opportunity to get a better picture of their situation. Instead of requesting additional documents, which can be very time-consuming, lenders can simply ask the potential borrower to share their transactional data. This way the lender can verify a person’s income and their identity simultaneously.
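The income check described above could look something like the following sketch. It assumes `(date, amount)` pairs, treats any credit of at least `min_amount` as income, and accepts a declared income within 10% of the estimate. All of these are simplifying assumptions for illustration, and identity verification is not covered here.

```python
from statistics import median


def estimate_monthly_income(transactions, min_amount=500.0):
    """Median of the per-month totals of large incoming credits."""
    per_month = {}
    for date, amount in transactions:
        if amount >= min_amount:
            month = date[:7]
            per_month[month] = per_month.get(month, 0.0) + amount
    return median(per_month.values()) if per_month else 0.0


def income_is_consistent(declared, transactions, tolerance=0.10):
    """True when the declared income is within `tolerance` of the estimate."""
    if declared <= 0:
        return False
    estimated = estimate_monthly_income(transactions)
    return abs(declared - estimated) <= tolerance * declared


statements = [
    ("2024-01-25", 2100.0), ("2024-01-28", -900.0),
    ("2024-02-25", 2100.0), ("2024-02-27", 40.0),
    ("2024-03-25", 2150.0),
]

print(estimate_monthly_income(statements))       # 2100.0
print(income_is_consistent(2200.0, statements))  # True
print(income_is_consistent(3000.0, statements))  # False
```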
Road map for Machine Learning
The first step is to align the problem with your business strategy and make sure high-quality data is available. Whilst this might seem like a lofty goal, it’s the best way to start creating a machine learning model that you can base decisions on. Just like with other business decisions, it’s important to set KPIs, for example reducing the rate of NPLs or responsibly increasing acceptance rates, so that later on you’ll be able to measure the performance of the model.
The second step is all about the data. This is where you obtain the transactional data, then clean and structure it. It sounds simple, but this is where 80% of the work is centred. The data is crucial here: if data is not the solution to the problem, then the problem is not something that can be solved with a machine learning model (songwriting, for example).
The third step is the training process. This means choosing the appropriate ML method to find the best-performing model. There are a lot of ML methods (XGBoost, random forests, and k-means, to name a few), and an appropriate one is selected depending on the nature of the problem.
Once you have selected the most suitable model, which is the model that performs best against evaluation metrics such as AUC and Gini, you need to put it into production to start making decisions based on it. It is important to check that the KPIs from step one have been met; if they have, you can deploy the model to a production environment.
Finally, when the model is in production, it’s important to ensure that its decisions are stable, and to do this you monitor them over time. If these steps have been followed and the decisions are still not stable, it can be the result of behavioural changes: the changes show up in the data, but the chosen model does not reflect them. This process is iterative, so if something changes or goes wrong, you need to go back to step one.
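The two KPIs mentioned above are simple ratios, and it helps to pin down their definitions before any modelling starts. The sketch below assumes applications and loans are plain dictionaries and that a loan counts as non-performing when it is more than 90 days past due. That threshold is a common convention used here as an assumption; your own definition may differ.

```python
def acceptance_rate(applications):
    """Share of applications that were accepted."""
    if not applications:
        return 0.0
    accepted = sum(1 for a in applications if a["accepted"])
    return accepted / len(applications)


def npl_rate(loans, npl_days=90):
    """Share of granted loans that are more than `npl_days` days past due."""
    if not loans:
        return 0.0
    non_performing = sum(1 for loan in loans if loan["days_past_due"] > npl_days)
    return non_performing / len(loans)


# Illustrative portfolio: 8 applications, 5 accepted, 1 of those loans gone bad.
applications = [{"accepted": True}] * 5 + [{"accepted": False}] * 3
loans = [{"days_past_due": d} for d in (0, 0, 15, 0, 120)]

print(acceptance_rate(applications))  # 0.625
print(npl_rate(loans))                # 0.2
```

Tracking both numbers together matters, because it is easy to improve one at the expense of the other.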
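To give a feel for what cleaning and structuring involves, here is a small sketch. The raw row format (dictionaries with `date`, `amount` and `description` keys) is hypothetical. Treating identical rows as duplicates is also an assumption, since two genuine purchases can look the same.

```python
from datetime import datetime


def clean_transactions(raw_rows):
    """Return typed, de-duplicated transactions sorted by date."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        try:
            date = datetime.strptime(row["date"].strip(), "%Y-%m-%d").date()
            # Accept decimal commas such as "-12,50"; anything else unparsable is skipped.
            amount = float(str(row["amount"]).replace(",", "."))
        except (KeyError, ValueError, AttributeError):
            continue  # drop rows with missing or malformed fields
        # Collapse whitespace and lower-case the free-text description.
        description = " ".join(str(row.get("description", "")).split()).lower()
        key = (date, amount, description)
        if key in seen:
            continue  # assumption: an identical row is a duplicate export
        seen.add(key)
        cleaned.append({"date": date, "amount": amount, "description": description})
    return sorted(cleaned, key=lambda r: r["date"])


raw = [
    {"date": "2024-03-02", "amount": "-12,50", "description": "  Corner  Shop "},
    {"date": "2024-03-02", "amount": "-12.50", "description": "corner shop"},
    {"date": "not a date", "amount": "5", "description": "broken row"},
    {"date": "2024-03-01", "amount": 2100, "description": "Salary"},
]

for row in clean_transactions(raw):
    print(row["date"], row["amount"], row["description"])
```

Real pipelines add many more checks (currencies, reversals, missing periods), which is part of why this step takes up so much of the work.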
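For readers unfamiliar with the metrics mentioned above, AUC can be read as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, and the Gini coefficient is a rescaling of it: Gini = 2 × AUC − 1. The sketch below computes both by brute force and picks the better of two candidates. The labels, scores and model names are made up for illustration.

```python
def auc(labels, scores):
    """Probability that a defaulter (label 1) outscores a non-defaulter (label 0).

    Ties count as half a win. Quadratic in the sample size, fine for a demo.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC."""
    return 2.0 * auc(labels, scores) - 1.0


# 1 = defaulted, 0 = repaid; scores are predicted default risk on a validation set.
labels = [1, 1, 0, 0, 0, 1]
candidates = {
    "model_a": [0.9, 0.6, 0.4, 0.7, 0.2, 0.8],
    "model_b": [0.5, 0.4, 0.6, 0.7, 0.3, 0.2],
}

for name, scores in candidates.items():
    print(name, round(auc(labels, scores), 3), round(gini(labels, scores), 3))

best = max(candidates, key=lambda name: auc(labels, candidates[name]))
print(best)  # model_a
```

In practice the comparison should be made on data the models have not seen during training.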
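One common way to monitor stability is to compare the distribution of model scores at training time with the distribution seen in production. The text above does not name a specific method, so the Population Stability Index (PSI) used in this sketch is an assumption chosen for illustration, as are the bin edges and the sample scores.

```python
import math


def bin_shares(values, edges, floor=1e-6):
    """Share of values in each bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / total, floor) for c in counts]


def psi(expected, actual, edges):
    """Population Stability Index between two score samples over shared bins."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training_scores = [0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
recent_scores = [0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.9, 0.95]

print(psi(training_scores, training_scores, edges))  # 0.0
print(psi(training_scores, recent_scores, edges))    # large value: scores have shifted
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift that warrants revisiting the model, but the right threshold depends on your own portfolio.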
What do you do without first-hand access to valuable data? Well, you can always use an API, similar to Instantor’s, to get started with transactional data.
Many of our customers today, who assess the creditworthiness of an applicant but do not have access to their transactional data, come to us because of our 8 years of experience in the industry. We’ve analysed billions of transactions and provide an API that has been used by over 200 financial institutions worldwide. A great first step in the digitalisation journey can be products that verify a borrower's income fully digitally. Finally, clients also reach out to us because we can uncover these insights quickly and help them augment their credit risk models.
If we look at the differences between using only one data source and using more than one, we can see uplifts in the accuracy of predictions. Applying additional data sources and predictive features to your models and decision-making processes will help you better decide who to accept and who to reject, thus improving your overall performance.