All Things Database: Education, Best Practices,
Use Cases & More

Data Warehouse and Visualizations for Credit Risk Analysis

Sai Valluri
Product Marketing Manager

Most people are dependent on credit to finance vehicles, real estate, student loans, or start small businesses. For financial institutions, assessing credit risk data is critical to determining whether to extend that credit. In this blog, we'll demonstrate how incorporating data from disparate data sources (in this case, from four data sets) allows you to better understand the primary credit risk factors and optimize financial models.

What's the best way to make that easy? By using Autonomous Data Warehouse, which gives financial institutions the flexibility to dynamically test and modify analytical models without specialized skills. We'll demonstrate how Autonomous Data Warehouse makes analyzing credit risk simpler. 

Try a Data Warehouse to Improve Your Analytics Capabilities

Analyzing Credit Risk

For many financial institutions, one key performance measure comes to mind more than any other: credit risk. A person’s credit risk score is based on financial health factors including: available credit, debt, payment history, and length of credit history. The financial factors not built into the credit score include income, bank balance, and employment status. But all of these can potentially be used to improve the credit risk model, which ultimately drives more revenue. In this blog, let's review different data sets that we will use to effectively analyze credit risk.

Understanding the Data Sets

By using data visualizations, data analysts can learn about and effectively segment the market. In this project we are connecting multiple data sources:


Data analysts generate insights by sifting through significant amounts of data that can be used in conjunction with one another. However, data from different departments can often be siloed, making it harder for an analyst to incorporate potentially valuable predictive data into the model. For example, data elements in credit risk analysis include employment history from HR, purchase history from sales, and core financial health reports from finance. By combining these data sources into a single cohesive system, analysts can create more accurate models. Financial institutions can not only reduce costs by strategically identifying their target market segment, but also better monetize their data by continuously tailoring financial products while improving service delivery.

We looked at the following questions:

  1. How are weights assigned to individual financial factors to create a model that predicts the credit risk?
  2. What is the distribution of our target market based on our credit risk model?
  3. What kinds of loans is our target market segment interested in?
  4. How is the rate of homeownership correlated with wealth brackets based on the type of loans our target market is interested in (housing loans)?
  5. What combination of attributes identifies a risk-free customer?
  6. How effective was the targeted marketing campaign based on our segmentation analysis?

To get started, we downloaded the CREDIT_SCORING_100K_V dataset. This is one of the four datasets we will be using in this project. Here’s how the different attributes are displayed in Excel. 

Let's view the data in Oracle Data Visualization Desktop now. There are multiple ways to upload data to Oracle Cloud for analysis using Oracle Autonomous Data Warehouse. For this example, we uploaded the Credit Scoring 100K data set and reviewed the data in Data Visualization Desktop.

Here’s a quick snapshot of the data from Data Visualization Desktop:

1. How are weights assigned to individual financial factors to create a model that predicts the credit risk?


In the pivot table, (on the left) we see different factors that help to determine the potential value of a customer including: credit scores, wealth, education, income, debt, and other financial measures. Each factor is given a weight based on significance and ranked. When we plot this data on a horizontal bar graph visualization, we can see all the financial factors from most to least important. This way we can visually see that a factor like wealth (IV: .54) is 10X more important than family size (IV: .04).

2. What is the distribution of our target market based on our credit risk model?

This shows the probability of good credit for various demographic factors. Adjusting the filters above (when you're in Data Visualization Desktop) to gain an understanding of what is likely to result in good credit. Each row is a person, so we can see that in our model, most people have a 52.85 or 55.26 percent probability of good credit. From this data, we can perform statistical analysis on the standard deviation to understand the target group of clients with more than 50 percent probability of good credit. 

3. What kinds of loans is our target market segment interested in?


In this visualization, we set up a pivot table to target people with a high probability of good credit as our target segment. Then we filter their credit history by delay, duly now, duly past, not taken, and risky.

From this, we can construct a treemap visualization to see the loan type of this target market segment. We see that the most common type of loan includes need based followed by housing, auto, and education loans. More than half of the loans are either need based or housing loans. 

4. How is the rate of homeownership correlated with wealth brackets based on the type of loans our target market is interested in (housing loans)?

In this visualization, we use a scatterplot to correlate the credit scores, age, and wealth (on the left). We also use pie charts to understand the rate of home ownership among different income brackets (on the right). In the scatterplot, we see that credit scores are correlated to wealth but not correlated to age. In the pie chart, homeowners are shown in green. Out of those surveyed, 22.5 percent of respondents were homeowners while 35.4 percent were tenants. When broken out by wealth, the rate of homeownership increases as you move up the income bracket.

5. What combination of attributes identifies a risk-free customer?

The network map links uses lines to link variables such as the probability of good credit, family size, and residential status. Each data point is a node and each linkage represents a relationship between two data points. In this visualization, we’ve filtered to only show individuals with more than 50 percent probability of good credit. Drilling down further into the simplified network, we can isolate a node that homeowners with 2-3 children are a demographic that often has a high probability of good credit (see below). We can continue the analysis by looking at individual customer IDs and execute a marketing campaign to acquire low-risk customers. By targeting high-value customers, we optimize a limited marketing budget and increase the effectiveness of our sales promotion.

6. How effective was the targeted marketing campaign based on our segmentation analysis?


In this line graph, we use cumulative lift to measure of how much better prediction results are compared to a baseline. In this model, the red line acts as a baseline and the yellow line represents actual results. As an example, suppose you normally have a 5 percent response rate but your most recent marketing campaign has an astonishing 20 percent response rate. The lift for that model would be 5/20 or 4. Since lift is computed using actual outcomes, analysts can compare how well a campaign performed against data on from previous campaigns.


Oracle Autonomous Database allows users to easily create data marts in the cloud with no specialized DBA skills and generate powerful business insights. It took us fewer than ten minutes to provision a database and upload data for analysis.

Analysts are always looking for ways to create a more accurate credit risk model with data. They ask for analytical capabilities to discover innovative answers to their questions. While analysts are looking for those data insights, leadership wants insights delivered in a clear and concise format to understand the business. IT can’t deal with difficult-to-manage legacy approaches requiring expensive teams with highly specialized skills. And that's where the Autonomous Data Warehouse comes into play. 

Now you can also leverage the autonomous data warehouse through a cloud trial:

Sign up for your free Autonomous Data Warehouse trial today

Please visit the blogs below for a step-by-step guide on how to start your free cloud trial: upload your data into OCI Object Store, create an Object Store Authentication Token, create a Database Credential for user and load data using the Data Import Wizard in SQL Developer:

Feedback and question welcome.

Written by Sai Valluri and Philip Li

Join the discussion

Comments ( 1 )
  • Shankar Friday, March 8, 2019
    Hi, what is the source of this credit risk data in case we want to try it out ourselves.

    The datasets (xls/files/tables) are mentioned but not where they are located.

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.