
Credit Risk Project, Installment 1: Indian School of Business

The document provides instructions for a credit risk project involving analysis of loan data. It defines a key performance metric called PRSM and instructs students to calculate this for each loan. It then poses two questions: 1) Analyze the distribution of PRSM scores and determine if the empirical rule applies. Remedy any anomalies to improve fit. 2) Examine the distribution of "Years in Business" and the effect of taking its log transform with an offset of 1.

Uploaded by

TANAY SETH
Copyright
© All Rights Reserved

Indian School of Business

Credit Risk Project, Installment 1


Your answers to the following questions are to be submitted in a single report document. The report
file must have a separate cover page that identifies the team (e.g., J-1) and lists the members of the
team who are participating in the project. Number the subsequent pages and format them to have 1-
inch margins all around. Include only plots that are discussed in your report. Reports are to be
submitted via the LMS by 23:55 hrs on Monday, 1st Aug.

These questions use your team project dataset. For the sake of checking conditions, assume that the
cases represent a simple random sample from the population of loans described in the project
description. You do not have all of the loans, just a small sample out of the many thousands handled
by this lender. For these questions, use only the 628 loans with complete descriptions in your data
table.

A key variable in the remainder of the analysis is defined as follows. The lender uses a performance
metric known as PRSM, performance ratio at six months. To construct PRSM, divide two times the
amount repaid at six months by the total amount to be repaid:

PRSM = 2 × (Amount repaid at six months / Total amount to be repaid)

PRSM should be approximately equal to 1 if the payments at 6 months are on track to fulfill the debt
at the end of the year. Values of PRSM < 1 indicate a loan for which the payments are currently
coming in slower than expected; PRSM > 1 indicates a loan that is being paid off faster than
expected. You will need to create this column in your JMP dataset using the formula calculator. The
formula calculator manipulates columns in the data table.
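
As an illustration of the PRSM definition above, here is a minimal Python sketch of the same calculation. The project itself uses the JMP formula calculator; the function name `prsm` and the three sample loans are hypothetical and are not taken from the project dataset.

```python
def prsm(repaid_at_six_months: float, total_to_repay: float) -> float:
    """Performance ratio at six months: twice the fraction repaid so far."""
    if total_to_repay <= 0:
        raise ValueError("total amount to be repaid must be positive")
    return 2 * repaid_at_six_months / total_to_repay


# Hypothetical loans: (amount repaid at six months, total amount to be repaid)
loans = [(500.0, 1000.0), (300.0, 1000.0), (700.0, 1000.0)]
scores = [prsm(repaid, total) for repaid, total in loans]
print(scores)  # [1.0, 0.6, 1.4]
```

The first loan is on track (PRSM = 1), the second is being repaid slower than expected, and the third faster.
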

1. (a) Using both graphics and descriptive statistics, describe the shape and form of the
distribution of the PRSM score. Does it appear reasonable to use the Empirical Rule with this
variable?
(b) If you were to remedy any anomalies with this variable, how well would the Empirical
Rule work now?
Solution:
(a)

The graph above shows that the distribution of the PRSM score is not bell-shaped; it is strongly
right-skewed because of a single outlier, entry number 550 (case # 39316396), with a value of
373.30068. The summary statistics agree: skewness is 25.05776 and kurtosis is 627.927, whereas
both should be close to 0 for a normal distribution (taking the reported kurtosis as excess
kurtosis). The distribution is therefore not normal.
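
To see how a single extreme value can drive skewness and kurtosis this high, the sketch below computes both with simple moment formulas. It is only an illustration: the 627 "typical" scores are a hypothetical stand-in for the real loans, and these formulas are not JMP's sample-adjusted estimators, so the numbers will differ somewhat from the report's.

```python
import statistics


def central_moment(values, order):
    """Average of (value - mean) ** order over the data."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third central moment scaled by the cube of the standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical stand-in for the loan data: 627 scores spread evenly
# between 0.5 and 1.5, plus the single extreme value reported above.
typical = [0.5 + i / 626 for i in range(627)]
with_outlier = typical + [373.30068]

print("typical only:", skewness(typical), excess_kurtosis(typical))
print("with outlier:", skewness(with_outlier), excess_kurtosis(with_outlier))
```

With one value this far from the rest, both statistics are determined almost entirely by the sample size, not by the shape of the remaining 627 scores.
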

Further, in the normal quantile plot, most of the points fall outside the red dashed bounds
around the diagonal. This confirms that the PRSM score does not follow a normal
distribution.

Because the distribution is not normal, we cannot apply the Empirical Rule (the 68-95-99.7 rule),
which says that about 68%, 95%, and 99.7% of the data lie within one, two, and three standard
deviations of the mean, respectively.
(b)

As seen in part (a), there is an outlier at entry number 550 (case # 39316396) with a value of
373.30068. This single entry makes the distribution right-skewed. To remedy this, we exclude
the entry and redraw the graph:

With the outlier excluded, the distribution of the PRSM score is bell-shaped and appears normal
graphically.
The skewness and kurtosis of the updated data set are 0.0199966 and -0.143193 respectively,
both very close to 0. Moreover, all the PRSM points in the normal quantile plot lie between
the red dotted lines around the diagonal.
We can therefore treat the PRSM score as approximately normally distributed and apply the
Empirical Rule (68-95-99.7 rule) in this case.
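
The contrast between parts (a) and (b) can also be checked numerically by counting the share of values within one, two, and three standard deviations of the mean. The sketch below is an assumption-laden illustration, not the project data: it simulates 627 roughly normal scores (mean 1.0 and standard deviation 0.2 are hypothetical choices) and adds the reported outlier value.

```python
import random
import statistics


def coverage(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


random.seed(0)
# Hypothetical PRSM scores; the mean and spread are assumed, not estimated.
clean = [random.gauss(1.0, 0.2) for _ in range(627)]
with_outlier = clean + [373.30068]

for label, data in [("with outlier", with_outlier), ("outlier excluded", clean)]:
    shares = [round(coverage(data, k), 3) for k in (1, 2, 3)]
    print(label, shares)
```

With the outlier included, the standard deviation is so inflated that every other value falls within one standard deviation, so the 68-95-99.7 pattern fails. With it excluded, the shares should land near the Empirical Rule values.
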
2. (a) The variable Years in Business may ultimately be useful in forecasting the PRSM
score. Comment on the shape of the distribution of this predictor.

(b) It is common (but not a requirement) to take the log of a variable that displays a shape
such as this variable does. Further, the log transform is not defined for the value of zero, so when a
variable contains zero values, we modify the log transform to include an offset term. The most
common offset is 1. Comment on the distribution of log(1 + Years in Business).

Solution
(a)

From the graph above and the summary statistics (skewness), we can state that the distribution
of the variable "Years in Business" is right-skewed.

Moreover, the normal quantile plot confirms that the distribution does not follow a normal
distribution and is right-skewed.
(b) Applying the log transformation (with an offset of 1) to "Years in Business" gives the following distribution:

From the details above, the skewness of log(1 + Years in Business) is reduced and the values are
pulled closer together. However, the distribution is still not normal.
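
The effect of the offset log transform on skewness can be sketched as follows. The data are simulated, not the project's: whole-year values drawn from a lognormal distribution (parameters 1.5 and 1.0 are hypothetical) stand in for "Years in Business", and the natural log is assumed. `math.log1p(x)` computes log(1 + x), so zero years maps to 0 instead of being undefined.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(1)
# Hypothetical right-skewed predictor, truncated to whole years (zeros occur).
years = [int(random.lognormvariate(1.5, 1.0)) for _ in range(628)]
log_years = [math.log1p(y) for y in years]  # log(1 + years), natural log

print("skewness of years:         ", round(skewness(years), 3))
print("skewness of log(1 + years):", round(skewness(log_years), 3))
```

A lower skewness after the transform does not by itself establish normality; the quantile plot still has to be checked, as in the solution above.
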
