
Lab 02: Credit Ratings


Author: Your Name Here
Published: September 13, 2024
Introduction
For Lab 02 in CS 307, we will develop a machine learning model to predict individual consumer credit
ratings based on demographic and financial features such as income, age, and education. The objective
is to create a regression model that could potentially allow a bank to assess customer creditworthiness
without relying on costly third-party credit agencies. Using historical credit data, our goal is to build a
model that generalizes well to unseen data and meets the performance requirements of the autograder.
By evaluating the model’s predictive accuracy and considering the ethical and practical implications, we
aim to determine whether this approach is viable for real-world application.
Throughout this lab, we will leverage cross-validation, pipelines, and grid search to optimize the model,
while adhering to scikit-learn’s best practices. We will assess model performance using Root Mean
Square Error (RMSE) and discuss the suitability of the chosen features in predicting credit ratings.
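For reference, RMSE is the square root of the average squared prediction error. The toy snippet below, with made-up numbers, shows the metric computed both by hand and via scikit-learn's mean_squared_error.

# Illustrative only: RMSE computed by hand and with scikit-learn on toy values.
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([300.0, 450.0, 520.0])  # toy "actual" credit ratings
y_pred = np.array([320.0, 430.0, 500.0])  # toy predictions

rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)  # both 20.0 for these toy values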
Methods
# imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from joblib import dump, load

Data
# load data
credit_train = pd.read_csv("https://cs307.org/lab-02/data/credit-train.csv")
credit_test = pd.read_csv("https://cs307.org/lab-02/data/credit-test.csv")

# process data for ML

# create X and y for train


X_train = credit_train.drop("Rating", axis=1)
y_train = credit_train["Rating"]

# create X and y for test


X_test = credit_test.drop("Rating", axis=1)
y_test = credit_test["Rating"]

credit_train

Rating Income Age Education Gender Student Married Ethnicity


0 257.0 44.473 81.0 16.0 Female No No NaN
1 353.0 41.532 50.0 NaN Male No Yes Caucasian
2 388.0 16.479 26.0 16.0 Male NaN No NaN
3 321.0 10.793 29.0 13.0 Male No No Caucasian
4 367.0 76.273 65.0 14.0 Female No Yes Caucasian
... ... ... ... ... ... ... ... ...
251 268.0 26.370 78.0 11.0 Male No Yes Asian
252 433.0 26.427 50.0 15.0 Female Yes Yes Asian
253 259.0 12.031 58.0 18.0 Female NaN Yes Caucasian
254 335.0 80.861 29.0 15.0 Female No Yes Asian
255 93.0 15.717 38.0 16.0 Male Yes Yes Caucasian
256 rows × 8 columns
credit_test

Rating Income Age Education Gender Student Married Ethnicity


0 527.0 94.193 44.0 16.0 NaN No Yes Caucasian
1 347.0 44.978 30.0 10.0 Female No NaN Caucasian
2 203.0 13.676 80.0 16.0 Female No No African American
3 205.0 44.522 72.0 15.0 Male No Yes Asian
4 291.0 12.581 48.0 16.0 NaN NaN Yes Caucasian
... ... ... ... ... ... ... ... ...
59 129.0 18.951 82.0 13.0 Female No No NaN
60 817.0 140.672 46.0 9.0 Male No Yes African American
61 387.0 19.636 64.0 10.0 Female No No African American
62 410.0 49.794 40.0 8.0 Male No No Caucasian
63 259.0 57.202 72.0 11.0 Female No No Caucasian
64 rows × 8 columns
y_train

0 257.0
1 353.0
2 388.0
3 321.0
4 367.0
...
251 268.0
252 433.0
253 259.0
254 335.0
255 93.0
Name: Rating, Length: 256, dtype: float64

y_test

0 527.0
1 347.0
2 203.0
3 205.0
4 291.0
...
59 129.0
60 817.0
61 387.0
62 410.0
63 259.0
Name: Rating, Length: 64, dtype: float64


In this section, we compute summary statistics of the credit rating by student status and marital status, report the correlation of age and income with the rating, and check the proportion of missing values in each variable. These statistics help us understand the dataset before building the prediction model.
# summary statistics

# Calculate mean and standard deviation credit rating score


mean_rating = y_train.mean()
std_rating = y_train.std()
print(mean_rating)
print(std_rating)

347.609375
148.8931046679028

# Calculate mean and standard deviation based on student status


student_stats = credit_train.groupby('Student')['Rating'].agg(['mean', 'std', 'count']).reset_index()
print(student_stats)

Student mean std count


0 No 349.511737 148.003525 213
1 Yes 338.473684 149.164476 19

# Calculate mean and standard deviation based on marriage status


marriage_stats = credit_train.groupby('Married')['Rating'].agg(['mean', 'std', 'count']).reset_index()
print(marriage_stats)

Married mean std count


0 No 329.397959 132.039649 98
1 Yes 358.905063 157.797935 158

# Correlation of Age and Income with Rating


correlations = credit_train[['Rating', 'Age', 'Income']].corr()

print("\nCorrelation with Rating:")


print(correlations['Rating'])

Correlation with Rating:


Rating 1.000000
Age 0.143032
Income 0.770842
Name: Rating, dtype: float64

# Proportion of missing values for each variable


missing_data = credit_train.isnull().mean()

print("\nMissing Data (Proportion Missing):")
print(missing_data)

Missing Data (Proportion Missing):


Rating 0.000000
Income 0.000000
Age 0.128906
Education 0.097656
Gender 0.023438
Student 0.093750
Married 0.000000
Ethnicity 0.097656
dtype: float64

# visualizations

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Load the data


credit_train = pd.read_csv("https://cs307.org/lab-02/data/credit-train.csv")

# Custom color palette


custom_palette = ["#143294", "#FF5F05"]
sns.set_theme(style="whitegrid")

# Function to add jitter (optional for scatter plot)


def add_jitter(data, jitter_amount=0.2):
    return data + np.random.uniform(-jitter_amount, jitter_amount, size=data.shape)

# Create a FacetGrid for scatter plot (Income vs Age)


g = sns.FacetGrid(credit_train, col="Gender", height=4, aspect=1, palette=custom_palette, hue="Gender")
g.map_dataframe(sns.scatterplot, x=add_jitter(credit_train['Age']), y="Income")

# Add legend, axis labels, and titles


g.add_legend(title="Gender")
g.set_axis_labels('Age', 'Income ($1000s)')
g.set_titles(col_template='Gender: {col_name}')

# Customize grid and appearance


for ax in g.axes.flat:
    ax.grid(True, which='both', linestyle="--", color="gray", alpha=0.7)
    ax.set_facecolor('#f9f9f9')
    for spine in ax.spines.values():
        spine.set_edgecolor("black")
        spine.set_linewidth(1.5)
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

g.figure.suptitle("Income vs Age by Gender", fontsize=16)


g.figure.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()

# Boxplot for Credit Rating by Gender or Student


plt.figure(figsize=(10, 6))
sns.boxplot(x=credit_train['Gender'], y=credit_train['Rating'], palette=custom_palette)
plt.title('Distribution of Credit Ratings by Gender', fontsize=16)
plt.xlabel('Gender')
plt.ylabel('Credit Rating')
plt.grid(True, which='both', linestyle="--", color="lightgrey", alpha=0.7)
plt.show()

# Boxplot for Credit Rating by Student Status


plt.figure(figsize=(10, 6))
sns.boxplot(x=credit_train['Student'], y=credit_train['Rating'], palette=custom_palette)
plt.title('Distribution of Credit Ratings by Student Status', fontsize=16)
plt.xlabel('Student')
plt.ylabel('Credit Rating')
plt.grid(True, which='both', linestyle="--", color="lightgrey", alpha=0.7)
plt.show()

/var/folders/x_/71wrj7jx5474n6st5fwkvj7r0000gn/T/ipykernel_75858/2219617321.py:44:
FutureWarning:

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0.
Assign the `x` variable to `hue` and set `legend=False` for the same effect.

sns.boxplot(x=credit_train['Gender'], y=credit_train['Rating'], palette=custom_palette)


/var/folders/x_/71wrj7jx5474n6st5fwkvj7r0000gn/T/ipykernel_75858/2219617321.py:53:
FutureWarning:

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0.
Assign the `x` variable to `hue` and set `legend=False` for the same effect.

sns.boxplot(x=credit_train['Student'], y=credit_train['Rating'], palette=custom_palette)


Models
# process data for ML

# create X and y for train


X_train = credit_train.drop("Rating", axis=1)
y_train = credit_train["Rating"]

# create X and y for test


X_test = credit_test.drop("Rating", axis=1)
y_test = credit_test["Rating"]

# train models

# Define the numerical and categorical features as per your dataset


numeric_columns = ['Age', 'Income', 'Education']
categorical_columns = ['Gender', 'Student', 'Married', 'Ethnicity']

# Pipeline to preprocess numeric data (using median for missing values)


numeric_pipeline = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median'))
])

# Pipeline to preprocess categorical data (using mode and one-hot encoding)

categorical_pipeline = Pipeline(steps=[
('imputer', SimpleImputer(strategy='most_frequent')),
('encoder', OneHotEncoder(handle_unknown='ignore', drop='first'))
])

# Combine preprocessing steps for both numeric and categorical data


preprocessor = ColumnTransformer(transformers=[
('num', numeric_pipeline, numeric_columns),
('cat', categorical_pipeline, categorical_columns)
])

# Build the final pipeline with preprocessing and KNeighborsRegressor


knn_pipeline = Pipeline(steps=[
('preprocessing', preprocessor),
('model', KNeighborsRegressor())
])

# Fit the preprocessor on training data and transform it


preprocessor.fit(X_train)
X_train_preprocessed = pd.DataFrame(
preprocessor.transform(X_train),
columns=preprocessor.get_feature_names_out()
)

# Preprocess the test set using the fitted preprocessor


X_test_preprocessed = pd.DataFrame(
preprocessor.transform(X_test),
columns=preprocessor.get_feature_names_out()
)

# Display first few rows of preprocessed test set


X_test_preprocessed.head()

# Tuning the model by finding the best k (number of neighbors)


k_values = np.arange(1, 50)
best_k_rmse = []

for k in k_values:
    knn = KNeighborsRegressor(n_neighbors=k)
    scores = cross_val_score(knn, X_train_preprocessed, y_train, cv=5, scoring='neg_mean_squared_error')
    rmse_scores = np.sqrt(-scores)
    best_k_rmse.append(rmse_scores.mean())

# Get the k with the lowest RMSE


best_k = k_values[np.argmin(best_k_rmse)]
print(f"Best k: {best_k}")


Best k: 19
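Since the introduction mentions grid search, an equivalent way to pick k (not the code that produced the result above) is GridSearchCV applied to the full pipeline, which keeps imputation and encoding inside each cross-validation fold. A minimal sketch, reusing the knn_pipeline defined above:

# Sketch: selecting n_neighbors with GridSearchCV on the full pipeline.
# Shown as an alternative to the manual loop above; not the code that
# produced the "Best k: 19" result.
param_grid = {"model__n_neighbors": np.arange(1, 50)}
grid = GridSearchCV(
    knn_pipeline,                 # preprocessing + KNeighborsRegressor
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_train, y_train)
print(grid.best_params_)          # selected number of neighbors
print(-grid.best_score_)          # cross-validated RMSE for that choice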

# Initialize and train the final model with the best k value
final_model_pipeline = Pipeline(steps=[
('preprocessing', preprocessor),
('model', KNeighborsRegressor(n_neighbors=best_k))
])

# Fit the model to the training data


final_model_pipeline.fit(X_train, y_train)

# Predict using the test data


predictions = final_model_pipeline.predict(X_test)

# Evaluate the model using Root Mean Squared Error (RMSE)


test_rmse = mean_squared_error(y_test, predictions, squared=False)
print(f"Test RMSE: {test_rmse}")

Test RMSE: 105.67237377851484

/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-
packages/sklearn/metrics/_regression.py:492: FutureWarning: 'squared' is deprecated in
version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the
function 'root_mean_squared_error'.
warnings.warn(

Results
# report model metrics
# Test the final model on unseen test data
pred_test = final_model_pipeline.predict(X_test)

# Calculate RMSE on test data


test_rmse = np.sqrt(mean_squared_error(y_test, pred_test))
print(f"Test RMSE: {test_rmse}")

# Calculate MAE on test data


mae = mean_absolute_error(y_test, pred_test)
print(f"Test MAE: {mae}")

# Calculate R-squared score


r2 = r2_score(y_test, pred_test)
print(f"R-squared: {r2}")

Test RMSE: 105.67237377851484


Test MAE: 88.33963815789474
R-squared: 0.5495112808660372


# save the final model pipeline to disk

dump(final_model_pipeline, "credit-ratings.joblib")

['credit-ratings.joblib']
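As an illustration of how the saved artifact might be used downstream, the sketch below reloads the pipeline and scores a single applicant. The applicant record is hypothetical; only the column names are taken from the training data.

# Sketch: reload the saved pipeline and predict for one hypothetical applicant.
from joblib import load

loaded_pipeline = load("credit-ratings.joblib")

new_applicant = pd.DataFrame([{
    "Income": 55.0,        # hypothetical values; Income in $1000s
    "Age": 37.0,
    "Education": 16.0,
    "Gender": "Female",
    "Student": "No",
    "Married": "Yes",
    "Ethnicity": "Asian",
}])

print(loaded_pipeline.predict(new_applicant))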

The final model was tested on unseen data, and the following metrics were obtained:
Root Mean Squared Error: The model achieved an RMSE of 105.67. RMSE summarizes the typical size of the prediction error on the test set, in rating points, with large errors penalized more heavily than small ones; a lower RMSE indicates more accurate predictions.
Mean Absolute Error: The MAE is 88.34, meaning that the predicted credit ratings differ from the actual ratings by about 88 points on average. This metric offers a straightforward interpretation of prediction error across all test points.
R-squared: The R² score is 0.55, indicating that the model explains roughly 55% of the variance in the test-set ratings. This is a moderate value: the model captures just over half of the variability in credit ratings from the selected features.
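One way to judge whether an RMSE of 105.67 is useful is to compare it against a naive baseline that always predicts the mean training rating. The short sketch below (not part of the original workflow) computes that baseline on the same test set.

# Sketch: a naive baseline that always predicts the mean training Rating,
# for context on the model's test RMSE. Not part of the graded workflow.
baseline_pred = np.full(len(y_test), y_train.mean())
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline_pred))
print(f"Baseline (predict-the-mean) RMSE: {baseline_rmse}")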
Discussion
The results indicate that the credit rating model, while functional, has room for improvement.
With an RMSE of 105.67, the model’s predictions are somewhat off from the actual values, suggesting
that the features used (age, income, education, etc.) do not fully explain the variation in credit scores.
The MAE of 88.34 supports this, highlighting a significant average error in the predicted scores.
The R² value of 0.55 indicates that just over half of the variance in credit scores is explained by the
model. While this shows that the model is capturing some meaningful relationships between the
features and the target, there is still 45% of the variance unaccounted for, which suggests that
additional or more informative features (e.g., detailed credit history, financial behavior patterns) may be
needed to improve the model’s accuracy.
Limitations: One potential limitation of the model is that the features chosen might not fully capture all
relevant factors affecting credit scores. While age, income, and education are important, incorporating
more nuanced financial metrics, such as credit utilization ratio, debt-to-income ratio, or payment
history, might yield more accurate predictions.
Potential Improvements: To improve the model, several strategies could be explored:
Feature Engineering: Adding new features that directly impact creditworthiness, such as the number of open credit lines or recent credit inquiries, could improve the model's predictive power.
Model Selection: Trying different regression models, such as a Random Forest Regressor or Gradient Boosting, could yield better results by capturing non-linear relationships between the features and the target; a sketch of this swap appears after this list.
Hyperparameter Tuning: Further tuning the model's parameters (e.g., experimenting with the number of neighbors in KNeighborsRegressor or adjusting regularization in linear models) could reduce prediction error and improve overall performance.
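As a concrete illustration of the model-selection idea, the existing preprocessor can be reused with a different regressor. The sketch below swaps in a RandomForestRegressor with an illustrative hyperparameter grid; the grid values are examples and were not run for this report.

# Sketch: reuse the same preprocessing but swap KNN for a random forest.
# The grid values are illustrative; results were not produced for this report.
from sklearn.ensemble import RandomForestRegressor

rf_pipeline = Pipeline(steps=[
    ("preprocessing", preprocessor),
    ("model", RandomForestRegressor(random_state=42)),
])

rf_search = GridSearchCV(
    rf_pipeline,
    param_grid={
        "model__n_estimators": [100, 300],
        "model__max_depth": [None, 5, 10],
    },
    cv=5,
    scoring="neg_root_mean_squared_error",
)
rf_search.fit(X_train, y_train)
print(-rf_search.best_score_)     # cross-validated RMSE of the best forest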
Conclusion
In this lab, we developed a credit rating prediction model using demographic and financial features such
as age, income, and education. The model was evaluated on a test set, achieving a Root Mean Squared
Error (RMSE) of 105.67, a Mean Absolute Error (MAE) of 88.34, and an R-squared (R²) score of 0.55.
These results indicate that while the model captures some meaningful relationships between the
features and credit scores, there is still significant room for improvement.
The R² score of 0.55 suggests that the model explains approximately 55% of the variance in credit
scores, leaving 45% unexplained. This highlights the need for additional features, such as more detailed
financial history or behavioral metrics, to further enhance prediction accuracy. Moreover, with an
average error of 88.34 credit points, the model’s predictions, while reasonable, may not yet be reliable
enough for high-stakes decision-making.
Moving forward, the model could benefit from further feature engineering, incorporating additional
variables like credit history or debt-to-income ratios. Additionally, experimenting with more advanced
machine learning algorithms, such as Random Forests or Gradient Boosting, could help capture non-
linear relationships between features and improve overall performance.
In conclusion, while the current model provides a good starting point for predicting credit ratings,
refining the feature set and exploring more complex algorithms will be essential to achieving the level of
accuracy required for practical applications. With these improvements, the model could become a
valuable tool in assessing creditworthiness, potentially offering banks and financial institutions a cost-
effective alternative to traditional credit assessments.
