T15 Hand-On Solution Id 80827

This document contains a Python script that uses PySpark to load, clean, and analyze loan data from an S3 bucket. It defines functions to read the data, clean it by dropping nulls and duplicates, compute income-to-installment ratios, flag high-risk borrowers, and calculate default rates by loan purpose. The script also loads the processed results into S3 and Redshift, with placeholders for the bucket name and JDBC URL.

# -*- coding: utf-8 -*-

import os
import shutil
import pyspark
from pyspark.sql.window import Window
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
import traceback

def read_data(spark, customSchema):

    # Mention the bucket name inside the bucket_name variable
    bucket_name = "loan-data1234"  # Replace with your bucket name
    s3_input_path = "s3://" + bucket_name + "/inputfile/loan_data.csv"

    df = spark.read.csv(s3_input_path, header=True, schema=customSchema)

    return df
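
# The schema passed into read_data is not defined in the original listing. The
# definition below is a sketch that covers only the columns referenced by the
# functions in this script; the types are assumptions, so adjust them (and add any
# remaining columns) to match the actual loan_data.csv.
customSchema = StructType([
    StructField("purpose", StringType(), True),
    StructField("int_rate", DoubleType(), True),
    StructField("installment", DoubleType(), True),
    StructField("log_annual_inc", DoubleType(), True),
    StructField("dti", DoubleType(), True),
    StructField("fico", IntegerType(), True),
    StructField("revol_util", DoubleType(), True),
    StructField("not_fully_paid", IntegerType(), True),
])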

def clean_data(input_df):

    # Drop rows with missing values and exact duplicates, then remove rows
    # where purpose holds the literal string 'null'
    df = input_df.dropna().dropDuplicates()
    df = df.filter(df.purpose != 'null')

    return df

def s3_load_data(data, file_name):

    # Mention the bucket name inside the bucket_name variable
    bucket_name = "loan-data1234"
    output_path = "s3://" + bucket_name + "/output" + file_name

    if data.count() != 0:
        print("Loading the data", output_path)
        # Write the s3 load data command here
        data.coalesce(1).write.csv(output_path, header=True, mode="overwrite")
    else:
        print("Empty dataframe, hence cannot save the data", output_path)

def result_1(input_df):

    # Keep only educational and small_business loans
    df = input_df.filter((col("purpose") == "educational") | (col("purpose") == "small_business"))

    # Ratio of (log) annual income to the monthly installment
    df = df.withColumn("income_to_installment_ratio", col("log_annual_inc") / col("installment"))

    # Bucket interest rates: below 0.10 is low, 0.10 to 0.15 is medium, otherwise high
    df = df.withColumn("int_rate_category",
        when(col("int_rate") < 0.1, "low")
        .when((col("int_rate") >= 0.1) & (col("int_rate") < 0.15), "medium")
        .otherwise("high")
    )

    # Flag borrowers with high debt-to-income, low FICO, or high revolving utilization
    df = df.withColumn("high_risk_borrower",
        when((col("dti") > 20) | (col("fico") < 700) | (col("revol_util") > 80), 1)
        .otherwise(0)
    )

    return df

def result_2(input_df):

    # Default rate per purpose = loans not fully paid / total loans, rounded to 2 decimals
    df = input_df.groupBy("purpose").agg(
        (sum(col("not_fully_paid")) / count("*")).alias("default_rate")
    )
    df = df.withColumn("default_rate", round(col("default_rate"), 2))

    return df
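
# For example (hypothetical numbers): if 3 of 10 cleaned "small_business" loans have
# not_fully_paid = 1, result_2 reports purpose = "small_business" with default_rate = 0.30.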

def redshift_load_data(data):

    if data.count() != 0:
        print("Loading the data into Redshift...")
        jdbcUrl = "your-jdbc-url"  # Replace with your Redshift JDBC URL
        username = "awsuser"
        password = "Awsuser1"
        table_name = "result_2"

        # Write the redshift load data command here
        # (the Redshift JDBC driver must be available on the Spark classpath)
        data.write \
            .format("jdbc") \
            .option("url", jdbcUrl) \
            .option("dbtable", table_name) \
            .option("user", username) \
            .option("password", password) \
            .mode("overwrite") \
            .save()
    else:
        print("Empty dataframe, hence cannot load the data")

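# A minimal driver sketch (not part of the original hands-on listing): it wires the
# functions above together in the order described in the summary. The app name, the
# output file names, and the use of traceback for error reporting are assumptions.
if __name__ == "__main__":
    try:
        spark = SparkSession.builder.appName("loan-data-analysis").getOrCreate()

        # Ingest and clean the raw loan data
        raw_df = read_data(spark, customSchema)
        cleaned_df = clean_data(raw_df)

        # Analysis 1: ratios, rate categories and high-risk flags for selected purposes
        # (leading slash because s3_load_data appends file_name directly after "/output")
        result_1_df = result_1(cleaned_df)
        s3_load_data(result_1_df, "/result_1")

        # Analysis 2: default rate per loan purpose, loaded to both S3 and Redshift
        result_2_df = result_2(cleaned_df)
        s3_load_data(result_2_df, "/result_2")
        redshift_load_data(result_2_df)

        spark.stop()
    except Exception:
        traceback.print_exc()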