Practical Session 1: Exploratory Data Analysis: Exercise 1

This practical session consists of two exploratory data analysis exercises, one on the Ames House Data and one on the Titanic dataset. The first exercise covers importing, cleaning, and analysing property sales in Ames, Iowa, including visualizations of SalePrice and its correlations with numerical and categorical features. The second exercise examines the Titanic dataset to analyse survival rates by features such as sex and class, again with data cleaning and basic statistical analysis.

Uploaded by

Husam hr

Practical Session 1: Exploratory Data Analysis

Exercise 1
We shall explore the Ames House Data [1]. The dataset contained 113 variables describing 3970 property sales that
had occurred in Ames, Iowa between 2006 and 2010. The variables were a mix of nominal, ordinal, continuous, and
discrete variables used in the calculation of assessed values, and included physical property measurements in addition
to computation variables used in the city. The aim of this practical session is to explore this dataset.

Importation and preparation of the data

1. Import the data and store it in a dataframe df. Are there any missing values?
2. Some features will not be relevant to our exploratory analysis because they have too many missing
values (such as Alley and PoolQC). Remove Id and the features with 30% or more NaN values.

3. Plot the histogram of the variable SalePrice. What do you deduce?
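
The preparation steps above can be sketched with pandas as follows. This is only a minimal illustration: the helper name drop_sparse_columns, the file name in the comment, and the four-row stand-in frame are assumptions, not part of the exercise material.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.30,
                        id_column: str = "Id") -> pd.DataFrame:
    """Drop the identifier column and every column whose NaN share is >= threshold."""
    nan_share = df.isna().mean()  # fraction of missing values per column
    sparse = nan_share[nan_share >= threshold].index
    return df.drop(columns=[id_column, *sparse], errors="ignore")


# Tiny stand-in frame (invented values). With the real data one would load
# the file instead, e.g. df = pd.read_csv("train.csv")  # hypothetical file name
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Alley": [np.nan, np.nan, "Pave", np.nan],   # 75% missing
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],   # 25% missing
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Question 1: number of missing values per column, largest first
print(df.isna().sum().sort_values(ascending=False))

# Question 2: remove Id and the columns with 30% or more NaN values
df = drop_sparse_columns(df)
print(df.columns.tolist())

# Question 3 (requires matplotlib): histogram of the sale prices
# df["SalePrice"].plot.hist(bins=50)
```

Taking the mean of isna() gives the share of missing values directly, so the 30% rule becomes a single comparison.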

Numerical features analysis

1. List all the data types in the dataset and keep only the numerical features.

2. Plot the histograms of all these features.

3. Select the features for which the correlation with SalePrice is greater than 0.5. Give the correlation matrix
between these numerical features. Comment!
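
One possible way to approach these three questions is sketched below. The helper strong_correlates and the small stand-in frame are assumptions made for illustration. The threshold is applied to the signed correlation, as the question is worded; using the absolute value instead would also keep strongly negative correlations.

```python
import pandas as pd


def strong_correlates(df: pd.DataFrame, target: str = "SalePrice",
                      threshold: float = 0.5) -> pd.DataFrame:
    """Correlation matrix of the numerical features whose correlation
    with `target` is greater than `threshold` (target included)."""
    numeric = df.select_dtypes(include="number")
    corr_with_target = numeric.corr()[target].drop(target)
    selected = corr_with_target[corr_with_target > threshold].index.tolist()
    return numeric[selected + [target]].corr()


# Stand-in frame with invented values, only to make the sketch runnable
df = pd.DataFrame({
    "OverallQual": [7, 6, 7, 5, 8, 4],
    "GrLivArea": [1710, 1262, 1786, 1100, 2198, 900],
    "YrSold": [2008, 2007, 2008, 2006, 2009, 2010],
    "Street": ["Pave"] * 6,
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 110000],
})

# Question 1: data types present, then the numerical columns only
print(df.dtypes.value_counts())
numeric_columns = df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)

# Question 2 (requires matplotlib): one histogram per numerical feature
# df[numeric_columns].hist(bins=30, figsize=(12, 8))

# Question 3: correlation matrix restricted to the strongly correlated features
corr_matrix = strong_correlates(df)
print(corr_matrix.round(2))
```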

Include categorical features in the pipeline

We want to visualise the impact of categorical features on sale prices.
1. Select the categorical features.
2. Display the box plot of the variable SalePrice as a function of the different levels of the variable BsmtExposure.
Comment.
3. Display the box plot of the variable SalePrice as a function of the different levels of the variable SaleCondition.
Comment.
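
A hedged sketch of the categorical step follows. It assumes that every non-numerical column is categorical, and the function name plot_price_boxplot and the stand-in values are invented for illustration. The per-level medians are the centre lines that the box plots would draw.

```python
import pandas as pd


def plot_price_boxplot(df: pd.DataFrame, feature: str, target: str = "SalePrice"):
    """Box plot of `target` for each level of `feature` (requires matplotlib)."""
    return df.boxplot(column=target, by=feature)


# Stand-in frame with invented values
df = pd.DataFrame({
    "BsmtExposure": ["No", "Gd", "No", "Gd"],
    "SaleCondition": ["Normal", "Partial", "Normal", "Abnorml"],
    "SalePrice": [100000, 300000, 200000, 400000],
})

# Question 1: treat every non-numerical column as categorical (an assumption)
categorical = df.select_dtypes(exclude="number").columns.tolist()
print(categorical)

# Numbers behind the box plot of question 2: median SalePrice per level
median_by_exposure = df.groupby("BsmtExposure")["SalePrice"].median()
print(median_by_exposure)

# Questions 2 and 3 (uncomment when matplotlib is available):
# plot_price_boxplot(df, "BsmtExposure")
# plot_price_boxplot(df, "SaleCondition")
```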

[1] http://jse.amstat.org/v19n3/decock.pdf

Exercise 2
The RMS Titanic was a British passenger liner that sank in the North Atlantic Ocean in the early morning hours
of 15 April 1912, after it collided with an iceberg during its maiden voyage from Southampton to New York City.
There were an estimated 2,224 passengers and crew aboard the ship, and more than 1,500 died, making it one of
the deadliest commercial peacetime maritime disasters in modern history.

Women and children first? The aim is to understand how the survivors of the Titanic were selected.

Importation of the data and description of the dataset

1. In this first practical session, we shall work on the dataset titanic.csv, which describes the survival of the
passengers of the Titanic. Load this dataset into a data frame.

2. Describe the dataset titanic: features, nature of the features, number of observations.
3. Basic statistics: mean and quartiles of each variable.
4. Compute the percentage of missing values for each column and sort the result in descending order.
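
These description steps might look as follows in pandas. The helper missing_percentage and the four-row stand-in frame are assumptions for illustration; with the real file one would start from pd.read_csv("titanic.csv").

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, sorted in descending order."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


# Stand-in frame with invented values; the real data would be loaded with
# titanic = pd.read_csv("titanic.csv")
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})

# Question 2: number of observations and features, then the type of each feature
print(titanic.shape)
print(titanic.dtypes)

# Question 3: mean, quartiles and other summary statistics of the numerical columns
print(titanic.describe())

# Question 4: percentage of missing values, largest first
missing = missing_percentage(titanic)
print(missing)
```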

Basic graphic analysis

We want to understand which features could contribute to a high survival rate. It would make sense if everything
except ’PassengerId’, ’Ticket’ and ’Name’ were correlated with the survival rate.
1. Get rid of the features ’PassengerId’, ’Ticket’ and ’Name’, which seem irrelevant to the analysis.

2. We focus on the features ’Age’ and ’Sex’.

(i) Separate the dataset into men and women.
(ii) Display the age distribution of survivors and non-survivors for each sex. Comment.
3. At first glance, is there a link between ’Embarked’ and ’Survival’?

4. At first glance, is there a link between ’Pclass’ and ’Survival’?
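
A minimal sketch of this graphic analysis is given below. It assumes the survival indicator is stored in a 0/1 column named Survived, as in common versions of titanic.csv; if the file uses another name, adjust accordingly. All values in the stand-in frame are invented.

```python
import pandas as pd

# Stand-in frame with invented values; "Survived" (0/1) is an assumed column name
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Ticket": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
    "Embarked": ["S", "C", "S", "S", "Q", "S"],
})

# Question 1: drop the columns judged irrelevant
titanic = titanic.drop(columns=["PassengerId", "Ticket", "Name"])

# Question 2 (i): split by sex
men = titanic[titanic["Sex"] == "male"]
women = titanic[titanic["Sex"] == "female"]

# Question 2 (ii): numerical summary of age per sex and survival status
age_summary = titanic.groupby(["Sex", "Survived"])["Age"].describe()
print(age_summary)
# Histograms, one panel per survival status (requires matplotlib):
# men["Age"].hist(by=men["Survived"])
# women["Age"].hist(by=women["Survived"])

# Questions 3 and 4: survival rate per port of embarkation and per class
rate_by_port = titanic.groupby("Embarked")["Survived"].mean()
rate_by_class = titanic.groupby("Pclass")["Survived"].mean()
print(rate_by_port)
print(rate_by_class)
```

Because Survived is coded 0/1, its mean within a group is that group's survival rate, which gives a first numerical impression before any plot is drawn.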
