
ML Mini Project 2

This project aims to build a machine learning model to predict which passengers survived the sinking of the Titanic using features like gender, age and passenger class from a dataset. Logistic regression is used to classify passengers as survived or not survived based on these features. The dataset is split into a training set to train the model and a test set to evaluate the model's performance.

Uploaded by

Shivam Gosavi


A Project Report on

“Build a Machine Learning model that predicts the type of people
who survived the Titanic Shipwreck using Passenger Data (i.e.,
name, age, gender, socio-economic class, etc.)”

Submitted
By
Shivam Sandeep Khole [13B]
Aditya Anil Rokade [17B]
Sujeet Singh
Tanvi Chaudhari
Sakshi ahe

In partial fulfilment of the requirements for

The award of the degree of

Bachelor
in
COMPUTER ENGINEERING
For Academic Year 2022 – 2023

DEPARTMENT OF COMPUTER ENGINEERING

MET’s Institute of Engineering Bhujbal Knowledge City
Adgaon, Nashik – 422003.
Certificate

This is to Certify That

Tanvi Devidas Ch

Has completed the necessary Mini Project Work and prepared the report
on
“Build a Machine Learning model that predicts the type of people
who survived the Titanic Shipwreck using Passenger Data (i.e.,
name, age, gender, socio-economic class, etc.)”

in a satisfactory manner, in fulfilment of the requirements for the award of the

degree of Bachelor in Computer Engineering in the Academic Year
2022 – 2023

Project Guide
Prof. Vijay More

Course Objectives:
• To understand the need for machine learning.
• To explore various data pre-processing methods.
• To study and understand classification methods.
• To understand the need for multi-class classifiers.
• To learn the working of clustering algorithms.
• To learn fundamental neural network algorithms.

Course Outcomes:
On completion of the course, the student will be able to:
CO1: Identify the needs and challenges of machine learning for real time applications.
CO2: Apply various data pre-processing techniques to simplify and speed up machine learning
algorithms.
CO3: Select and apply appropriate supervised machine learning algorithms for real time
applications.
CO4: Implement variants of multi-class classifier and measure its performance.
CO5: Compare and contrast different clustering algorithms.
CO6: Design a neural network for solving engineering problems.

Acknowledgement

We take this opportunity to express our deepest sense of gratitude and sincere
thanks to those who have helped us in completing this task. We express our
sincere thanks to our guide, Prof. Vijay More, who has given us valuable
suggestions, excellent guidance, and continuous encouragement, and who has taken interest in
the completion of this work. His kind help and constant inspiration will continue to
help us in the future. We thank Dr. M. U. Kharat, Head of the Computer
Engineering Department, for his co-operation and encouragement in collecting
the information and preparing the data. Credit also goes to our colleagues, the staff
members of the Computer Engineering Department, and the Institute’s Library for
their help and timely assistance.

Contents
1. Abstract
2. Objectives
3. Problem Statement
4. Motivation
5. Introduction
6. Theory
7. Conclusion
8. References

Abstract
This project is based on the Titanic dataset provided on Kaggle. The sinking of the Titanic is one
of the most infamous shipwrecks in history. On April 15, 1912, the Titanic, widely considered
“unsinkable”, sank after colliding with an iceberg. Unfortunately, there were not enough
lifeboats for everyone on board, resulting in the deaths of many passengers and crew. In this project, we show how
machine-learning techniques can be used to predict survivors of the Titanic. With a training dataset of 891
individuals containing features like sex, age, and class, we attempt to predict the survivors of
a small test group of 418. We use a logistic regression model for this task.

Title of Mini-Project:

Build a Machine Learning model that predicts the type of people who survived the
Titanic Shipwreck using Passenger Data (i.e., name, age, gender, socio-economic
class, etc.)

Objective:

Goal: Build a predictive model that answers the question: “what sorts of
people were more likely to survive?” using passenger data like age, gender, class,
etc.

Problem Statement
Build a machine learning model that predicts the type of people who survived
the Titanic shipwreck using passenger data (i.e., name, age, gender, socio-economic class, etc.).
Dataset Link: https://www.kaggle.com/competitions/titanic/data

Motivation
The main motivation for this mini project is to build a model that predicts what type of people
survived the Titanic shipwreck using passenger data.

Introduction:
Machine learning is the application of computer-enabled algorithms to a data set in order to
find patterns in the data. This encompasses essentially all types of data
science algorithms: supervised, unsupervised, segmentation, classification, and regression. A few
important areas where machine learning can be applied are handwriting recognition,
language translation, speech recognition, image classification, and autonomous driving. The
features of a machine learning algorithm are the observations that are used to form predictions:
for image classification, the pixels are the features; for voice recognition, the pitch and volume
of the sound samples are the features; and for autonomous cars, the features are data from the
cameras, range sensors, and GPS.
Using data provided by www.kaggle.com, our goal is to apply machine-learning techniques to
predict which passengers survived the sinking of the Titanic. Features like ticket
price, age, sex, and class are used to make the predictions. Using logistic regression,
we try to predict the survival of passengers using different combinations of features.
The challenge boils down to a binary classification problem given a set of features.

Theory
Data Set:
The data we used for our project was provided on the Kaggle website. We were given 891
passenger samples for our training set, with associated labels indicating whether each passenger
survived. For each passenger, we were given the passenger class, name, sex, age, number
of siblings/spouses aboard, number of parents/children aboard, ticket number, fare, cabin,
and port of embarkation.
For the test data, we had 418 samples in the same format. The dataset is not complete, meaning
that for several samples, one or more fields were not available and were left empty (especially
the later fields: age, fare, cabin, and port). However, all samples contained at least
information about gender and passenger class.
To handle the missing data, we replace missing values with the mean of the remaining data or with
other values.
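
As a small illustration of the mean-replacement step described above, the following sketch fills missing entries of one numeric column with the mean of the observed entries. It is not the report's code: the helper name `impute_mean` and the toy age values are assumptions made for this example, and `None` is assumed to mark a missing value.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: the column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Toy ages for illustration only (not rows from the Kaggle file).
ages = [22.0, None, 38.0, 30.0, None]
filled_ages = impute_mean(ages)
print(filled_ages)
```

For a categorical field such as the port of embarkation, a mean is not defined, so a different fill value (for example the most frequent category) would be one of the "other values" mentioned above.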

Understanding the Titanic Dataset

First, we describe the Titanic dataset. It is a dataset of Titanic passengers in which:
• Each row represents the data of 1 passenger.
• Columns represent the variables. We consider the following 10 variables: the target and 9 features.
1. Survival: This variable shows whether the person survived or not. This is our target
variable, and we must predict its value. It is a binary variable: 0 means not survived and 1
means survived.
2. pclass: The ticket class of the passenger: 1st (upper class), 2nd (middle), or 3rd (lower).
3. Sex: Gender of the passenger.
4. Age: Age (in years) of the passenger.
5. sibsp: The number of siblings/spouses of the passenger who were on the ship.
6. parch: The number of parents/children of the passenger who were on the ship.
7. ticket: Ticket number.
8. fare: Passenger fare (for example, a 1st class fare is generally expected to be higher than a
2nd or 3rd class fare).
9. cabin: Cabin number.
10. embarked: Port of embarkation, i.e., where the passenger boarded the ship (C =
Cherbourg, Q = Queenstown, S = Southampton).
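
Logistic regression needs numeric inputs, so text-valued variables such as Sex and embarked have to be encoded first. The sketch below shows one possible encoding. The function name `encode_passenger`, the dictionary keys, the chosen feature order, and the example record are all assumptions for illustration, not the report's code.

```python
def encode_passenger(row):
    """Turn one passenger record (a dict) into a list of numeric features.

    Assumed layout for this sketch:
    [pclass, is_female, age, sibsp, parch, fare, embarked_C, embarked_Q, embarked_S]
    """
    ports = ("C", "Q", "S")
    features = [
        float(row["pclass"]),
        1.0 if row["sex"] == "female" else 0.0,  # binary encoding of Sex
        float(row["age"]),
        float(row["sibsp"]),
        float(row["parch"]),
        float(row["fare"]),
    ]
    # One-hot encoding of the port of embarkation.
    features += [1.0 if row["embarked"] == port else 0.0 for port in ports]
    return features


# Illustrative record only.
example = {"pclass": 3, "sex": "male", "age": 22, "sibsp": 1,
           "parch": 0, "fare": 7.25, "embarked": "S"}
encoded = encode_passenger(example)
print(encoded)
```

The ticket and cabin fields are left out of this sketch because they are free-form identifiers rather than directly usable numeric values.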

Logistic Regression:

A simple yet crisp description of logistic regression would be: “it is a supervised
learning classification algorithm used to predict the probability of a target variable. The
nature of target or dependent variable is dichotomous, which means there would be
only two possible classes,” as stated in the Tutorialspoint article.

[Figure: graph of the logistic regression curve]
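
The S-shaped curve of logistic regression comes from the logistic (sigmoid) function, which maps a weighted sum of the features to a value between 0 and 1 that is read as a probability. A minimal sketch follows; the function names `sigmoid` and `predict_proba` are chosen for this example only.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Estimated probability of the positive class (here: survived)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


for z in (-5, -1, 0, 1, 5):
    print(z, round(sigmoid(z), 3))
```

A passenger is then classified as "survived" when the estimated probability is at or above a threshold, commonly 0.5.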

What is Training Dataset?

The training data is the largest subset of the original dataset and is used to train
or fit the machine learning model. The training data is fed to the ML algorithm first, which
lets it learn how to make predictions for the given task.

What is Test Dataset?

Once we train the model with the training dataset, it is time to test the model with the test dataset.
This dataset is used to evaluate the performance of the model and to check that the model generalizes
well to new or unseen data. The test dataset is another subset of the original data, which
is independent of the training dataset. However, it has the same types of features and a similar class
probability distribution, and it serves as a benchmark for model evaluation once model
training is completed. Ideally, test data is a well-organized dataset that contains data for each type of
scenario that the model would face when used in the real world.
Usually, the test dataset is approximately 20-25% of the total original data for an ML project.
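
A hold-out split of this kind can be sketched as follows, assuming a simple random 80/20 split of the labeled rows. The function name `train_test_split`, the seed, and the use of row indices in place of real passenger records are assumptions for this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 891 labeled passengers: plain row indices.
rows = list(range(891))
train_rows, test_rows = train_test_split(rows, test_fraction=0.2)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters because the rows of a file may be ordered in some way, and the two subsets should be independent samples of the same data.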

Accuracy:

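The accuracy of the model is computed from the confusion matrix using the formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and
false negatives.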
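
As a small illustration, the sketch below counts the four confusion-matrix cells for binary labels and divides the number of correct predictions by the total. The function names and the toy label lists are assumptions for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = survived and 0 = not survived."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))
```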

Workflow

CODE & RESULT:
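
The following is not the report's code. It is a minimal, dependency-free sketch of how a logistic regression classifier can be fitted with batch gradient descent, under stated assumptions: the two features `[is_female, pclass]`, the learning rate, the number of epochs, and the twelve synthetic rows are all made up for illustration and are not taken from the Kaggle data.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: estimated probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x, threshold=0.5):
    """Predicted class: 1 (survived) if the estimated probability reaches the threshold."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if p >= threshold else 0


# Synthetic toy rows [is_female, pclass] with made-up labels.
X = [[1, 1], [1, 1], [1, 2], [1, 2], [1, 3], [1, 3],
     [0, 1], [0, 1], [0, 2], [0, 2], [0, 3], [0, 3]]
y = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

w, b = fit_logistic(X, y)
train_accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", train_accuracy)
```

The accuracy printed here is measured on the same rows used for fitting, so it only shows that the procedure runs. A meaningful evaluation would score the fitted model on a held-out test set. In practice a library implementation of logistic regression would normally replace the hand-written loop.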

Conclusion:
The analysis revealed interesting patterns across individual-level features. Factors such as
socioeconomic status, social norms, and family composition appeared to have an impact on the
likelihood of survival. These conclusions, however, were derived from findings in the given
data set.
We observed that the female survival rate is very high (approx. 74%), while the male survival
rate is very low. Logistic regression was the primary technique used to make predictions for this
classification problem.
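
A per-group survival rate like the one quoted above can be computed with a simple group-and-average step. The sketch below uses a handful of made-up records, so its output does not reproduce the report's figures. The function name `survival_rate_by` and the record layout are assumptions for this example.

```python
from collections import defaultdict


def survival_rate_by(records, key):
    """Return {group: fraction survived} for the given grouping key."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        survived[record[key]] += record["survived"]
    return {group: survived[group] / totals[group] for group in totals}


# Made-up records for illustration only.
records = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "male", "survived": 0},
]
rates = survival_rate_by(records, "sex")
print(rates)
```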
It would be interesting to explore the dataset further and introduce more attributes, which might
lead to better results. Other machine learning techniques, such as Naive Bayes and K-NN
classification, could also be used to solve the problem.

References:

[1] Kaggle, Titanic: Machine Learning from Disaster [Online]. Available: http://www.kaggle.com/

[2] Prediction of Survivors in Titanic Dataset: A Comparative Study using Machine Learning
Algorithms, Tryambak Chatterjee, IJERMT-2017.

[3] Eric Lam, Chongxuan Tang, "Titanic Machine Learning from Disaster", LamTang-Titanic
Machine Learning From Disaster, 2012.

[4] Analyzing Titanic disaster using machine learning algorithms, Computing, Communication
and Automation (ICCCA), 2017 International Conference on, 21 December 2017, IEEE.

[5] https://towardsdatascience.com/predicting-thesurvival-of-titanic-passengers-Niklas Donges

[6] https://www.analyticsvidhya.com/machine-learning

[7] Wikipedia. Logistic Regression [Online]. Available: https://en.wikipedia.org/wiki/Logistic_regression
