4/14/25, 12:10 PM assignment2.ipynb - Colab
!pip install -q scikit-learn pandas matplotlib seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score, confusion_matrix
from google.colab import files
uploaded = files.upload()
train.csv(text/csv) - 61194 bytes, last modified: 4/14/2025 - 100% done
Saving train.csv to train.csv
import pandas as pd
df = pd.read_csv('train.csv')
df.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
# Select relevant features and copy to avoid chained assignment warnings
titanic_data = df[['Survived', 'Pclass', 'Sex', 'Age']].copy()
# Fill missing Age values with median
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())
# Convert 'Sex' to numeric: female = 0, male = 1
titanic_data['Sex'] = titanic_data['Sex'].map({'female': 0, 'male': 1})
# Check cleaned data
titanic_data.head()
   Survived  Pclass  Sex  Age
0  0         3       1    22.0
1  1         1       0    38.0
2  1         3       0    26.0
3  1         1       0    35.0
4  0         3       1    35.0
X = titanic_data[['Pclass', 'Sex', 'Age']]
y = titanic_data['Survived']
# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
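The split above is purely random, so the share of survivors in the test set can drift from the share in the full data. A minimal sketch, assuming one wants that ratio preserved in both halves, passes stratify=y to train_test_split. It runs on a small hypothetical DataFrame (toy, X_toy, y_toy are illustrative names, not the notebook's data) so that it is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for titanic_data: 100 rows, 38 positives.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=100),
    "Sex": rng.integers(0, 2, size=100),      # female = 0, male = 1
    "Age": rng.uniform(1, 80, size=100),
    "Survived": np.r_[np.ones(38, dtype=int), np.zeros(62, dtype=int)],
})
X_toy = toy[["Pclass", "Sex", "Age"]]
y_toy = toy["Survived"]

# stratify=y_toy keeps the 0/1 proportions (nearly) equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(len(X_tr), len(X_te))   # 80 20
print(f"overall {y_toy.mean():.3f}, train {y_tr.mean():.3f}, test {y_te.mean():.3f}")
```

On the real data the same change would be adding stratify=y to the existing train_test_split call; the reported accuracy may shift slightly as a result.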
model = LogisticRegression()
model.fit(X_train, y_train)
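One subtlety in the cleaning step: the Age median is computed on every row of titanic_data, including rows that later end up in X_test, so a little information from the test set leaks into training. A minimal sketch of a leak-free alternative, assuming scikit-learn's Pipeline and SimpleImputer are acceptable for the assignment, learns the median from the training rows only. The data and the survival rule below are hypothetical, chosen only so the example has something to learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data with some missing ages, mimicking Pclass/Sex/Age.
rng = np.random.default_rng(1)
n = 200
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),        # already encoded: female = 0, male = 1
    "Age": rng.uniform(1, 80, size=n),
})
toy.loc[rng.choice(n, size=30, replace=False), "Age"] = np.nan

# Hypothetical label rule: "survives" if female or first class.
y_toy = ((toy["Sex"] == 0) | (toy["Pclass"] == 1)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(toy, y_toy, test_size=0.2, random_state=42)

# The imputer's median is learned from X_tr during fit() and reused as-is on X_te.
pipe = make_pipeline(SimpleImputer(strategy="median"), LogisticRegression())
pipe.fit(X_tr, y_tr)

learned_median = pipe.named_steps["simpleimputer"].statistics_[2]  # column 2 = Age
acc = pipe.score(X_te, y_te)

print(f"Age median used: {learned_median:.2f} (train-only median: {X_tr['Age'].median():.2f})")
print(f"Test accuracy: {acc:.2f}")
```

The Sex mapping can stay outside the pipeline because it uses no statistics from the data; only fitted steps such as the median need to see the training rows alone.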
# Predict on test data
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred)
# Print results
print(f"Accuracy: {accuracy * 100:.2f}%")
print(f"ROC AUC Score: {roc_auc:.2f}")
Accuracy: 81.01%
ROC AUC Score: 0.80
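Note that roc_auc_score above receives y_pred, the hard 0/1 predictions. With hard labels the ROC curve has a single operating point, and the score reduces to the average of sensitivity and specificity at the default 0.5 threshold. The usual practice is to pass the predicted probability of the positive class from predict_proba. The sketch below shows both, and also uses precision_score, recall_score and confusion_matrix, which the notebook imports but never calls. It runs on hypothetical synthetic data from make_classification rather than the Titanic split, so its numbers say nothing about the 81.01% and 0.80 reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical synthetic binary data with 3 features (stand-in for Pclass, Sex, Age).
X_syn, y_syn = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)
clf = LogisticRegression().fit(X_tr, y_tr)

y_hat = clf.predict(X_te)                # hard labels at the 0.5 threshold
p_hat = clf.predict_proba(X_te)[:, 1]    # probability of class 1

auc_labels = roc_auc_score(y_te, y_hat)  # one threshold only
auc_proba = roc_auc_score(y_te, p_hat)   # ranking quality over all thresholds

tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
prec = precision_score(y_te, y_hat)      # tp / (tp + fp)
rec = recall_score(y_te, y_hat)          # tp / (tp + fn)

print(f"AUC from labels: {auc_labels:.2f}, AUC from probabilities: {auc_proba:.2f}")
print(f"Precision: {prec:.2f}, Recall: {rec:.2f}")
print(f"Confusion matrix: TN={tn} FP={fp} FN={fn} TP={tp}")
```

Applied to the notebook, the corresponding change would be roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]); the resulting value would have to be recomputed rather than assumed.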
https://colab.research.google.com/drive/1_2f18ZOIF0czondiv0Npfco5OKfbvbLJ#scrollTo=QJSHF4eaDQnt&printMode=true