Competitions overview
WINNING A KAGGLE COMPETITION IN PYTHON
Yauhen Babakhin
Kaggle Grandmaster
Instructor
Yauhen Babakhin
Master’s Degree in Applied Data Analysis
5 years of working experience in Data Science
Kaggle competitions Grandmaster
Gold medals in both classic Machine Learning and Deep Learning competitions
Kaggle benefits
1. Get practical experience with real-world data
2. Develop portfolio projects
3. Meet a great Data Science community
4. Try a new domain or model type
5. Keep up to date with the best-performing methods
Competition process
How to participate
1. Go to the http://kaggle.com website and select a competition
2. Download the data
3. Start building the models!
New York City taxi fare prediction
Train and Test data
import pandas as pd

# Read train data
taxi_train = pd.read_csv('taxi_train.csv')
taxi_train.columns.to_list()
['key',
'fare_amount',
'pickup_datetime',
'pickup_longitude',
'pickup_latitude',
'dropoff_longitude',
'dropoff_latitude',
'passenger_count']

# Read test data
taxi_test = pd.read_csv('taxi_test.csv')
taxi_test.columns.to_list()
['key',
'pickup_datetime',
'pickup_longitude',
'pickup_latitude',
'dropoff_longitude',
'dropoff_latitude',
'passenger_count']
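The train and test column lists differ only in the column to be predicted. A minimal sketch of finding that column programmatically, assuming tiny in-memory stand-ins for the two CSV files (the row values and the 'trip_0' key are made up for illustration):

```python
import pandas as pd

# Hypothetical one-row stand-in for taxi_train.csv; values are invented,
# only the column names come from the listing above.
taxi_train = pd.DataFrame({
    'key': ['trip_0'],
    'fare_amount': [9.5],
    'pickup_datetime': ['2015-01-01 00:00:00 UTC'],
    'pickup_longitude': [-73.97],
    'pickup_latitude': [40.76],
    'dropoff_longitude': [-73.98],
    'dropoff_latitude': [40.74],
    'passenger_count': [1],
})

# The test data has the same columns except for the target
taxi_test = taxi_train.drop(columns=['fare_amount'])

# Columns present in train but absent from test are the prediction targets
target_columns = [c for c in taxi_train.columns if c not in taxi_test.columns]
print(target_columns)
```

With the real files, the same comparison can be run directly on the DataFrames returned by pd.read_csv.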
Sample submission
# Read sample submission
taxi_sample_sub = pd.read_csv('taxi_sample_submission.csv')
taxi_sample_sub.head()
key fare_amount
0 2015-01-27 13:08:24.0000002 11.35
1 2015-01-27 13:08:24.0000003 11.35
2 2011-10-08 11:53:44.0000002 11.35
3 2012-12-01 21:12:12.0000002 11.35
4 2012-12-01 21:12:12.0000003 11.35
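The sample submission shows the expected file layout: one row per test key and a placeholder prediction. One simple way to reuse it, sketched here under assumptions, is to overwrite the placeholder with the mean of the train target. The small DataFrames below are made-up stand-ins for the real files, and the constant-mean baseline is an illustration rather than part of the original slides.

```python
import pandas as pd

# Made-up stand-ins for the train data and the sample submission
taxi_train = pd.DataFrame({'fare_amount': [6.0, 10.0, 14.0]})
taxi_sample_sub = pd.DataFrame({
    'key': ['trip_0', 'trip_1'],
    'fare_amount': [11.35, 11.35],
})

# Keep the keys and column order, replace only the predictions
baseline_sub = taxi_sample_sub.copy()
baseline_sub['fare_amount'] = taxi_train['fare_amount'].mean()

print(baseline_sub)
```

Starting from a copy of the sample submission keeps the column names and row order that the platform expects.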
Let's practice!
Prepare your first submission
What is a submission
New York City taxi fare prediction
# Read train data
taxi_train = pd.read_csv('taxi_train.csv')
taxi_train.columns.to_list()
['key',
'fare_amount',
'pickup_datetime',
'pickup_longitude',
'pickup_latitude',
'dropoff_longitude',
'dropoff_latitude',
'passenger_count']
Problem type
import matplotlib.pyplot as plt
# Plot a histogram
taxi_train.fare_amount.hist(bins=30, alpha=0.5)
plt.show()
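A histogram of a continuous target such as fare_amount points to a regression problem. The same check can be roughed out in code. The helper guess_problem_type below is hypothetical, and its rule (non-numeric or few distinct integer values means classification) is only a heuristic assumption, not a definitive test.

```python
import pandas as pd

def guess_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Heuristic guess only: non-numeric targets, or numeric targets with
    few distinct whole-number values, are treated as classification."""
    if not pd.api.types.is_numeric_dtype(target):
        return 'classification'
    values = target.dropna()
    is_whole = (values % 1 == 0).all()
    if values.nunique() <= max_classes and is_whole:
        return 'classification'
    return 'regression'

# Made-up examples: fares are continuous, labels are binary
fares = pd.Series([4.5, 7.7, 11.35, 52.0, 6.1])
labels = pd.Series([0, 1, 1, 0, 1])

print(guess_problem_type(fares))
print(guess_problem_type(labels))
```

The competition description remains the authoritative source for the problem type; a check like this is just a quick sanity test on the target column.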
Build a model
from sklearn.linear_model import LinearRegression
# Create a LinearRegression object
lr = LinearRegression()
# Fit the model on the train data
lr.fit(X=taxi_train[['pickup_longitude', 'pickup_latitude', 'dropoff_longitude',
                     'dropoff_latitude', 'passenger_count']],
       y=taxi_train['fare_amount'])
Predict on test set
# Select features
features = ['pickup_longitude', 'pickup_latitude',
            'dropoff_longitude', 'dropoff_latitude',
            'passenger_count']
# Make predictions on the test data
taxi_test['fare_amount'] = lr.predict(taxi_test[features])
Prepare submission
# Read a sample submission file
taxi_sample_sub = pd.read_csv('taxi_sample_submission.csv')
taxi_sample_sub.head(1)
key fare_amount
0 2015-01-27 13:08:24.0000002 11.35
# Prepare a submission file
taxi_submission = taxi_test[['key', 'fare_amount']]
# Save the submission file as .csv
taxi_submission.to_csv('first_sub.csv', index=False)
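Before uploading, it can help to compare the prepared file with the sample submission. The following is a minimal sketch; check_submission is a hypothetical helper, the checks it runs are assumptions about what commonly goes wrong, and the small DataFrames are made up.

```python
import pandas as pd

def check_submission(submission, sample_sub, key='key'):
    """Return a list of problems found; an empty list means no issue detected."""
    problems = []
    if list(submission.columns) != list(sample_sub.columns):
        problems.append('columns differ from the sample submission')
    if len(submission) != len(sample_sub):
        problems.append('row count differs from the sample submission')
    if key in submission.columns and key in sample_sub.columns:
        if set(submission[key]) != set(sample_sub[key]):
            problems.append('keys differ from the sample submission')
    if submission.isna().any().any():
        problems.append('submission contains missing values')
    return problems

# Made-up sample submission and two candidate submissions
sample = pd.DataFrame({'key': ['a', 'b'], 'fare_amount': [11.35, 11.35]})
good = pd.DataFrame({'key': ['a', 'b'], 'fare_amount': [9.5, 12.0]})
bad = pd.DataFrame({'key': ['a'], 'fare_amount': [float('nan')]})

print(check_submission(good, sample))
print(check_submission(bad, sample))
```

In the taxi example, the same function could be called with taxi_submission and taxi_sample_sub before writing first_sub.csv.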
Let's practice!
Public vs Private leaderboard
Competition metric
Evaluation metric                            Type of problem
Area Under the ROC Curve (AUC)               Classification
F1 Score (F1)                                Classification
Mean Log Loss (LogLoss)                      Classification
Mean Absolute Error (MAE)                    Regression
Mean Squared Error (MSE)                     Regression
Mean Average Precision at K (MAPK, MAP@K)    Ranking
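The metrics in the table can be computed locally. The sketch below uses scikit-learn for the classification and regression metrics and a small hand-written MAP@K, since the apk and mapk helpers here are illustrative assumptions and not library functions. All data values are made up.

```python
from sklearn.metrics import (f1_score, log_loss, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

# Classification: true labels and predicted probabilities of class 1
y_true_clf = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
y_label = [int(p >= 0.5) for p in y_prob]  # 0.5 threshold is an assumption

auc = roc_auc_score(y_true_clf, y_prob)
f1 = f1_score(y_true_clf, y_label)
logloss = log_loss(y_true_clf, y_prob)

# Regression: true values and predictions
y_true_reg = [3.0, 5.0, 2.0]
y_pred_reg = [2.5, 5.0, 4.0]

mae = mean_absolute_error(y_true_reg, y_pred_reg)
mse = mean_squared_error(y_true_reg, y_pred_reg)

def apk(actual, predicted, k):
    """Average precision at k for one query (illustrative helper)."""
    if not actual:
        return 0.0
    top_k = predicted[:k]
    hits, score = 0, 0.0
    for i, item in enumerate(top_k):
        # Count each relevant item once, at its first position
        if item in actual and item not in top_k[:i]:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(actual), k)

def mapk(actual_lists, predicted_lists, k):
    """Mean of apk over all queries (illustrative helper)."""
    scores = [apk(a, p, k) for a, p in zip(actual_lists, predicted_lists)]
    return sum(scores) / len(scores)

map_at_3 = mapk([[1, 2, 3], [5]], [[1, 4, 2], [6, 5, 7]], k=3)

print(auc, f1, logloss, mae, mse, map_at_3)
```

Note that AUC and LogLoss take predicted probabilities, while F1 needs hard class labels, so the threshold choice affects F1 only.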
Test split
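The test data is divided into a public part, scored on the Public leaderboard during the competition, and a private part, scored only at the end. A minimal simulation of this split follows. The 30/70 proportion, the synthetic values, and the noise level are all assumptions made for illustration; participants never see which rows belong to which part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" test targets and one set of predictions for them
n_test = 1000
y_true = rng.normal(loc=10.0, scale=3.0, size=n_test)
y_pred = y_true + rng.normal(loc=0.0, scale=1.5, size=n_test)

# Hypothetical 30/70 public/private split chosen by the organizers
is_public = rng.random(n_test) < 0.3

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

public_mse = mse(y_true[is_public], y_pred[is_public])
private_mse = mse(y_true[~is_public], y_pred[~is_public])

print(f'Public LB MSE:  {public_mse:.3f}')
print(f'Private LB MSE: {private_mse:.3f}')
```

The two scores come from the same predictions but different rows, which is why they are usually close yet not identical.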
Leaderboards
# Write a submission file to the disk
submission[['id', 'target']].to_csv('submission_1.csv', index=False)
Submission          Public LB MSE    Private LB MSE
submission_1.csv    2.895            ?
Overfitting
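Overfitting shows up as an error that keeps falling on the data used for fitting while it stops improving, or gets worse, on unseen data. A small sketch on synthetic data, assuming polynomial fits of increasing degree as the stand-in for model complexity (the data, noise level, and degrees are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Noisy samples from a smooth synthetic function."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, size=n)
    return x, y

x_train, y_train = make_data(20)
x_hold, y_hold = make_data(200)  # unseen data, plays the role of a test set

def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit on the train data only
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coefs, x_train))
    hold_err = mse(y_hold, np.polyval(coefs, x_hold))
    results[degree] = (train_err, hold_err)
    print(f'degree={degree}: train MSE={train_err:.3f}, holdout MSE={hold_err:.3f}')
```

The train error can only go down as the degree grows; the holdout error is the number to watch when choosing complexity.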
Public vs Private leaderboard shake-up
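A shake-up is the change in a team's rank between the Public and the Private leaderboard. It can be measured as a simple rank difference, sketched below with invented teams and MSE scores (lower is better):

```python
import pandas as pd

# Invented leaderboard scores for four hypothetical teams
lb = pd.DataFrame({
    'team': ['A', 'B', 'C', 'D'],
    'public_mse': [2.80, 2.85, 2.90, 2.95],
    'private_mse': [3.10, 2.88, 2.86, 2.97],
})

# Rank 1 is the best (lowest) MSE on each leaderboard
lb['public_rank'] = lb['public_mse'].rank(method='min').astype(int)
lb['private_rank'] = lb['private_mse'].rank(method='min').astype(int)

# Positive values: the team moved up on the Private leaderboard
lb['shake_up'] = lb['public_rank'] - lb['private_rank']

print(lb[['team', 'public_rank', 'private_rank', 'shake_up']])
```

In this made-up example, team A leads the Public leaderboard but drops to last place on the Private one, the pattern expected from a model overfitted to the public test data.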
Let's practice!