This repository contains a machine learning project aimed at predicting passenger transport in the Spaceship Titanic competition hosted on Kaggle. The primary model used is TabNetClassifier, a high-performance neural network designed for tabular data.
The goal of this project is to accurately predict whether a passenger on the Spaceship Titanic was transported to an alternate dimension or not. The project leverages various Python libraries for data analysis, preprocessing, model training, and evaluation, including:
- pandas: Data manipulation and analysis.
- numpy: Numerical operations.
- seaborn: Data visualization.
- sklearn: Machine learning algorithms and tools.
- pytorch_tabnet: Implementation of the TabNetClassifier model.
The provided code outlines a comprehensive data preprocessing pipeline. The key steps are listed below, followed by a minimal sketch of the pipeline:
- Imputation: Missing values are filled with the most frequent value in each column.
- Feature Engineering: The Cabin feature is split into three separate features: Cabin, Seat, and Type.
- Encoding: Categorical features are encoded using LabelEncoder.
- Data Splitting: The data is split into training and testing sets for model evaluation.
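A minimal sketch of this pipeline, assuming the standard Kaggle column names (`Cabin`, `Transported`, `PassengerId`, `Name`) and scikit-learn's `SimpleImputer`; the notebook's exact column handling and ordering may differ:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

train = pd.read_csv("train.csv")

# Split the original Cabin column ("deck/num/side") into the three
# features described above (names mirror the description in this README).
train[["Cabin", "Seat", "Type"]] = train["Cabin"].str.split("/", expand=True)

X = train.drop(columns=["Transported", "PassengerId", "Name"])
y = train["Transported"].astype(int).values

# Fill missing values with the most frequent value of each column.
imputer = SimpleImputer(strategy="most_frequent")
X = pd.DataFrame(imputer.fit_transform(X), columns=X.columns).infer_objects()

# Label-encode the remaining categorical (object-dtype) columns.
for col in X.select_dtypes(include="object").columns:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))

# Hold out a validation split for model evaluation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X.values.astype("float32"), y, test_size=0.2, random_state=42
)
```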
The code demonstrates the training process for the TabNetClassifier model with a fixed set of hyperparameters. After training, the model's performance on the held-out split is evaluated using accuracy and a classification report.
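A sketch of this training and evaluation step, using the pytorch_tabnet API with illustrative hyperparameters (not necessarily those used in the notebook) and the `X_train`/`X_valid` split from the preprocessing sketch above:

```python
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.metrics import accuracy_score, classification_report

# Hyperparameter values here are illustrative only.
clf = TabNetClassifier(n_d=16, n_a=16, n_steps=4, seed=42)

clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=["accuracy"],
    max_epochs=100,
    patience=20,
    batch_size=1024,
    virtual_batch_size=128,
)

# Evaluate on the held-out split.
preds = clf.predict(X_valid)
print("Accuracy:", accuracy_score(y_valid, preds))
print(classification_report(y_valid, preds))
```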
The repository includes a function to generate a submission file in the correct format for Kaggle submission. This function utilizes the trained TabNetClassifier model to predict passenger transport on the test dataset.
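A sketch of the submission step; `preprocess` is a hypothetical helper standing in for the same preprocessing applied to the training data, and `clf` is the trained classifier from above:

```python
import pandas as pd

test = pd.read_csv("test.csv")

# Apply the same preprocessing used for training (hypothetical helper).
X_test = preprocess(test)

# Predict and write the submission file in the format Kaggle expects.
test_preds = clf.predict(X_test).astype(bool)
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Transported": test_preds,
})
submission.to_csv("submit.csv", index=False)
```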
The notebook further provides visualizations exploring relationships between features and the target variable. These visualizations can offer valuable insights into the data and potentially inform further feature engineering or model selection.
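As one example of this kind of plot, a single seaborn count plot of a categorical feature against the target might look like the following (the notebook's actual visualizations may differ):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

train = pd.read_csv("train.csv")

# Transport outcome broken down by a categorical feature.
sns.countplot(data=train, x="HomePlanet", hue="Transported")
plt.title("Transported by HomePlanet")
plt.show()
```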
This project serves as a baseline for passenger transport prediction on the Spaceship Titanic dataset. Several potential directions for future work include:
- Hyperparameter Optimization: Exploring different hyperparameter combinations to improve the model's performance (a simple grid-search sketch follows this list).
- Model Ensembling: Combining predictions from multiple models, such as TabNet and other algorithms, to achieve better generalization.
- Feature Engineering: Investigating additional feature engineering techniques to capture more complex relationships within the data.
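As one illustration of the hyperparameter-optimization direction, a simple manual grid over a couple of TabNet parameters could be run against the validation split from the preprocessing sketch; the parameter values below are illustrative only:

```python
from itertools import product
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.metrics import accuracy_score

best_score, best_params = 0.0, None
for n_d, n_steps in product([8, 16, 32], [3, 4, 5]):
    model = TabNetClassifier(n_d=n_d, n_a=n_d, n_steps=n_steps, seed=42, verbose=0)
    model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
              max_epochs=50, patience=10)
    score = accuracy_score(y_valid, model.predict(X_valid))
    if score > best_score:
        best_score, best_params = score, {"n_d": n_d, "n_steps": n_steps}

print(best_params, best_score)
```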
- train.csv: Training dataset.
- test.csv: Testing dataset.
- submit.csv: Generated submission file.
- *.ipynb: Jupyter notebook containing the code and analysis.