Skip to content

WazupSteve/Dataverse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Dataverse

Spaceship Titanic Passenger Transport Prediction with TabNet

This repository contains a machine learning project aimed at predicting passenger transport in the Spaceship Titanic competition hosted on Kaggle. The primary model used is TabNetClassifier, a high-performance neural network designed for tabular data.

Project Overview

The goal of this project is to accurately predict whether a passenger on the Spaceship Titanic was transported to an alternate dimension or not. The project leverages various Python libraries for data analysis, preprocessing, model training, and evaluation, including:

  • pandas: Data manipulation and analysis.
  • numpy: Numerical operations.
  • seaborn: Data visualization.
  • sklearn: Machine learning algorithms and tools.
  • pytorch_tabnet: Implementation of the TabNetClassifier model.

Data Preprocessing

The provided code outlines a comprehensive data preprocessing pipeline. Here are the key steps:

  • Imputation: Missing values are filled using the most frequent strategy.
  • Feature Engineering: The Cabin feature is split into three separate features: Cabin, Seat, and Type.
  • Encoding: Categorical features are encoded using LabelEncoder.
  • Data Splitting: The data is split into training and testing sets for model evaluation.

Model Training and Evaluation

The code demonstrates the training process for the TabNetClassifier model with specific hyperparameters. After training, the model's performance is evaluated using metrics such as accuracy and classification report.

Submission Generation

The repository includes a function to generate a submission file in the correct format for Kaggle submission. This function utilizes the trained TabNetClassifier model to predict passenger transport on the test dataset.

Data Exploration

The notebook further provides visualizations exploring relationships between features and the target variable. These visualizations can offer valuable insights into the data and potentially inform further feature engineering or model selection.

Future Work

This project serves as a baseline for passenger transport prediction on the Spaceship Titanic dataset. Several potential directions for future work include:

  • Hyperparameter Optimization: Exploring different hyperparameter combinations to improve the model's performance.
  • Model Ensembling: Combining predictions from multiple models, such as TabNet and other algorithms, to achieve better generalization.
  • Feature Engineering: Investigating additional feature engineering techniques to capture more complex relationships within the data.

Repository Contents

  • train.csv: Training dataset.
  • test.csv: Testing dataset.
  • submit.csv: Generated submission file.
  • *.ipynb: Jupyter notebook containing the code and analysis.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published