TUBerlin-project/janesa
Jane Street Market Prediction

This repository contains a data analysis project for the Jane Street Market Prediction Kaggle competition. The goal of the competition is to build a model that decides, for each trading opportunity, whether to accept or reject the trade.

Project Overview

The project is structured in three Jupyter notebooks that cover the same analysis:

  • janes-stock.ipynb: Main notebook with the complete analysis.
  • Clustering+PCA: Notebook focused on clustering and PCA.
  • feature+pca: Notebook focused on feature engineering and PCA.

The analysis includes the following steps:

  1. Data Loading and Optimization: The training data is loaded from train.csv, and its memory usage is reduced by converting float64 columns to float32.

  2. Feature Engineering:

    • Clustering: K-Means clustering is used to group similar features together.
    • PCA: Principal Component Analysis (PCA) is applied to reduce the dimensionality of the feature space.

  3. Correlation Analysis: The correlation between the principal components and the target variables (action, weight, and resp) is analyzed.
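
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks. The helper names (downcast_floats, cluster_features, pca_target_correlation), the feature_* column names, the cluster and component counts, and the definition of action as resp > 0 are all assumptions made for the example. Synthetic data stands in for train.csv so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def downcast_floats(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with every float64 column converted to float32."""
    float_cols = df.select_dtypes(include="float64").columns
    return df.astype({col: np.float32 for col in float_cols})


def cluster_features(features: pd.DataFrame, n_clusters: int, seed: int = 0) -> pd.Series:
    """Assign each feature (column) to a K-Means cluster."""
    scaled = StandardScaler().fit_transform(features)
    # Transpose so that each row describes one feature across all samples;
    # K-Means then groups features that behave similarly.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled.T)
    return pd.Series(labels, index=features.columns, name="cluster")


def pca_target_correlation(
    features: pd.DataFrame, targets: pd.DataFrame, n_components: int
) -> pd.DataFrame:
    """Correlate the leading principal components with each target column."""
    scaled = StandardScaler().fit_transform(features)
    components = PCA(n_components=n_components).fit_transform(scaled)
    pcs = pd.DataFrame(
        components,
        index=features.index,
        columns=[f"pc_{i}" for i in range(n_components)],
    )
    combined = pd.concat([pcs, targets], axis=1)
    # Keep only the block of the correlation matrix: components x targets.
    return combined.corr().loc[pcs.columns, targets.columns]


# Synthetic stand-in for the training data (assumption). With the real file,
# this would be something like: df = pd.read_csv("train.csv")
rng = np.random.default_rng(0)
n_rows, n_features = 500, 12
feature_cols = [f"feature_{i}" for i in range(n_features)]
df = pd.DataFrame(rng.normal(size=(n_rows, n_features)), columns=feature_cols)
df["weight"] = rng.uniform(0.0, 5.0, size=n_rows)
df["resp"] = rng.normal(scale=0.01, size=n_rows)

# Step 1: reduce memory usage.
bytes_before = df.memory_usage(deep=True).sum()
df = downcast_floats(df)
bytes_after = df.memory_usage(deep=True).sum()
print(f"memory: {bytes_before} -> {bytes_after} bytes")

# Assumption: treat a positive resp as "accept the trade".
df["action"] = (df["resp"] > 0).astype(np.int8)

# Step 2: feature engineering. K-Means and PCA do not accept missing values,
# so real data would need imputation first.
feature_clusters = cluster_features(df[feature_cols], n_clusters=3)
print(feature_clusters.value_counts().sort_index())

# Step 3: correlation between principal components and the targets.
corr = pca_target_correlation(
    df[feature_cols], df[["action", "weight", "resp"]], n_components=4
)
print(corr.round(3))
```

Standardizing before K-Means and PCA is a design choice in this sketch: both methods are sensitive to feature scale, so unscaled columns with large variance would otherwise dominate the clusters and components.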

Getting Started

To run the analysis, you will need to have Python 3 and Jupyter Notebook installed. You will also need to install the following libraries:

  • numpy
  • pandas
  • matplotlib
  • scikit-learn

You can install these libraries using pip:

pip install numpy pandas matplotlib scikit-learn

Once you have installed the dependencies, you can run the Jupyter notebooks in this repository.
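
Before opening the notebooks, a quick check like the one below can confirm that the dependencies are importable. It is a small sketch rather than part of the repository, and the missing_packages helper is a hypothetical name. Note that scikit-learn is installed under that name but imported as sklearn.

```python
from importlib.util import find_spec

# pip package name -> import name (only scikit-learn differs).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```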