Link to LB - https://www.hackerearth.com/challenges/competitive/amazon-ml-challenge/leaderboard/
Link to problem statement - https://www.hackerearth.com/challenges/competitive/amazon-ml-challenge/instructions/
Team name - Machine_not_learning
Participants - Pranshu Rastogi, Madhav Mathur, SumanSahoo, Kshitij Mohan
The solution we are providing is fairly simple and produces a good score on 50% of the data (i.e. the public LB). Our approach is based on creating dense BERT embeddings and running models over them.
The embeddings are developed in two broad ways:
1. Using pre-trained Sentence-BERT models for a better representation of the data. The models we used (prominently) were:
   a. paraphrase-multilingual-mpnet-base-v2
   b. paraphrase-mpnet-base-v2
   Both of these pre-trained models produce 768-dimensional embeddings (see the embedding-generation sketch after this list).
2. Fine-tuning these pre-trained models on the training data. We couldn't train for long because we didn't have enough compute; we were only able to fine-tune for around 1 epoch, which is fairly low and did not produce good results (see the fine-tuning sketch after this list).
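A minimal sketch of how such embeddings can be generated with the sentence-transformers library; the file name and the text column are illustrative assumptions, not our exact script:

```python
# Minimal sketch of embedding generation with sentence-transformers.
# "train.csv" and the "TITLE" column are illustrative assumptions.
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

train_df = pd.read_csv("train.csv")
texts = train_df["TITLE"].fillna("").tolist()

# encode() returns a (num_rows, 768) numpy array for these mpnet models
train_embeddings = model.encode(
    texts,
    batch_size=256,
    show_progress_bar=True,
    convert_to_numpy=True,
)
```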
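A rough sketch of what roughly one epoch of fine-tuning could look like with sentence-transformers; the triplet-style loss and the labelled InputExample setup are assumptions about how the labelled data might be used, not our exact training setup:

```python
# Rough sketch of ~1 epoch of fine-tuning with sentence-transformers.
# The BatchAllTripletLoss and the label pairing are assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

# Hypothetical labelled examples: one text per example plus its class id.
train_examples = [
    InputExample(texts=["example product title 1"], label=0),
    InputExample(texts=["example product title 2"], label=1),
    # ... one InputExample per training row
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# Batch triplet loss pulls texts with the same label closer in embedding space.
train_loss = losses.BatchAllTripletLoss(model=model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,          # we could only afford roughly one epoch
    warmup_steps=100,
)
```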
Modeling part -
Here we first used KNN as a very preliminary model, since it was compatible with the resources we could arrange. Even so, KNN produced very good results with these embeddings (64% as a single model).
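A sketch of the KNN step with RAPIDS cuML, assuming train_embeddings/test_embeddings are the 768-dimensional arrays from above and train_labels are the integer class labels; the variable names and n_neighbors are illustrative:

```python
# Sketch of KNN classification over the dense embeddings with RAPIDS cuML.
# n_neighbors and the variable names are illustrative assumptions.
import cupy as cp
from cuml.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(
    cp.asarray(train_embeddings, dtype=cp.float32),
    cp.asarray(train_labels),
)

# Per-class probabilities; these are what the final ensemble combines.
test_probs = knn.predict_proba(cp.asarray(test_embeddings, dtype=cp.float32))
test_preds = test_probs.argmax(axis=1)
```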
We also tried models like SVM and RF, but they either took too much GPU memory or needed more than 10 hours to produce results that were not worth the time and compute trade-off.
Final ensemble - We only used a weighted ensemble of the KNN predictions obtained from the embeddings of the different models, so indeed the simpler the model, the better the ensemble. This reached 66.7%.
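A sketch of that weighted ensemble, assuming probs_multilingual and probs_mpnet are the (num_test, num_classes) probability matrices from the two KNN models above; the weights are illustrative, not the ones we actually used:

```python
# Sketch of the weighted ensemble of KNN probability predictions.
# The weights and the probability-matrix names are illustrative.
import cupy as cp

weights = [0.6, 0.4]  # hypothetical weights, tuned on validation / public LB
ensembled = weights[0] * probs_multilingual + weights[1] * probs_mpnet
final_preds = cp.asnumpy(ensembled.argmax(axis=1))
```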
Tech stack - PyTorch for embedding generation and RAPIDS (cuML) for KNN are all that is used in our final solution. It can be replicated in 2-3 hours with parallel execution.