A collection of heterogeneous distance functions handling missing values.
Updated Jan 24, 2022 - MATLAB
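The repository's own distance functions are not shown on this page. As an illustration only, here is a minimal Python sketch (not the repository's MATLAB code) of one well-known heterogeneous metric, the Heterogeneous Euclidean-Overlap Metric (HEOM). Whether that repository uses this exact definition is an assumption. Numeric attributes contribute a range-normalised difference, nominal attributes contribute 0 or 1 (overlap), and a missing value (`None` here) contributes the maximal distance of 1. The function name and the `ranges` argument are hypothetical.

```python
import math
from typing import Optional, Sequence, Union

Value = Union[float, str, None]


def heom(
    x: Sequence[Value],
    y: Sequence[Value],
    is_nominal: Sequence[bool],
    ranges: Sequence[Optional[float]],
) -> float:
    """Heterogeneous Euclidean-Overlap Metric between two records.

    None marks a missing value; ranges[a] is max - min of numeric attribute a
    over the training data (ignored for nominal attributes).
    """
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # either value missing: maximal per-attribute distance
        elif is_nominal[a]:
            d = 0.0 if xa == ya else 1.0  # overlap metric for categories
        else:
            r = ranges[a]
            # range-normalised absolute difference; 0 if the attribute is constant
            d = abs(xa - ya) / r if r else 0.0
        total += d * d
    return math.sqrt(total)


x = [1.0, "red", None]
y = [3.0, "blue", 5.0]
print(heom(x, y, is_nominal=[False, True, False], ranges=[4.0, None, 10.0]))  # 1.5
```

Treating a missing value as maximally distant is a conservative choice: even two records that are both missing the same attribute are considered 1 apart on it.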
A repository for various Data Science projects I've worked on, both for university and in my spare time.
This project focuses on predicting customer churn in an e-commerce setting using machine learning techniques.
Data collected from wafers is passed through a machine learning pipeline that determines whether each wafer is faulty, removing the need for, and thus the cost of, hiring manual labour.
This repository is a collection of basic code templates for data preparation. All the code I am sharing comes from the practical exercises I completed in the Data Science Infinity Program.
Feature Engineering with Python
📘 This repository predicts OLA driver churn using ensemble methods—Bagging (Random Forest) and Boosting (XGBoost)—with KNN imputation and SMOTE. It reveals city-wise churn trends and key performance drivers, powering smarter, data-backed retention strategies for the ride-hailing industry.
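The entry above mentions SMOTE. A minimal stdlib sketch of the core SMOTE step: pick a minority-class sample, pick one of its `k` nearest minority-class neighbours, and create a synthetic point at a random position on the segment between them. The function name and parameters are hypothetical; in practice one would typically use `SMOTE` from imbalanced-learn (`imblearn.over_sampling`) rather than this sketch.

```python
import math
import random
from typing import List, Sequence


def smote(
    minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority-class samples (core SMOTE step)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of base (excluding base itself)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point lies between two real minority samples, oversampling stays inside the region the minority class already occupies. It is normally applied to the training split only, so synthetic points do not leak into evaluation.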
This repository focuses entirely on feature engineering concepts, covered in detail. I hope you'll find it helpful.
This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.
The company develops efficiency solutions for heavy industry. The model should predict the amount of pure gold extracted from gold ore. You have the data on extraction and purification. The model will help optimize production and eliminate unprofitable parameters.
Modelling the relationship between a player’s first-time eligible arbitration salary and multiple variables.
Machine learning models for enhanced fraud detection in e-commerce transactions, exploring feature engineering, distance prediction, and clustering analysis.
Predicting employee burnout using machine learning algorithms: Random Forest and k-Nearest Neighbors.
Data imputation is used when a dataset has missing values: it fills the gaps with estimated values so that the data can still be analyzed and modeled. Because incomplete rows do not have to be discarded, imputation helps maintain dataset integrity and supports more reliable insights from incomplete data.
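As a minimal stdlib sketch of KNN imputation, with `None` marking a missing value, each missing entry is replaced by the mean of that feature over the `k` nearest rows that observe it. Distances use only the coordinates present in both rows, scaled up to compensate for the skipped ones, which mirrors the idea behind scikit-learn's `nan_euclidean_distances`. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates observed in both rows, rescaled
    by (total coords / observed coords) to compensate for the skipped ones."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared coordinates: treat as infinitely far
    sq = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(len(a) / len(pairs) * sq)


def knn_impute(rows: Sequence[Row], k: int = 2) -> List[List[Optional[float]]]:
    """Fill each None with the mean of that feature over the k nearest donor
    rows (rows that observe the feature), using the original data for distances."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no other row observes feature j
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


rows = [[1.0, 2.0, None], [3.0, 4.0, 3.0], [None, 6.0, 5.0], [8.0, 8.0, 7.0]]
print(knn_impute(rows, k=2))
# [[1.0, 2.0, 4.0], [3.0, 4.0, 3.0], [5.5, 6.0, 5.0], [8.0, 8.0, 7.0]]
```

On NumPy arrays with `np.nan` markers, scikit-learn's `KNNImputer(n_neighbors=2)` performs this kind of imputation without hand-written loops.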
Streamlit app developed for bank customer deposit prediction, using a fine-tuned XGBClassifier model.
Kaggle UK Used Car challenge
We propose a method to fill NaN values using clustering.
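The entry above does not describe its method. One plausible reading, sketched here as an assumption rather than the authors' approach: run k-means on the complete rows, assign each incomplete row to the nearest centroid using only its observed features, and fill its gaps from that centroid. Stdlib Python, `None` marks a missing value, and all names are hypothetical; it needs at least `k` complete rows.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on complete rows; returns the centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def observed_dist(row, centroid):
    """Distance using only the features the incomplete row actually has."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(row, centroid) if x is not None))


def cluster_fill(rows, k=2, seed=0):
    """Fill None entries from the centroid of the nearest k-means cluster."""
    complete = [r for r in rows if None not in r]
    centroids = kmeans(complete, k, seed=seed)
    filled = []
    for r in rows:
        if None in r:
            best = min(centroids, key=lambda c: observed_dist(r, c))
            filled.append([c if x is None else x for x, c in zip(r, best)])
        else:
            filled.append(list(r))
    return filled


rows = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
        [10.0, 10.0], [10.2, 10.0], [10.0, 10.2],
        [None, 0.1], [10.1, None]]
print(cluster_fill(rows, k=2)[-2:])
```

Compared with KNN imputation, this fills every gap in a cluster with the same centroid value, which is cheaper at prediction time but smooths out within-cluster variation.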
My Capstone for the HarvardX Course "Introduction to Data Science with Python"