Use advanced feature engineering strategies and select the best features from your data set with a single line of code. Created by Ram Seshadri. Collaborators welcome.
Updated Feb 19, 2025 - Python
Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables
vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under your choice of the GPL-2 or GPL-3 license.
This package provides functions to create descriptive statistics tables for continuous and categorical variables.
Data Munging, Data Wrangling and Data Preparation Simplified
Bayesian Optimization for Categorical and Continuous Inputs
A library for the hyperparameter optimization of deep neural networks
Opinionated statistical inference engine with a fluent API that makes it easier to conduct statistical inference with little or no knowledge of the underlying principles
A set of gretl transformers for encoding categorical variables into numeric with different techniques
How to deal with Missing Values, Categorical Variables, Pipelines, Cross-Validation, XGBoost, Data Leakage
Hypothesis-Testing-Chi2-Test-Athletes-and-Smokers. Null hypothesis H0: the categorical variables are independent (Athlete and Smoking are not related). Alternative hypothesis Ha: the categorical variables are dependent (Athlete and Smoking are related). Since p_value = 0.00038 < α = 0.05, reject the null hypothesis, i.e. De…
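The decision rule in the blurb above (reject the null hypothesis of independence when the p-value falls below α) can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `chi2_independence_2x2` and the counts in `observed` are hypothetical, so the resulting p-value will not match the 0.00038 quoted above. It assumes a 2x2 table and Pearson's chi-square statistic without a continuity correction.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed_count - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; for 1 degree of freedom the
    # chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT taken from the repository.
# Rows: athlete / non-athlete. Columns: smoker / non-smoker.
observed = [[10, 40], [30, 20]]

chi2, p_value = chi2_independence_2x2(observed)
alpha = 0.05
reject_null = p_value < alpha  # True means the data contradict independence
print(f"chi2 = {chi2:.3f}, p = {p_value:.6f}, reject H0: {reject_null}")
```

For larger tables or a continuity correction, a statistics library would be the usual choice; the closed-form `erfc` expression here holds only for one degree of freedom.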
A Machine Learning project to predict Customer Churn including all stages of a project life cycle from data procurement to deployment.
A simple library to calculate correlation between variables. Currently provides correlation between nominal variables.
Multiple methods to (quickly) encode factor variables, using data.table
Source Code for Paper: Williams, S.Z., Zou, J., Liu, Y., Si, Y., Galea, S. and Chen, Q. (2024), Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data. Statistics in Medicine, 43: 5803-5813. https://doi.org/10.1002/sim.10270.
A notebook inspired by a Kaggle task: exploring correlation, plus a bonus trial of the ppscore package
This repo contains machine learning projects covering supervised and unsupervised ML algorithms, along with solutions to various hackathons (Kaggle, AV, iNeuron)
📖 Approaching (Almost) Any Machine Learning Problem
Computing feature importance for categorical variables after converting them into dummy variables (one-hot encoding) can produce skewed or hard-to-interpret results. Here I present a method to get around this problem using H2O.
Stata command for creating categorical variables from multiple logical conditions using power-of-two indexing
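One plausible reading of power-of-two indexing, sketched in Python rather than Stata: each logical condition is assigned a weight of 2 raised to its position, and the weights of the true conditions are summed, so every combination of conditions maps to a distinct integer code. The function name `power_of_two_index` and the sample records are hypothetical, and the Stata command's actual syntax and ordering conventions may differ.

```python
def power_of_two_index(conditions):
    """Combine boolean conditions into one integer category code.

    Condition i contributes 2**i when it is true, so k conditions yield
    codes in the range 0 .. 2**k - 1, one per distinct combination.
    """
    return sum(1 << i for i, flag in enumerate(conditions) if flag)


# Hypothetical records, used only for illustration.
records = [
    {"age": 25, "employed": True},
    {"age": 67, "employed": False},
    {"age": 70, "employed": True},
    {"age": 30, "employed": False},
]

codes = [
    power_of_two_index([
        rec["age"] >= 65,   # condition 0, weight 1
        rec["employed"],    # condition 1, weight 2
    ])
    for rec in records
]
print(codes)
```

Because the codes are sums of distinct powers of two, each one can be decoded back into its conditions by testing individual bits, for example `code & 1` for the first condition.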