Imputr is an open-source library that allows users to stably impute tabular data sets with ML-based and conventional techniques. It is designed to have an extremely simple, yet extensive API, making it possible for users of all levels and tasks to deploy the library in their workflows.
Install Imputr with PIP:
pip install imputr
Here is an example of the simplest usage of the AutoImputer (our recommended workflow for newbies and intermediates), which by default automatically imputes the missing values for all columns with a modern version of the missForest algorithm.
from imputr import AutoImputer
import pandas as pd
# Import dataset with missing values
df = pd.read_csv("example.csv")
# Initialize AutoImputer with data
imputer = AutoImputer(data=df)
# Retrieve fully imputed dataset
imputed_df = imputer.impute()
Here you can see an example of how the AutoImputer works internally.
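For readers unfamiliar with missForest, the sketch below illustrates the general idea of iterative, random-forest-based imputation: each column with missing values is repeatedly predicted from the other columns until the estimates settle. It does not use Imputr and is not a description of Imputr's internals. It relies on scikit-learn's `IterativeImputer` with a `RandomForestRegressor` as a stand-in, and the synthetic data, column names, and parameter values are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Build a small synthetic table with three correlated numeric columns
# (hypothetical data, only for demonstration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=200),
    "c": -x + rng.normal(scale=0.1, size=200),
})

# Knock out roughly 10% of the cells at random.
mask = rng.random(df.shape) < 0.1
df_missing = df.mask(mask)

# Iteratively predict each column's missing cells from the other columns,
# using a random forest as the per-column model (missForest-style).
rf_imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
imputed = pd.DataFrame(
    rf_imputer.fit_transform(df_missing),
    columns=df.columns,
    index=df.index,
)

# Number of missing cells before and after imputation.
print(int(df_missing.isna().sum().sum()), int(imputed.isna().sum().sum()))
```

Observed cells are left unchanged; only the masked cells are filled in. scikit-learn may emit a convergence warning with so few iterations, which is harmless for this illustration.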
To see what else can be done with the AutoImputer API to customise its behaviour, refer to our documentation.
Links to documentation and further resources:
- Imputr API
- Imputr concepts
- Core class structure
- Medium blogs for more information
- Our Slack channel
- More real world examples
Imputr is an ever-evolving open-source library and can always use contributors who want to help build it together with the community.
See the Contribution Jumpstart page to get started with your first contribution!
Imputr is distributed under the Apache License, Version 2.0. A complete version can be found here. All future contributions will continue to be distributed under this license.