Releases: rasbt/mlxtend
Version 0.18.0
New Features
- The `bias_variance_decomp` function now supports optional `fit_params` for the estimators that are fit on bootstrap samples. (#748)
- The `bias_variance_decomp` function now supports Keras estimators. (#725 via @hanzigs)
- Adds new `mlxtend.classifier.OneRClassifier` (One Rule Classifier) class, a simple rule-based classifier that is often used as a performance baseline or simple interpretable model. (#726)
- Adds new `create_counterfactual` method for creating counterfactuals to explain model predictions. (#740)
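
To illustrate the idea behind a one-rule classifier, here is a minimal pure-Python sketch of the textbook OneR procedure: for each feature, map every feature value to its majority class, then keep the single feature whose rule makes the fewest training errors. The function names `fit_one_rule` and `predict_one_rule` are hypothetical, and this is not the `OneRClassifier` implementation, which may differ in details such as tie handling and input requirements.

```python
from collections import Counter, defaultdict


def fit_one_rule(X, y):
    """Return (feature_index, rule) for the single best feature.

    The rule maps each observed feature value to its majority class.
    """
    best = None
    for j in range(len(X[0])):
        # Count class labels for each distinct value of feature j.
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[j]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    _, feature, rule = best
    return feature, rule


def predict_one_rule(feature, rule, X, default=None):
    """Apply the rule; unseen feature values fall back to `default`."""
    return [rule.get(row[feature], default) for row in X]


# Toy categorical data: feature 0 separates the classes perfectly.
X = [["sunny", "hot"], ["sunny", "mild"], ["rainy", "hot"], ["rainy", "mild"]]
y = ["no", "no", "yes", "yes"]
feature, rule = fit_one_rule(X, y)
predictions = predict_one_rule(feature, rule, X)
```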
Changes
- `permutation_test` (`mlxtend.evaluate.permutation`) is corrected to give the proportion of permutations whose statistic is at least as extreme as the one observed. (#721 via Florian Charlier)
- Fixes the McNemar confusion matrix layout to match the convention (and documentation), swapping the upper left and lower right cells. (#744 via mmarius)
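
The corrected `permutation_test` definition (the proportion of permutations whose statistic is at least as extreme as the observed one) can be sketched in plain Python. This is an illustrative exact test on the absolute difference of means, not the library's code; the function name `exact_permutation_p_value` is hypothetical.

```python
from itertools import combinations


def exact_permutation_p_value(x, y):
    """Two-sided exact permutation test on |mean(x) - mean(y)|."""
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    total = sum(pooled)

    def statistic(sum_x):
        return abs(sum_x / n_x - (total - sum_x) / n_y)

    observed = statistic(sum(x))
    at_least_as_extreme = 0
    n_permutations = 0
    # Enumerate every way of assigning n_x pooled values to the first group.
    for idx in combinations(range(len(pooled)), n_x):
        n_permutations += 1
        # ">=" (not ">") counts ties with the observed statistic,
        # including the observed arrangement itself.
        if statistic(sum(pooled[i] for i in idx)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


p_value = exact_permutation_p_value([1, 2, 3], [4, 5, 6])
```

Of the 20 possible splits of the pooled sample, only the observed split and its mirror image reach the observed difference, so the p-value here is 2/20.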
Bug Fixes
- The loss in `LogisticRegression` for logging purposes didn't include the L2 penalty for the first weight in the weight vector (this is not the bias unit). However, since this loss function was only used for logging purposes, and the gradient remains correct, this does not have an effect on the main code. (#741)
- Fixes a bug in `bias_variance_decomp` where, when the `mse` loss was used, downcasting to integers caused imprecise results for small numbers. (#749)
Version 0.17.3
New Features
- Add `predict_proba` kwarg to bootstrap methods, to allow bootstrapping of scoring functions that take in probability values. (#700 via Adam Li)
- Add a `cell_values` parameter to `mlxtend.plotting.heatmap()` to optionally suppress cell annotations by setting `cell_values=False`. (#703)
Changes
- Implemented both `use_clones` and `fit_base_estimators` (previously `refit` in `EnsembleVoteClassifier`) for `EnsembleVoteClassifier` and `StackingClassifier`. (#670 via Katrina Ni)
- Switched to using raw strings for regex in `mlxtend.text` to prevent deprecation warning in Python 3.8. (#688)
- Slice data in sequential forward selection before sending to parallel backend, reducing memory consumption.
Bug Fixes
- Fixes axis DeprecationWarning in matplotlib v3.1.0 and newer. (#673)
- Fixes an issue with using `meshgrid` in the `no_information_rate` function used by the `bootstrap_point632_score` function for the .632+ estimate. (#688)
- Fixes an issue in `fpmax` that could lead to incorrect support values. (#692 via Steve Harenberg)
Version 0.17.2
New Features
Changes
- The previously deprecated `OnehotTransactions` has been removed in favor of the `TransactionEncoder`.
- Removed `SparseDataFrame` support in frequent pattern mining functions in favor of pandas >=1.0's new way of working with sparse data. If you used `SparseDataFrame` formats, please see pandas' migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating (#667)
Bug Fixes
Version 0.17.1
New Features
- The `SequentialFeatureSelector` now supports using pre-specified feature sets via the `fixed_features` parameter. (#578)
- Adds a new `accuracy_score` function to `mlxtend.evaluate` for computing basic classification accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
- `StackingClassifier` and `StackingCVClassifier` now have a `decision_function` method, which serves as a preferred choice over `predict_proba` in calculating roc_auc and average_precision scores when the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)
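
The three accuracy variants named above can be read as follows; this is a hedged sketch of one common interpretation (per-class accuracy computed one-vs-rest, then averaged over classes), with hypothetical function names. It is not the `accuracy_score` implementation, whose exact definitions should be checked in the mlxtend documentation.

```python
def standard_accuracy(y_true, y_pred):
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred, label):
    """One-vs-rest accuracy: both vectors are binarized against `label`."""
    hits = sum((t == label) == (p == label) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def average_per_class_accuracy(y_true, y_pred):
    """Mean of the one-vs-rest accuracies over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_accuracy(y_true, y_pred, c) for c in labels) / len(labels)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = standard_accuracy(y_true, y_pred)
acc_class_1 = per_class_accuracy(y_true, y_pred, label=1)
acc_avg = average_per_class_accuracy(y_true, y_pred)
```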
Changes
- Improve the runtime performance for the `apriori` frequent itemset generating function when `low_memory=True`. Setting `low_memory=False` (default) is still faster for small itemsets, but `low_memory=True` can be much faster for large itemsets and requires less memory. Also, input validation for `apriori`, `fpgrowth` and `fpmax` takes a significant amount of time when the input pandas DataFrame is large; this is now dramatically reduced when the input contains boolean values (and not zeros/ones), which is the case when using `TransactionEncoder`. (#619 via Denis Barbier)
- Add support for newer sparse pandas DataFrame for frequent itemset algorithms. Also, input validation for `apriori`, `fpgrowth` and `fpmax` runs much faster on sparse DataFrame when the input pandas DataFrame contains integer values. (#621 via Denis Barbier)
- Let `fpgrowth` and `fpmax` directly work on sparse DataFrame; they were previously converted into dense NumPy arrays. (#622 via Denis Barbier)
Bug Fixes
- Fixes a bug in `mlxtend.plotting.plot_pca_correlation_graph` that caused the explained variances not summing up to 1. Also, improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
- Behavior of `fpgrowth` and `apriori` is now consistent for edge cases such as `min_support=0`. (#573 via Steve Harenberg)
- `fpmax` returns an empty data frame now instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
- Fixes an issue in `mlxtend.plotting.plot_confusion_matrix`, where the font-color choice for medium-dark cells was not ideal and hard to read. (#588 via sohrabtowfighi)
- The `svd` mode of `mlxtend.feature_extraction.PrincipalComponentAnalysis` now also uses n-1 degrees of freedom instead of n d.o.f. when computing the eigenvalues to match the behavior of `eigen`. (#595)
- Disable input validation for `StackingCVClassifier` because it causes issues if pipelines are used as input. (#606)
Version 0.17.0
New Features
- Added an enhancement to the existing `iris_data()` such that both the UCI Repository version of the Iris dataset as well as the corrected, original version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (#539 via janismdhanbad)
- Added optional `groups` parameter to `SequentialFeatureSelector` and `ExhaustiveFeatureSelector` `fit()` methods for forwarding to sklearn CV. (#537 via arc12)
- Added a new `plot_pca_correlation_graph` function to the `mlxtend.plotting` submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
- Added a `zoom_factor` parameter to the `mlxtend.plotting.plot_decision_regions` function that allows users to zoom in and out of the decision region plots. (#545)
- Added a function `fpgrowth` that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing `apriori` algorithm. (#550 via Steve Harenberg)
- New `heatmap` function in `mlxtend.plotting`. (#552)
- Added a function `fpmax` that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the `fpgrowth` algorithm. (#553 via Steve Harenberg)
- New `figsize` parameter for the `plot_decision_regions` function in `mlxtend.plotting`. (#555 via Mirza Hasanbasic)
- New `low_memory` option for the `apriori` frequent itemset generating function. Setting `low_memory=False` (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (`low_memory=True`). (#567 via jmayse)
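
To make the difference between frequent itemsets (the output of `apriori` and `fpgrowth`) and maximal itemsets (the output of `fpmax`) concrete, the following brute-force sketch computes both on a toy transaction list. It is only meant to show the definitions; it implements neither Apriori, FP-Growth, nor FP-Max, and the helper names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting; suitable for tiny examples only."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = fraction of transactions containing the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size implies none of larger size.
            break
    return result


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {
        s: support
        for s, support in frequent.items()
        if not any(s < other for other in frequent)
    }


transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_itemsets(frequent)
```

With `min_support=0.5`, all three single items and all three pairs are frequent, but only the pairs are maximal.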
Changes
- Now uses the latest joblib library under the hood for multiprocessing instead of `sklearn.externals.joblib`. (#547)
- Changes to `StackingCVClassifier` and `StackingCVRegressor` such that first-level models are allowed to generate output of non-numeric type. (#562)
Bug Fixes
- Fixed documentation of `iris_data()` under `iris.py` by adding a note about differences in the iris data in R and UCI machine learning repo.
- Make sure that if the `'svd'` mode is used in PCA, the number of eigenvalues is the same as when using `'eigen'` (zeros are appended in that case). (#565)
Version 0.16.0
New Features
- `StackingCVClassifier` and `StackingCVRegressor` now support the `random_state` parameter, which, together with `shuffle`, controls the randomness in the cv splitting. (#523 via Qiang Gu)
- `StackingCVClassifier` and `StackingCVRegressor` now have a new `drop_last_proba` parameter. If `True`, it drops the last "probability" column in the feature set because it is redundant: p(y_c) = 1 - (p(y_1) + p(y_2) + ... + p(y_{c-1})). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
- Other stacking estimators, including `StackingClassifier`, `StackingCVClassifier` and `StackingRegressor`, support grid search over the `regressors` and even a single base regressor. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVClassifier`. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVRegressor`. (#512 via Qiang Gu)
- Now, the `StackingCVRegressor` also enables grid search over the `regressors` and even a single base regressor. When there are level-mixed parameters, `GridSearchCV` will try to replace hyperparameters in a top-down order (see the documentation for examples and details). (#515 via Qiang Gu)
- Adds a `verbose` parameter to `apriori` to show the current iteration number as well as the itemset size currently being sampled. (#519)
- Adds an optional `class_name` parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)
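
The redundancy that motivates `drop_last_proba` is easy to check numerically: since class probabilities sum to 1, the last column can always be recovered from the others. The sketch below uses plain lists and hypothetical helper names; it does not reproduce how the stacking estimators build their meta-features.

```python
def drop_last_column(probas):
    """Remove the last class-probability column from every row."""
    return [row[:-1] for row in probas]


def recover_last_column(reduced):
    """p(y_c) = 1 - (p(y_1) + ... + p(y_{c-1})) for every row."""
    return [1.0 - sum(row) for row in reduced]


# Two samples, three classes; each row sums to 1.
probas = [[0.2, 0.3, 0.5], [0.7, 0.2, 0.1]]
reduced = drop_last_column(probas)
recovered = recover_last_column(reduced)
```

Because the dropped column is an exact linear function of the remaining ones, keeping it would give the meta-classifier perfectly collinear inputs.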
Changes
- Due to new features, restructuring, and better scikit-learn support (for `GridSearchCV`, etc.) the `StackingCVRegressor`'s meta regressor is now being accessed via `'meta_regressor__*'` in the parameter grid. E.g., if a `RandomForestRegressor` as meta-regressor was previously tuned via `'randomforestregressor__n_estimators'`, this has now changed to `'meta_regressor__n_estimators'`. (#515 via Qiang Gu)
- The same change mentioned above is now applied to other stacking estimators, including `StackingClassifier`, `StackingCVClassifier` and `StackingRegressor`. (#522 via Qiang Gu)
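
For existing parameter grids, the renaming described above amounts to swapping the estimator-name prefix for `meta_regressor`. The helper below is a hypothetical convenience sketch, not part of mlxtend, and the `lasso__alpha` entry is an invented example of a base-regressor key that should stay untouched.

```python
def migrate_meta_regressor_keys(param_grid, old_prefix):
    """Rename '<old_prefix>__<param>' keys to 'meta_regressor__<param>'."""
    migrated = {}
    for key, values in param_grid.items():
        prefix, sep, param = key.partition("__")
        if sep and prefix == old_prefix:
            key = "meta_regressor__" + param
        migrated[key] = values
    return migrated


old_grid = {
    "randomforestregressor__n_estimators": [10, 100],
    "lasso__alpha": [0.1, 1.0],  # hypothetical base-regressor entry
}
new_grid = migrate_meta_regressor_keys(old_grid, old_prefix="randomforestregressor")
```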
Bug Fixes
- The `feature_selection.ColumnSelector` now also supports column names of type `int` (in addition to `str` names) if the input is a pandas DataFrame. (#500 via tetrar124)
- Fix unreadable labels in `plot_confusion_matrix` for imbalanced datasets if `show_absolute=True` and `show_normed=True`. (#504)
- Raises a more informative error if a `SparseDataFrame` is passed to `apriori` and the dataframe has integer column names that don't start with `0` due to current limitations of the `SparseDataFrame` implementation in pandas. (#503)
- `SequentialFeatureSelector` now supports DataFrame as input for all operating modes (forward/backward/floating). (#506)
- `mlxtend.evaluate.feature_importance_permutation` now correctly accepts scoring functions with proper function signature as `metric` argument. (#528)
Version 0.15.0
New Features
- Adds a new transformer class to `mlxtend.image`, `EyepadAlign`, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
- Adds a new function, `mlxtend.evaluate.bias_variance_decomp`, that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
- Adds a `whitening` parameter to `PrincipalComponentAnalysis`, to optionally whiten the transformed data such that the features have unit variance. (#475)
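
For the squared-error loss, the decomposition behind `bias_variance_decomp` can be verified on a handful of numbers: the average expected loss equals the average squared bias plus the average variance. The sketch below assumes predictions from several training rounds are already available as nested lists; the function name is hypothetical, and the library's bootstrap procedure and 0-1 loss variant are not reproduced here.

```python
def bias_variance_mse(predictions_per_round, y_true):
    """Decompose the squared-error loss.

    predictions_per_round[r][i] is the prediction made in round r
    for test point i. Returns (avg_loss, avg_squared_bias, avg_variance).
    """
    n_rounds = len(predictions_per_round)
    n_points = len(y_true)
    loss = bias = variance = 0.0
    for i in range(n_points):
        preds = [round_preds[i] for round_preds in predictions_per_round]
        mean_pred = sum(preds) / n_rounds
        loss += sum((p - y_true[i]) ** 2 for p in preds) / n_rounds
        bias += (mean_pred - y_true[i]) ** 2
        variance += sum((p - mean_pred) ** 2 for p in preds) / n_rounds
    return loss / n_points, bias / n_points, variance / n_points


# Three rounds, two test points.
predictions = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
targets = [2.0, 2.0]
avg_loss, avg_bias, avg_var = bias_variance_mse(predictions, targets)
```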
Changes
- Changed the default solver in `PrincipalComponentAnalysis` to `'svd'` instead of `'eigen'` to improve numerical stability. (#474)
- The `mlxtend.image.extract_face_landmarks` function now returns `None` if no facial landmarks were detected instead of an array of all zeros. (#466)
Bug Fixes
Version 0.14.0
New Features
- Added a `scatterplotmatrix` function to the `plotting` module. (#437)
- Added `sample_weight` option to `StackingRegressor`, `StackingClassifier`, `StackingCVRegressor`, `StackingCVClassifier`, `EnsembleVoteClassifier`. (#438)
- Added a `RandomHoldoutSplit` class to perform a random train/valid split without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV` etc. (#442)
- Added a `PredefinedHoldoutSplit` class to perform a train/valid split, based on user-specified indices, without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV` etc. (#443)
- Created a new `mlxtend.image` submodule for working on image processing-related tasks. (#457)
- Added a new convenience function `extract_face_landmarks` based on `dlib` to `mlxtend.image`. (#458)
- Added a `method='oob'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the classic out-of-bag bootstrap estimate. (#459)
- Added a `method='.632+'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap. (#459)
- Added a new `mlxtend.evaluate.ftest` function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
- Added a new `mlxtend.evaluate.combined_ftest_5x2cv` function to perform a combined 5x2cv F-Test for comparing the performance of two models. (#461)
- Added a new `mlxtend.evaluate.difference_proportions` test for comparing two proportions (e.g., classifier accuracies). (#462)
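
As background for comparing two proportions such as classifier accuracies, here is the textbook two-proportion z-test with a pooled variance estimate, written with the standard library only. This is an assumption about the general technique, not a transcription of the mlxtend function, whose variance estimate and signature may differ.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Classifier A: 90/100 correct; classifier B: 80/100 correct.
z, p_value = two_proportion_z_test(90, 100, 80, 100)
```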
Changes
- Addressed deprecation warnings in NumPy 0.15. (#425)
- Because of complications in PR (#459), Python 2.7 support has now been dropped. Since official support for Python 2.7 by the Python Software Foundation is ending in approx. 12 months anyway, this refocusing will hopefully free up some developer time with regard to not having to worry about backward compatibility.
Bug Fixes
- Fixed an issue with a missing import in `mlxtend.plotting.plot_confusion_matrix`. (#428)
Version 0.13.0 (07/20/2018)
New Features
- A meaningful error message is now raised when a cross-validation generator is used with `SequentialFeatureSelector`. (#377)
- The `SequentialFeatureSelector` now accepts custom feature names via the `fit` method for more interpretable feature subset reports. (#379)
- The `SequentialFeatureSelector` is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. (#379)
- `ColumnSelector` now works with Pandas DataFrames columns. (#378 by Manuel Garrido)
- The `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` now is safely stoppable mid-process by control+c. (#380)
- Two new functions, `vectorspace_orthonormalization` and `vectorspace_dimensionality`, were added to `mlxtend.math` to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
- `mlxtend.frequent_patterns.apriori` now supports pandas `SparseDataFrame`s to generate frequent itemsets. (#404 via Daniel Morales)
- The `plot_confusion_matrix` function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
- Added support for merging the meta features with the original input features in `StackingRegressor` (via `use_features_in_secondary`) like it is already supported in the other Stacking classes. (#418)
- Added a `support_only` parameter to the `association_rules` function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
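
The Gram-Schmidt process mentioned for `vectorspace_orthonormalization` can be sketched in a few lines of plain Python. This version uses the numerically steadier "modified" ordering of the projections and a hypothetical function name; the mlxtend function may use a different approach and input layout.

```python
import math


def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of w that points along q.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("input vectors are not linearly independent")
        basis.append([wi / norm for wi in w])
    return basis


basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Every pair of resulting vectors has a dot product of 0, and each vector has unit length, up to floating-point rounding.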
Changes
- Itemsets generated with `apriori` are now `frozenset`s. (#393 by William Laney and #394)
- Now raises an error if an input DataFrame to `apriori` contains values other than 0, 1, True, or False. (#419)
Bug Fixes
- Allow mlxtend estimators to be cloned via scikit-learn's `clone` function. (#374)
- Fixes bug to allow the correct use of `refit=False` in `StackingRegressor` and `StackingCVRegressor`. (#384 and #385 by selay01)
- Allow `StackingClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#408 by Floris Hoogenbook)
- Allow `StackingCVRegressor` to work with sparse matrices when `use_features_in_secondary=True`. (#416)
- Allow `StackingCVClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#417)
Version 0.12.0
New Features
- A new `feature_importance_permutation` function to compute the feature importance in classifiers and regressors via the permutation importance method. (#358)
- The fit method of the `ExhaustiveFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#354 by Zach Griffith)
- The fit method of the `SequentialFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#350 by Zach Griffith)
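
The permutation importance method can be summarized as: shuffle one feature column at a time and measure how much a fitted model's score drops. The following standard-library sketch shows that idea with a toy "model" that only looks at the first feature. All names are hypothetical, and the actual function's signature and return values are not reproduced here.

```python
import random


def permutation_importance(predict, X, y, metric, n_rounds=10, seed=0):
    """Mean drop in `metric` after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_rounds):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_rounds)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(X):
    # Toy stand-in for a fitted model: thresholds feature 0, ignores feature 1.
    return [int(row[0] > 0.5) for row in X]


X = [[0.1, 5.0], [0.9, 3.0], [0.2, 4.0], [0.8, 1.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(predict, X, y, accuracy)
```

Shuffling the ignored second feature never changes the predictions, so its importance is exactly zero, while shuffling the first feature lowers the accuracy.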
Changes
- Replaced `plot_decision_regions` colors by a colorblind-friendly palette and added contour lines for decision regions. (#348)
- All stacking estimators now raise a `NotFittedError` if any method for inference is called prior to fitting the estimator. (#353)
- Renamed the `refit` parameter of both the `StackingClassifier` and `StackingCVClassifier` to `use_clones` to be more explicit and less misleading. (#368)
Bug Fixes
- Various changes in the documentation and documentation tools to fix formatting issues (#363)
- Fixes a bug where the `StackingCVClassifier`'s meta features were not stored in the original order when `shuffle=True`. (#370)
- Many documentation improvements, including links to the User Guides in the API docs. (#371)