Releases · rasbt/mlxtend

New Features

The bias_variance_decomp function now supports optional fit_params for the estimators that are fit on bootstrap samples. (#748)
The bias_variance_decomp function now supports Keras estimators. (#725 via @hanzigs)
Adds new mlxtend.classifier.OneRClassifier (One Rule Classfier) class, a simple rule-based classifier that is often used as a performance baseline or simple interpretable model. (#726
Adds new create_counterfactual method for creating counterfactuals to explain model predictions. (#740)

Changes

permutation_test (mlxtend.evaluate.permutation) ìs corrected to give the proportion of permutations whose statistic is at least as extreme as the one observed. (#721 via Florian Charlier)
Fixes the McNemar confusion matrix layout to match the convention (and documentation), swapping the upper left and lower right cells. (#744 via mmarius)

Bug Fixes

The loss in LogisticRegression for logging purposes didn't include the L2 penalty for the first weight in the weight vector (this is not the bias unit). However, since this loss function was only used for logging purposes, and the gradient remains correct, this does not have an effect on the main code. (#741)
Fixes a bug in bias_variance_decomp where when the mse loss was used, downcasting to integers caused imprecise results for small numbers. (#749)

New Features

Add predict_proba kwarg to bootstrap methods, to allow bootstrapping of scoring functions that take in probability values. (#700 via Adam Li)
Add a cell_values parameter to mlxtend.plotting.heatmap() to optionally suppress cell annotations by setting cell_values=False. (#703

Changes

Implemented both use_clones and fit_base_estimators (previously refit in EnsembleVoteClassifier) for EnsembleVoteClassifier and StackingClassifier. (#670 via Katrina Ni)
Switched to using raw strings for regex in mlxtend.text to prevent deprecation warning in Python 3.8 (#688)
Slice data in sequential forward selection before sending to parallel backend, reducing memory consumption.

Bug Fixes

Fixes axis DeprecationWarning in matplotlib v3.1.0 and newer. (#673)
Fixes an issue with using meshgrid in no_information_rate function used by the bootstrap_point632_score function for the .632+ estimate. (#688)
Fixes an issue in fpmax that could lead to incorrect support values. (#692 via Steve Harenberg)

New Features

Changes

The previously deprecated OnehotTransactions has been removed in favor of the TransactionEncoder.
Removed SparseDataFrame support in frequent pattern mining functions in favor of pandas >=1.0's new way for working sparse data. If you used SparseDataFrame formats, please see pandas' migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating (#667)

Bug Fixes

New Features

The SequentialFeatureSelector now supports using pre-specified feature sets via the fixed_features parameter. (#578)
Adds a new accuracy_score function to mlxtend.evaluate for computing basic classifcation accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
StackingClassifier and StackingCVClassifiernow have a decision_function method, which serves as a preferred choice over predict_proba in calculating roc_auc and average_precision scores when the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)

Changes

Improve the runtime performance for the apriori frequent itemset generating function when low_memory=True. Setting low_memory=False (default) is still faster for small itemsets, but low_memory=True can be much faster for large itemsets and requires less memory. Also, input validation for apriori, ̀ fpgrowthandfpmaxtakes a significant amount of time when input pandas DataFrame is large; this is now dramatically reduced when input contains boolean values (and not zeros/ones), which is the case when usingTransactionEncoder`. (#619 via Denis Barbier)
Add support for newer sparse pandas DataFrame for frequent itemset algorithms. Also, input validation for apriori, ̀ fpgrowthandfpmax` runs much faster on sparse DataFrame when input pandas DataFrame contains integer values. (#621 via Denis Barbier)
Let fpgrowth and fpmax directly work on sparse DataFrame, they were previously converted into dense Numpy arrays. (#622 via Denis Barbier)

Bug Fixes

Fixes a bug in mlxtend.plotting.plot_pca_correlation_graph that caused the explaind variances not summing up to 1. Also, improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
Behavior of fpgrowth and apriori consistent for edgecases such as min_support=0. (#573 via Steve Harenberg)
fpmax returns an empty data frame now instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
Fixes and issue in mlxtend.plotting.plot_confusion_matrix, where the font-color choice for medium-dark cells was not ideal and hard to read. #588 via sohrabtowfighi)
The svd mode of mlxtend.feature_extraction.PrincipalComponentAnalysis now also n-1 degrees of freedom instead of n d.o.f. when computing the eigenvalues to match the behavior of eigen. #595
Disable input validation for StackingCVClassifier because it causes issues if pipelines are used as input. #606

New Features

Added an enhancement to the existing iris_data() such that both the UCI Repository version of the Iris dataset as well as the corrected, original
version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (via #539 via janismdhanbad)
Added optional groups parameter to SequentialFeatureSelector and ExhaustiveFeatureSelector fit() methods for forwarding to sklearn CV (#537 via arc12)
Added a new plot_pca_correlation_graph function to the mlxtend.plotting submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
Added a zoom_factor parameter to the mlxten.plotting.plot_decision_region function that allows users to zoom in and out of the decision region plots. (#545)
Added a function fpgrowth that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing apriori algorithm. (#550 via Steve Harenberg)
New heatmap function in mlxtend.plotting. (#552)
Added a function fpmax that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the fpgrowth algorithm. (#553 via Steve Harenberg)
New figsize parameter for the plot_decision_regions function in mlxtend.plotting. (#555 via Mirza Hasanbasic)
New low_memory option for the apriori frequent itemset generating function. Setting low_memory=False (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (low_memory=True). (#567 via jmayse)

Changes

Now uses the latest joblib library under the hood for multiprocessing instead of sklearn.externals.joblib. (#547)
Changes to StackingCVClassifier and StackingCVRegressor such that first-level models are allowed to generate output of non-numeric type. (#562)

Bug Fixes

Fixed documentation of iris_data() under iris.py by adding a note about differences in the iris data in R and UCI machine learning repo.
Make sure that if the 'svd' mode is used in PCA, the number of eigenvalues is the same as when using 'eigen' (append 0's zeros in that case) (#565)

New Features

StackingCVClassifier and StackingCVRegressor now support random_state parameter, which, together with shuffle, controls the randomness in the cv splitting. (#523 via Qiang Gu)
StackingCVClassifier and StackingCVRegressor now have a new drop_last_proba parameter. It drops the last "probability" column in the feature set since if True,
because it is redundant: p(y_c) = 1 - p(y_1) + p(y_2) + ... + p(y_{c-1}). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
Other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor, support grid search over the regressors and even a single base regressor. (#522 via Qiang Gu)
Adds multiprocessing support to StackingCVClassifier. (#522 via Qiang Gu)
Adds multiprocessing support to StackingCVRegressor. (#512 via Qiang Gu)
Now, the StackingCVRegressor also enables grid search over the regressors and even a single base regressor. When there are level-mixed parameters, GridSearchCV will try to replace hyperparameters in a top-down order (see the documentation for examples details). (#515 via Qiang Gu)
Adds a verbose parameter to apriori to show the current iteration number as well as the itemset size currently being sampled. (#519
Adds an optional class_name parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)

Changes

Due to new features, restructuring, and better scikit-learn support (for GridSearchCV, etc.) the StackingCVRegressor's meta regressor is now being accessed via 'meta_regressor__* in the parameter grid. E.g., if a RandomForestRegressor as meta- egressor was previously tuned via 'randomforestregressor__n_estimators', this has now changed to 'meta_regressor__n_estimators'. (#515 via Qiang Gu)
The same change mentioned above is now applied to other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor. (#522 via Qiang Gu)

Bug Fixes

The feature_selection.ColumnSelector now also supports column names of type int (in addition to str names) if the input is a pandas DataFrame. (#500 via tetrar124
Fix unreadable labels in plot_confusion_matrix for imbalanced datasets if show_absolute=True and show_normed=True. (#504)
Raises a more informative error if a SparseDataFrame is passed to apriori and the dataframe has integer column names that don't start with 0 due to current limitations of the SparseDataFrame implementation in pandas. (#503)
SequentialFeatureSelector now supports DataFrame as input for all operating modes (forward/backward/floating). #506
mlxtend.evaluate.feature_importance_permutation now correctly accepts scoring functions with proper function signature as metric argument. #528

New Features

Adds a new transformer class to mlxtend.image, EyepadAlign, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
Adds a new function, mlxtend.evaluate.bias_variance_decomp that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
Adds a whitening parameter to PrincipalComponentAnalysis, to optionally whiten the transformed data such that the features have unit variance. (#475)

Changes

Changed the default solver in PrincipalComponentAnalysis to 'svd' instead of 'eigen' to improve numerical stability. (#474)
The mlxtend.image.extract_face_landmarks now returns None if no facial landmarks were detected instead of an array of all zeros. (#466)

Bug Fixes

The eigenvectors maybe have not been sorted in certain edge cases if solver was 'eigen' in PrincipalComponentAnalysis and LinearDiscriminantAnalysis. (#477, #478)

New Features

Added a scatterplotmatrix function to the plotting module. (#437)
Added sample_weight option to StackingRegressor, StackingClassifier, StackingCVRegressor, StackingCVClassifier, EnsembleVoteClassifier. (#438)
Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#442)
Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#443)
Created a new mlxtend.image submodule for working on image processing-related tasks. (#457)
Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image. (#458)
Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate (#459)
Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap (#459)
Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform an combined 5x2cv F-Test for comparing the performance of two models. (#461)
Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) (#462)

Changes

Addressed deprecations warnings in NumPy 0.15. (#425)
Because of complications in PR (#459), Python 2.7 was now dropped; since official support for Python 2.7 by the Python Software Foundation is ending in approx. 12 months anyways, this re-focussing will hopefully free up some developer time with regard to not having to worry about backward compatibility

Bug Fixes

Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix. (#428)

Version 0.13.0 (07/20/2018)

New Features

A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector. (#377)
The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. (#379)
The SequentialFeatureSelector is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. (#379)
ColumnSelector now works with Pandas DataFrames columns. (#378 by Manuel Garrido)
The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c. (#380)
Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrames to generate frequent itemsets. (#404 via Daniel Morales)
The plot_confusion_matrix function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary) like it is already supported in the other Stacking classes. (#418)
Added a support_only to the association_rules function, which allow constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)

Changes

Itemsets generated with apriori are now frozensets (#393 by William Laney and #394)
Now raises an error if a input DataFrame to apriori contains non 0, 1, True, False values. #419)

Bug Fixes

Allow mlxtend estimators to be cloned via scikit-learn's clone function. (#374)
Fixes bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor (#384 and (#385) by selay01)
Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True (#408 by Floris Hoogenbook)
Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True (#416)
Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True (#417)

Downloads

New Features

A new feature_importance_permuation function to compute the feature importance in classifiers and regressors via the permutation importance method (#358)
The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#354 by Zach Griffith)
The fit method of the SequentialFeatureSelector now optionally accepts
**fit_params for the estimator that is used for the feature selection. (#350 by Zach Griffith)

Changes

Replaced plot_decision_regions colors by a colorblind-friendly palette and adds contour lines for decision regions. (#348)
All stacking estimators now raise NonFittedErrors if any method for inference is called prior to fitting the estimator. (#353)
Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. (#368)

Bug Fixes

Various changes in the documentation and documentation tools to fix formatting issues (#363)
Fixes a bug where the StackingCVClassifier's meta features were not stored in the original order when shuffle=True (#370)
Many documentation improvements, including links to the User Guides in the API docs (#371)

Releases: rasbt/mlxtend

Version 0.18.0

New Features

Changes

Bug Fixes

Uh oh!

Version 0.17.3

New Features

Changes

Bug Fixes

Uh oh!

Version 0.17.2

New Features

Changes

Bug Fixes

Uh oh!

Version 0.17.1

New Features

Changes

Bug Fixes

Uh oh!

Version 0.17.0

New Features

Changes

Bug Fixes

Uh oh!

Version 0.16.0

New Features

Changes

Bug Fixes

Uh oh!

Version 0.15.0

New Features

Changes

Bug Fixes

Uh oh!

Version 0.14.0

New Features

Changes

Bug Fixes

Uh oh!

Version 0.13.0

Version 0.13.0 (07/20/2018)

New Features

Changes

Bug Fixes

Uh oh!

Version 0.12.0

Downloads

New Features

Changes

Bug Fixes

Uh oh!