
Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm

Published: 01 January 2015

Abstract

We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis showing how the C-bound can be estimated from various observations contained in the training data. The analysis is intended to be self-contained and can serve as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence, and others allow kernel functions to be used as voters (via the sample compression setting). Finally, building on this analysis, we propose the MinCq learning algorithm, which minimizes the C-bound and reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
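For reference, the C-bound admits a compact statement in terms of the first two moments of the margin of the Q-weighted majority vote B_Q, where the margin is M_Q(x,y) = E_{h~Q}[y h(x)] and D denotes the data-generating distribution (the bound requires a positive first moment):

```latex
% C-bound: the risk of the majority vote B_Q is controlled by the first and
% second moments of the margin M_Q(x,y) = E_{h~Q}[y h(x)], when E[M_Q] > 0.
R(B_Q) \;\le\; 1 \;-\;
  \frac{\big(\mathbf{E}_{(x,y)\sim D}\, M_Q(x,y)\big)^{2}}
       {\mathbf{E}_{(x,y)\sim D}\, M_Q(x,y)^{2}}
```

To illustrate how minimizing an empirical version of this bound reduces to a quadratic program, here is a minimal sketch, not the paper's exact MinCq formulation: with the first moment of the empirical margin fixed to a hyperparameter mu, minimizing the second moment is quadratic in the voter weights. The helper name mincq_like_qp, the plain simplex constraint, and the inputs H, y, mu are illustrative assumptions; MinCq itself restricts the posterior to quasi-uniform distributions over the voters and their complements and solves a closely related program (e.g., with CVXOPT).

```python
# Minimal sketch (illustrative, NOT the paper's exact MinCq program) of the
# C-bound-minimization idea: fix the first moment of the margin to mu and
# minimize its second moment, a quadratic program in the voter weights.
import numpy as np
from cvxopt import matrix, solvers

def mincq_like_qp(H, y, mu=0.05):
    """H: (n, m) voter outputs in {-1, +1}; y: (n,) labels in {-1, +1};
    mu: target first moment of the margin (must be attainable on the simplex).
    Returns simplex weights w minimizing the empirical second moment."""
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = H.shape
    M = H * y[:, None]                       # M[i, j] = y_i * h_j(x_i)
    P = matrix(2.0 * (M.T @ M) / n)          # objective: w^T (M^T M / n) w
    q = matrix(np.zeros(m))
    G = matrix(-np.eye(m))                   # inequality: -w <= 0, i.e. w >= 0
    h = matrix(np.zeros(m))
    A = matrix(np.vstack([np.ones((1, m)),   # equality: sum(w) = 1
                          M.mean(axis=0)[None, :]]))  # and first moment = mu
    b = matrix(np.array([1.0, mu]))
    solvers.options['show_progress'] = False
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()
```

On new points the learned vote predicts sign(H_new @ w); the fixed first moment mu plays the role of a margin-level hyperparameter, while minimizing the second moment drives down the empirical C-bound.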



      Published In

      The Journal of Machine Learning Research, Volume 16, Issue 1
      January 2015
      3855 pages
      ISSN:1532-4435
      EISSN:1533-7928

      Publisher

      JMLR.org

      Publication History

      Published: 01 January 2015
      Revised: 01 September 2014
      Published in JMLR Volume 16, Issue 1

      Author Tags

      1. PAC-Bayesian theory
      2. ensemble methods
      3. learning theory
      4. majority vote
      5. sample compression

      Qualifiers

      • Article

      Cited By

      • (2024) Stability of Weighted Majority Voting under Estimated Weights. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 96-104. https://doi.org/10.5555/3635637.3662856. Online publication date: 6-May-2024.
      • (2024) Information-Theoretic Characterizations of Generalization Error for the Gibbs Algorithm. IEEE Transactions on Information Theory, 70(1), 632-655. https://doi.org/10.1109/TIT.2023.3329617. Online publication date: 1-Jan-2024.
      • (2023) When are ensembles really effective? Proceedings of the 37th International Conference on Neural Information Processing Systems, 15015-15026. https://doi.org/10.5555/3666122.3666781. Online publication date: 10-Dec-2023.
      • (2023) Sample boosting algorithm (SamBA) - an interpretable greedy ensemble classifier based on local expertise for fat data. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 130-140. https://doi.org/10.5555/3625834.3625847. Online publication date: 31-Jul-2023.
      • (2023) Stability of Weighted Majority Voting under Estimated Weights. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2691-2693. https://doi.org/10.5555/3545946.3599045. Online publication date: 30-May-2023.
      • (2023) A reduced-rank approach to predicting multiple binary responses through machine learning. Statistics and Computing, 33(6). https://doi.org/10.1007/s11222-023-10314-3. Online publication date: 13-Oct-2023.
      • (2022) On margins and generalisation for voting classifiers. Proceedings of the 36th International Conference on Neural Information Processing Systems, 9713-9726. https://doi.org/10.5555/3600270.3600976. Online publication date: 28-Nov-2022.
      • (2022) Machine truth serum: a surprisingly popular approach to improving ensemble methods. Machine Learning, 112(3), 789-815. https://doi.org/10.1007/s10994-022-06183-y. Online publication date: 12-Jul-2022.
      • (2021) How tight can PAC-Bayes be in the small data regime? Proceedings of the 35th International Conference on Neural Information Processing Systems, 4093-4105. https://doi.org/10.5555/3540261.3540574. Online publication date: 6-Dec-2021.
      • (2021) Subgroup generalization and fairness of graph neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 1048-1061. https://doi.org/10.5555/3540261.3540342. Online publication date: 6-Dec-2021.
