
Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm

Published: 01 January 2015

Abstract

We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis showing how the C-bound can be estimated from various observations contained in the training data. The analysis is intended to be self-contained and can serve as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence, and others allow kernel functions to be used as voters (via the sample compression setting). Finally, building on this analysis, we propose the MinCq learning algorithm, which minimizes the C-bound and reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
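For reference, the C-bound admits a compact statement in terms of the first two moments of the margin of the Q-weighted majority vote B_Q, where the margin is M_Q(x,y) = E_{h~Q}[y h(x)] and D denotes the data-generating distribution (the bound requires a positive first moment):

```latex
% C-bound: the risk of the majority vote B_Q is controlled by the first and
% second moments of the margin M_Q(x,y) = E_{h~Q}[y h(x)], when E[M_Q] > 0.
R(B_Q) \;\le\; 1 \;-\;
  \frac{\big(\mathbf{E}_{(x,y)\sim D}\, M_Q(x,y)\big)^{2}}
       {\mathbf{E}_{(x,y)\sim D}\, M_Q(x,y)^{2}}
```

To illustrate how minimizing an empirical version of this bound reduces to a quadratic program, here is a minimal sketch, not the paper's exact MinCq formulation: with the first moment of the empirical margin fixed to a hyperparameter mu, minimizing the second moment is quadratic in the voter weights. The helper name mincq_like_qp, the plain simplex constraint, and the inputs H, y, mu are illustrative assumptions; MinCq itself restricts the posterior to quasi-uniform distributions over the voters and their complements and solves a closely related program (e.g., with CVXOPT).

```python
# Minimal sketch (illustrative, NOT the paper's exact MinCq program) of the
# C-bound-minimization idea: fix the first moment of the margin to mu and
# minimize its second moment, a quadratic program in the voter weights.
import numpy as np
from cvxopt import matrix, solvers

def mincq_like_qp(H, y, mu=0.05):
    """H: (n, m) voter outputs in {-1, +1}; y: (n,) labels in {-1, +1};
    mu: target first moment of the margin (must be attainable on the simplex).
    Returns simplex weights w minimizing the empirical second moment."""
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = H.shape
    M = H * y[:, None]                       # M[i, j] = y_i * h_j(x_i)
    P = matrix(2.0 * (M.T @ M) / n)          # objective: w^T (M^T M / n) w
    q = matrix(np.zeros(m))
    G = matrix(-np.eye(m))                   # inequality: -w <= 0, i.e. w >= 0
    h = matrix(np.zeros(m))
    A = matrix(np.vstack([np.ones((1, m)),   # equality: sum(w) = 1
                          M.mean(axis=0)[None, :]]))  # and first moment = mu
    b = matrix(np.array([1.0, mu]))
    solvers.options['show_progress'] = False
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()
```

On new points the learned vote predicts sign(H_new @ w); the fixed first moment mu plays the role of a margin-level hyperparameter, while minimizing the second moment drives down the empirical C-bound.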



      Published In

      The Journal of Machine Learning Research, Volume 16, Issue 1
      January 2015
      3855 pages
      ISSN:1532-4435
      EISSN:1533-7928

      Publisher

      JMLR.org

      Publication History

      Published: 01 January 2015
      Revised: 01 September 2014
      Published in JMLR Volume 16, Issue 1

      Author Tags

      1. PAC-Bayesian theory
      2. ensemble methods
      3. learning theory
      4. majority vote
      5. sample compression

      Qualifiers

      • Article

      Cited By

      • (2024) Stability of Weighted Majority Voting under Estimated Weights. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 96-104. https://doi.org/10.5555/3635637.3662856. Online publication date: 6-May-2024.
      • (2024) Information-Theoretic Characterizations of Generalization Error for the Gibbs Algorithm. IEEE Transactions on Information Theory, 70(1), 632-655. https://doi.org/10.1109/TIT.2023.3329617. Online publication date: 1-Jan-2024.
      • (2023) When are ensembles really effective? Proceedings of the 37th International Conference on Neural Information Processing Systems, 15015-15026. https://doi.org/10.5555/3666122.3666781. Online publication date: 10-Dec-2023.
      • (2023) Sample boosting algorithm (SamBA) - an interpretable greedy ensemble classifier based on local expertise for fat data. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 130-140. https://doi.org/10.5555/3625834.3625847. Online publication date: 31-Jul-2023.
      • (2023) Stability of Weighted Majority Voting under Estimated Weights. Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2691-2693. https://doi.org/10.5555/3545946.3599045. Online publication date: 30-May-2023.
      • (2023) A reduced-rank approach to predicting multiple binary responses through machine learning. Statistics and Computing, 33(6). https://doi.org/10.1007/s11222-023-10314-3. Online publication date: 13-Oct-2023.
      • (2022) On margins and generalisation for voting classifiers. Proceedings of the 36th International Conference on Neural Information Processing Systems, 9713-9726. https://doi.org/10.5555/3600270.3600976. Online publication date: 28-Nov-2022.
      • (2022) Machine truth serum: a surprisingly popular approach to improving ensemble methods. Machine Learning, 112(3), 789-815. https://doi.org/10.1007/s10994-022-06183-y. Online publication date: 12-Jul-2022.
      • (2021) How tight can PAC-Bayes be in the small data regime? Proceedings of the 35th International Conference on Neural Information Processing Systems, 4093-4105. https://doi.org/10.5555/3540261.3540574. Online publication date: 6-Dec-2021.
      • (2021) Subgroup generalization and fairness of graph neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 1048-1061. https://doi.org/10.5555/3540261.3540342. Online publication date: 6-Dec-2021.
