
Surrogate regret bounds for bipartite ranking via strongly proper losses

Published: 01 January 2014

Abstract

The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of mis-ranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, Kotlowski et al. (2011) showed regret bounds for bipartite ranking in terms of the regret associated with balanced versions of the standard (non-pairwise) logistic and exponential losses. In this paper, we show that such (non-pairwise) surrogate regret bounds for bipartite ranking can be obtained in terms of a broad class of proper (composite) losses that we term strongly proper. Our proof technique is much simpler than that of Kotlowski et al. (2011), and relies on properties of proper (composite) losses as elucidated recently by Reid and Williamson (2010, 2011) and others. Our result yields explicit surrogate bounds (with no hidden balancing terms) in terms of a variety of strongly proper losses, including for example logistic, exponential, squared and squared hinge losses as special cases. An important consequence is that standard algorithms minimizing a (non-pairwise) strongly proper loss, such as logistic regression and boosting algorithms (assuming a universal function class and appropriate regularization), are in fact consistent for bipartite ranking; moreover, our results allow us to quantify the bipartite ranking regret in terms of the corresponding surrogate regret. We also obtain tighter surrogate bounds under certain low-noise conditions via a recent result of Clémençon and Robbiano (2011).
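
For concreteness, the following is a brief sketch, in generic notation that need not match the paper's, of the quantities involved. Let X denote an instance with label Y ∈ {+1, −1} drawn from a distribution D, let (X', Y') be an independent draw, and let f be a real-valued scoring function. The bipartite ranking error of f is the pairwise mis-ranking probability (equivalently, one minus the AUC), and the surrogate regret bounds described above have the generic square-root shape shown last; the constant C_ℓ below is a placeholder for the explicit loss-dependent constant derived in the paper, not the paper's exact expression:

\[
\mathrm{er}^{\mathrm{rank}}_D[f]
  = \mathbf{P}\big(f(X) < f(X') \mid Y = +1,\, Y' = -1\big)
  + \tfrac{1}{2}\,\mathbf{P}\big(f(X) = f(X') \mid Y = +1,\, Y' = -1\big)
  = 1 - \mathrm{AUC}(f),
\]
\[
\mathrm{regret}^{\mathrm{rank}}_D[f]
  = \mathrm{er}^{\mathrm{rank}}_D[f] - \inf_{f'} \mathrm{er}^{\mathrm{rank}}_D[f'],
\qquad
\mathrm{regret}^{\mathrm{rank}}_D[f]
  \le C_{\ell}\, \sqrt{\mathrm{regret}^{\ell}_D[f]},
\]

where \(\mathrm{regret}^{\ell}_D[f]\) is the regret of f under a (non-pairwise) strongly proper composite loss ℓ, such as the logistic, exponential, squared, or squared hinge loss. Under the low-noise conditions mentioned at the end of the abstract, tighter versions of this bound hold; the precise constants and rates are those given in the paper.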

References

[1]
Shivani Agarwal, Thore Graepel, Ralf Herbrich, Sariel Har-Peled, and Dan Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393-425, 2005.
[2]
Nir Ailon and Mehryar Mohri. An efficient reduction of ranking to classification. In Proceedings of the 21st Annual Conference on Learning Theory, 2008.
[3]
Maria-Florina Balcan, Nikhil Bansal, Alina Beygelzimer, Don Coppersmith, John Langford, and Gregory B. Sorkin. Robust reductions from ranking to classification. Machine Learning, 72:139-153, 2008.
[4]
Peter Bartlett, Michael Jordan, and Jon McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473):138-156, 2006.
[5]
David Buffoni, Clément Calauzènes, Patrick Gallinari, and Nicolas Usunier. Learning scoring functions with order-preserving losses and standardized supervision. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[6]
Andreas Buja, Werner Stuetzle, and Yi Shen. Loss functions for binary class probability estimation: Structure and applications. Technical report, University of Pennsylvania, November 2005.
[7]
Christopher J. C. Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[8]
Clément Calauzènes, Nicolas Usunier, and Patrick Gallinari. On the (non-)existence of convex, calibrated surrogate losses for ranking. In Advances in Neural Information Processing Systems 25, pages 197-205. 2012.
[9]
Stéphan Clémençon and Sylvain Robbiano. Minimax learning rates for bipartite ranking and plug-in rules. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[10]
Stéphan Clémençon and Nicolas Vayatis. Ranking the best instances. Journal of Machine Learning Research, 8:2671-2699, 2007.
[11]
Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36:844-874, 2008.
[12]
Corinna Cortes and Mehryar Mohri. AUC optimization vs. error rate minimization. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
[13]
David Cossock and Tong Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140-5154, 2008.
[14]
John Duchi, Lester Mackey, and Michael I. Jordan. On the consistency of ranking algorithms. In Proceedings of the 27th International Conference on Machine Learning, 2010.
[15]
Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933-969, 2003.
[16]
Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359-378, 2007.
[17]
Arlo D. Hendrickson and Robert J. Buehler. Proper scores for probability forecasters. The Annals of Mathematical Statistics, 42:1916-1921, 1971.
[18]
Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pages 115-132, 2000.
[19]
Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM Conference on Knowledge Discovery and Data Mining, 2002.
[20]
Wojciech Kotlowski, Krzysztof Dembczynski, and Eyke Hüllermeier. Bipartite ranking through minimization of univariate loss. In Proceedings of the 28th International Conference on Machine Learning, 2011.
[21]
Yanyan Lan, Jiafeng Guo, Xueqi Cheng, and Tie-Yan Liu. Statistical consistency of ranking methods in a rank-differentiable probability space. In Advances in Neural Information Processing Systems 25, pages 1241-1249. 2012.
[22]
Aditya K. Menon, Harikrishna Narasimhan, Shivani Agarwal, and Sanjay Chawla. On the statistical consistency of algorithms for binary classification under class imbalance. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[23]
Alain Rakotomamonjy. Optimizing area under ROC curves with SVMs. In Proceedings of the ECAI-2004 Workshop on ROC Analysis in AI, 2004.
[24]
Harish G. Ramaswamy and Shivani Agarwal. Classification calibration dimension for general multiclass losses. In Advances in Neural Information Processing Systems 25, pages 2087-2095. 2012.
[25]
Harish G. Ramaswamy, Shivani Agarwal, and Ambuj Tewari. Convex calibrated surrogates for low-rank loss matrices with applications to subset ranking losses. In Advances in Neural Information Processing Systems 26, pages 1475-1483. 2013.
[26]
Pradeep Ravikumar, Ambuj Tewari, and Eunho Yang. On NDCG consistency of listwise ranking methods. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, volume 15 of JMLR Workshop and Conference Proceedings, pages 618-626, 2011.
[27]
Mark D. Reid and Robert C. Williamson. Composite binary losses. Journal of Machine Learning Research, 11:2387-2422, 2010.
[28]
Mark D. Reid and Robert C. Williamson. Information, divergence and risk for binary experiments. Journal of Machine Learning Research, 12:731-817, 2011.
[29]
Cynthia Rudin and Robert E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10:2193-2232, 2009.
[30]
Leonard J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66(336):783-801, 1971.
[31]
Mark J. Schervish. A general method for comparing probability assessors. The Annals of Statistics, 17:1856-1879, 1989.
[32]
Oliver Stein. Twice differentiable characterizations of convexity notions for functions on full dimensional convex sets. Schedae Informaticae, 21:55-63, 2012.
[33]
Alexandre B. Tsybakov. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135-166, 2004.
[34]
Kazuki Uematsu and Yoonkyung Lee. On theoretically optimal ranking functions in bipartite ranking. Technical Report 863, Department of Statistics, The Ohio State University, December 2011.
[35]
Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to learning to rank: Theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[36]
Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56-134, 2004.

        Published In

        The Journal of Machine Learning Research, Volume 15, Issue 1
        January 2014
        4085 pages
        ISSN: 1532-4435
        EISSN: 1533-7928

        Publisher

        JMLR.org

        Publication History

        Published: 01 January 2014
        Revised: 01 December 2013
        Published in JMLR Volume 15, Issue 1

        Author Tags

        1. area under ROC curve (AUC)
        2. bipartite ranking
        3. proper losses
        4. regret bounds
        5. statistical consistency
        6. strongly proper losses
