article

Free access

Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Author:

Vianney PerchetAuthors Info & Claims

The Journal of Machine Learning Research, Volume 12

Pages 1893 - 1921

Published: 01 July 2011 Publication History

Abstract

We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n^-1/3), which is known to be optimal.

References

[1]

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32:48-77 (electronic), 2002/03.

[2]

F. Aurenhammer. A criterion for the affine equivalence of cell complexes in R^d and convex polyhedra in R^d+1. Discrete Comput. Geom., 2:49-64, 1987.

[3]

K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. (2), 19:357- 367, 1967.

[4]

L. J. Billera and B. Sturmfels. Fiber polytopes. The Annals of Mathematics, 135(3):pp. 527-549, 1992.

[5]

D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific J. Math., 6:1-8, 1956a.

[6]

D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, 1954, Amsterdam, vol. III, pages 336-338, 1956b.

[7]

A. Blum and Y. Mansour. From external to internal regret. J. Mach. Learn. Res., 8:1307-1324 (electronic), 2007.

[8]

R. C. Buck. Partition of space. Amer. Math. Monthly, 50:541-544, 1943.

[9]

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, Cambridge, 2006.

[10]

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz. Minimizing regret with label efficient prediction. IEEE Trans. Inform. Theory, 51:2152-2162, 2005.

[11]

X. Chen and H. White. Laws of large numbers for Hilbert space-valued mixingales with applications. Econometric Theory, 12:284-304, 1996.

[12]

A. P. Dawid. The well-calibrated Bayesian. J. Amer. Statist. Assoc., 77:605-613, 1982.

[13]

D. P. Foster and R. V. Vohra. Calibrated learning and correlated equilibrium. Games Econom. Behav., 21:40-55, 1997.

[14]

D. P. Foster and R. V. Vohra. Asymptotic calibration. Biometrika, 85:379-390, 1998.

[15]

D. A. Freedman. On tail probabilities for martingales. Ann. Probability, 3:100-118, 1975.

[16]

D. Fudenberg and D. K. Levine. Conditional universal consistency. Games Econom. Behav., 29: 104-130, 1999.

[17]

J. Hannan. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, volume 3 of Annals of Mathematics Studies, pages 97-139. Princeton University Press, Princeton, N. J., 1957.

[18]

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127-1150, 2000.

[19]

W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13-30, 1963.

[20]

T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res., 11:1563-1600, 2010.

[21]

E. Lehrer and E. Solan. Learning to play partially-specified equilibrium. manuscript, 2007.

[22]

C. E. Lemke and J. T. Howson, Jr. Equilibrium points of bimatrix games. J. Soc. Indust. Appl. Math., 12:413-423, 1964.

[23]

G. Lugosi, S. Mannor, and G. Stoltz. Strategies for prediction under imperfect monitoring. Math. Oper. Res., 33:513-528, 2008.

[24]

V. Perchet. Calibration and internal no-regret with random signals. Proceedings of the 20th International Conference on Algorithmic Learning Theory, pages 68-82, 2009.

[25]

J. Rambau and G. M. Ziegler. Projections of polytopes and the generalized Baues conjecture. Discrete Comput. Geom., 16:215-237, 1996.

[26]

R. T. Rockafellar. Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J., 1970.

[27]

A. Rustichini. Minimizing regret: the general case. Games Econom. Behav., 29:224-243, 1999.

[28]

E. Seneta. Nonnegative Matrices and Markov Chains. Springer Series in Statistics. Springer-Verlag, New York, second edition, 1981.

[29]

S. Sorin. Supergames. In Game theory and applications (Columbus, OH, 1987), Econom. Theory Econometrics Math. Econom., pages 46-63. Academic Press, San Diego, CA, 1990.

[30]

S. Sorin. Lectures on Dynamics in Games. Unpublished Lecture Notes, 2008.

[31]

V. Yurinskii. Exponential inequalities for sums of random vectors. Journal of Multivariate Analysis, 6:473-499, 1976.

[32]

G. Ziegler. Lectures on Polytopes, volume 152 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.

Cited By

Kamble VLoiseau PWalrand J(2024)An Approximate Dynamic Programming Approach to Repeated Games with Vector LossesOperations Research10.1287/opre.2022.233472:1(373-388)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1287/opre.2022.2334
Perchet VQuincampoix M(2015)On a Unified Framework for Approachability with Full or Partial MonitoringMathematics of Operations Research10.1287/moor.2014.068640:3(596-610)Online publication date: 1-Mar-2015
https://dl.acm.org/doi/10.1287/moor.2014.0686
Bartók GFoster DPál DRakhlin ASzepesvári C(2014)Partial Monitoring—Classification, Regret Bounds, and AlgorithmsMathematics of Operations Research10.1287/moor.2014.066339:4(967-997)Online publication date: 1-Nov-2014
https://dl.acm.org/doi/10.1287/moor.2014.0663

Index Terms

Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Recommendations

Partial Monitoring—Classification, Regret Bounds, and Algorithms

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal ...
Regret Bounds and Minimax Policies under Partial Monitoring

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: pseudo-regret, expected regret, high probability regret and tracking the ...
From External to Internal Regret

External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm,...

Comments

Information & Contributors

Information

Published In

The Journal of Machine Learning Research Volume 12, Issue

2/1/2011

3426 pages

ISSN:1532-4435

EISSN:1533-7928

Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 July 2011

Published in JMLR Volume 12

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
76
Total Downloads

Downloads (Last 12 months)17
Downloads (Last 6 weeks)6

Reflects downloads up to 12 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Kamble VLoiseau PWalrand J(2024)An Approximate Dynamic Programming Approach to Repeated Games with Vector LossesOperations Research10.1287/opre.2022.233472:1(373-388)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1287/opre.2022.2334
Perchet VQuincampoix M(2015)On a Unified Framework for Approachability with Full or Partial MonitoringMathematics of Operations Research10.1287/moor.2014.068640:3(596-610)Online publication date: 1-Mar-2015
https://dl.acm.org/doi/10.1287/moor.2014.0686
Bartók GFoster DPál DRakhlin ASzepesvári C(2014)Partial Monitoring—Classification, Regret Bounds, and AlgorithmsMathematics of Operations Research10.1287/moor.2014.066339:4(967-997)Online publication date: 1-Nov-2014
https://dl.acm.org/doi/10.1287/moor.2014.0663

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents