skip to main content
article
Free access

Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

Published: 01 July 2011 Publication History

Abstract

We provide consistent random algorithms for sequential decision under partial monitoring, when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoï diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal, as well as the usual external, regret at stage n by O(n-1/3), which is known to be optimal.

References

[1]
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32:48-77 (electronic), 2002/03.
[2]
F. Aurenhammer. A criterion for the affine equivalence of cell complexes in R d and convex polyhedra in R d+1. Discrete Comput. Geom., 2:49-64, 1987.
[3]
K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. (2), 19:357- 367, 1967.
[4]
L. J. Billera and B. Sturmfels. Fiber polytopes. The Annals of Mathematics, 135(3):pp. 527-549, 1992.
[5]
D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific J. Math., 6:1-8, 1956a.
[6]
D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, 1954, Amsterdam, vol. III, pages 336-338, 1956b.
[7]
A. Blum and Y. Mansour. From external to internal regret. J. Mach. Learn. Res., 8:1307-1324 (electronic), 2007.
[8]
R. C. Buck. Partition of space. Amer. Math. Monthly, 50:541-544, 1943.
[9]
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, Cambridge, 2006.
[10]
N. Cesa-Bianchi, G. Lugosi, and G. Stoltz. Minimizing regret with label efficient prediction. IEEE Trans. Inform. Theory, 51:2152-2162, 2005.
[11]
X. Chen and H. White. Laws of large numbers for Hilbert space-valued mixingales with applications. Econometric Theory, 12:284-304, 1996.
[12]
A. P. Dawid. The well-calibrated Bayesian. J. Amer. Statist. Assoc., 77:605-613, 1982.
[13]
D. P. Foster and R. V. Vohra. Calibrated learning and correlated equilibrium. Games Econom. Behav., 21:40-55, 1997.
[14]
D. P. Foster and R. V. Vohra. Asymptotic calibration. Biometrika, 85:379-390, 1998.
[15]
D. A. Freedman. On tail probabilities for martingales. Ann. Probability, 3:100-118, 1975.
[16]
D. Fudenberg and D. K. Levine. Conditional universal consistency. Games Econom. Behav., 29: 104-130, 1999.
[17]
J. Hannan. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, volume 3 of Annals of Mathematics Studies, pages 97-139. Princeton University Press, Princeton, N. J., 1957.
[18]
S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127-1150, 2000.
[19]
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13-30, 1963.
[20]
T. Jaksch, R. Ortner, and P. Auer. Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res., 11:1563-1600, 2010.
[21]
E. Lehrer and E. Solan. Learning to play partially-specified equilibrium. manuscript, 2007.
[22]
C. E. Lemke and J. T. Howson, Jr. Equilibrium points of bimatrix games. J. Soc. Indust. Appl. Math., 12:413-423, 1964.
[23]
G. Lugosi, S. Mannor, and G. Stoltz. Strategies for prediction under imperfect monitoring. Math. Oper. Res., 33:513-528, 2008.
[24]
V. Perchet. Calibration and internal no-regret with random signals. Proceedings of the 20th International Conference on Algorithmic Learning Theory, pages 68-82, 2009.
[25]
J. Rambau and G. M. Ziegler. Projections of polytopes and the generalized Baues conjecture. Discrete Comput. Geom., 16:215-237, 1996.
[26]
R. T. Rockafellar. Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J., 1970.
[27]
A. Rustichini. Minimizing regret: the general case. Games Econom. Behav., 29:224-243, 1999.
[28]
E. Seneta. Nonnegative Matrices and Markov Chains. Springer Series in Statistics. Springer-Verlag, New York, second edition, 1981.
[29]
S. Sorin. Supergames. In Game theory and applications (Columbus, OH, 1987), Econom. Theory Econometrics Math. Econom., pages 46-63. Academic Press, San Diego, CA, 1990.
[30]
S. Sorin. Lectures on Dynamics in Games. Unpublished Lecture Notes, 2008.
[31]
V. Yurinskii. Exponential inequalities for sums of random vectors. Journal of Multivariate Analysis, 6:473-499, 1976.
[32]
G. Ziegler. Lectures on Polytopes, volume 152 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.

Cited By

View all
  • (2024)An Approximate Dynamic Programming Approach to Repeated Games with Vector LossesOperations Research10.1287/opre.2022.233472:1(373-388)Online publication date: 1-Jan-2024
  • (2015)On a Unified Framework for Approachability with Full or Partial MonitoringMathematics of Operations Research10.1287/moor.2014.068640:3(596-610)Online publication date: 1-Mar-2015
  • (2014)Partial Monitoring—Classification, Regret Bounds, and AlgorithmsMathematics of Operations Research10.1287/moor.2014.066339:4(967-997)Online publication date: 1-Nov-2014

Recommendations

Comments

Information & Contributors

Information

Published In

The Journal of Machine Learning Research  Volume 12, Issue
2/1/2011
3426 pages
ISSN:1532-4435
EISSN:1533-7928
Issue’s Table of Contents

Publisher

JMLR.org

Publication History

Published: 01 July 2011
Published in JMLR Volume 12

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)6
Reflects downloads up to 12 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)An Approximate Dynamic Programming Approach to Repeated Games with Vector LossesOperations Research10.1287/opre.2022.233472:1(373-388)Online publication date: 1-Jan-2024
  • (2015)On a Unified Framework for Approachability with Full or Partial MonitoringMathematics of Operations Research10.1287/moor.2014.068640:3(596-610)Online publication date: 1-Mar-2015
  • (2014)Partial Monitoring—Classification, Regret Bounds, and AlgorithmsMathematics of Operations Research10.1287/moor.2014.066339:4(967-997)Online publication date: 1-Nov-2014

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media