
Monte Carlo with infinite variances [a surveyal guide]

Posted in Books, Statistics, University life on January 14, 2026 by xi'an

Watch out! Reiichiro Kawai has just published a survey on infinite variance Monte Carlo methods in Probability Surveys, which is most welcome as this issue is customarily ignored by both the literature and the practitioners. Radford Neal's warning about the dangers of using the harmonic mean estimator of the evidence (as in Newton and Raftery 1994) is an illustration that remains pertinent to this day. In that sense, the survey relates to specific, earlier if recent, attempts such as Chatterjee and Diaconis (2015) or Vehtari et al. (2015), with its Pareto correction.
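To make the harmonic mean warning concrete, here is a minimal Python sketch on a toy conjugate model of my own choosing (it is not taken from the survey): a single observation x ~ N(θ, 1) with prior θ ~ N(0, τ²), for which the evidence is known in closed form. In this toy model the inverse likelihood has a finite posterior variance only when τ² < 1, since the integral of the prior divided by the likelihood diverges otherwise. The function names and the values of x, τ and n are illustrative assumptions.

```python
import math
import random


def log_likelihood(theta, x):
    # log density of N(x | theta, 1)
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2


def true_evidence(x, tau):
    # marginal density of x when x | theta ~ N(theta, 1) and theta ~ N(0, tau^2)
    var = 1.0 + tau ** 2
    return math.exp(-0.5 * x ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def harmonic_mean_evidence(x, tau, n, rng):
    # exact posterior draws: theta | x ~ N(shrink * x, shrink), shrink = tau^2 / (1 + tau^2)
    shrink = tau ** 2 / (1.0 + tau ** 2)
    post_mean, post_sd = shrink * x, math.sqrt(shrink)
    inv_lik = [
        math.exp(-log_likelihood(rng.gauss(post_mean, post_sd), x))
        for _ in range(n)
    ]
    # harmonic mean of the likelihood values along the posterior sample
    return n / sum(inv_lik)


if __name__ == "__main__":
    rng = random.Random(2026)
    x, tau = 1.0, 10.0  # tau^2 >= 1: infinite variance regime in this toy model
    print("true evidence:", true_evidence(x, tau))
    for rep in range(5):
        print("harmonic mean estimate:", harmonic_mean_evidence(x, tau, 10_000, rng))
```

Repeating the experiment with τ well below one (say τ = 0.1) is the natural comparison, since the same estimator then has a finite variance.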

In its recapitulation of the basics of Monte Carlo (closely corresponding to my own introduction of the topic in undergraduate classes), the paper indicates that the consistency of the variance estimator is enough to replace the true variance with its estimator and maintain the CLT. I have often if vaguely wondered at the impact (if any) a variance estimator with (itself) an infinite variance would have. A note to this effect appears at the end of Section 1.2. While being involved from the start, importance sampling has to wait till Section 3.2 to be formally introduced. It is also interesting to note that the original result on the optimal importance variance being zero when the integrand is always positive (or negative) is extended here, by noting that a zero variance estimator can always be found by breaking the integrand f into its positive and negative parts, and using two separate samples, one for each of the respective integrals. I thus find Example 6 rather unhelpful, even though the entire literature contains such examples with no added value of formal optimal importance samplers. A comment at the end of Example 6 opens the door to a short discussion of reparametrisation in simulation, a topic rarely discussed in the literature. The use of Rao-Blackwellization as a variance reduction technique that is open to switching from infinite to finite variance is emphasised as well in Section 2.1.
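The positive and negative part argument can be checked on a toy case of my own (not one from the survey): the integral of h(x) = x against the standard Normal density. The positive part x⁺φ(x) is proportional to a Rayleigh density on x > 0, and the negative part to its mirror image, so both optimal proposals can be simulated exactly. The Python sketch below, with hypothetical function names, shows that all importance weights are then equal to the same constant, hence a zero variance estimator. It also shows why such samplers are formal: the constant weight is the very integral one is after.

```python
import math
import random


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def rayleigh_pdf(x):
    # density x * exp(-x^2 / 2) on x > 0, proportional to x * std_normal_pdf(x)
    return x * math.exp(-0.5 * x * x)


def draw_rayleigh(rng):
    # inverse cdf draw; 1 - rng.random() lies in (0, 1], and x == 0 is redrawn
    while True:
        x = math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        if x > 0.0:
            return x


def split_estimate(n, rng):
    # positive part h+(x) = max(x, 0), proposal q+ = Rayleigh on x > 0
    w_pos = []
    for _ in range(n):
        x = draw_rayleigh(rng)
        w_pos.append(x * std_normal_pdf(x) / rayleigh_pdf(x))
    # negative part h-(x) = max(-x, 0), proposal q- = mirrored Rayleigh on x < 0
    w_neg = []
    for _ in range(n):
        x = -draw_rayleigh(rng)
        w_neg.append((-x) * std_normal_pdf(x) / rayleigh_pdf(-x))
    estimate = sum(w_pos) / n - sum(w_neg) / n
    return estimate, w_pos, w_neg


if __name__ == "__main__":
    estimate, w_pos, w_neg = split_estimate(1000, random.Random(6))
    print("estimate of E[X]:", estimate)
    print("spread of positive weights:", max(w_pos) - min(w_pos))
    print("spread of negative weights:", max(w_neg) - min(w_neg))
```

Up to floating point rounding, every weight equals 1/√(2π), the integral of each part, which is exactly the quantity that would be unknown in a realistic problem.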

In relation to a recent musing of mine during a seminar in Warwick, the novel part in the survey on the limited usefulness of control variates is of interest, even though one could predict that linear regression does not do very well in infinite variance environments. Examples 8 and 9 are most helpful in this respect. It is similarly revealing if unsurprising that basic antithetic variables do not help. The warning about detecting or failing to detect infinite variance situations is most welcome.

While theoretically correct, the final section about truncation limit is more exploratory, in that truncation can produce biased answers, whose magnitude is not assessed within the experiment.
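As a rough illustration of this bias, and not of the truncation scheme studied in the survey, consider a Pareto variate X on [1, ∞) with tail index α = 1.5, which has a finite mean α/(α-1) = 3 but an infinite variance. Capping X at a level M produces a finite variance estimator of E[min(X, M)], whose bias relative to E[X] is -M^(1-α)/(α-1), that is -0.2 for M = 100. The sketch below, where the names and the values of α, M and n are assumptions of mine, compares the simulated truncated mean with this closed form.

```python
import random

ALPHA = 1.5  # tail index: finite mean, infinite variance (illustrative choice)
CAP = 100.0  # truncation level M (illustrative choice)


def pareto_mean(alpha):
    # E[X] for X ~ Pareto(alpha) on [1, inf), alpha > 1
    return alpha / (alpha - 1.0)


def truncation_bias(alpha, cap):
    # E[min(X, cap)] - E[X], valid for cap >= 1 and alpha > 1
    return -cap ** (1.0 - alpha) / (alpha - 1.0)


def truncated_mean(alpha, cap, n, rng):
    # rng.paretovariate(alpha) draws from the Pareto law with minimum value 1
    return sum(min(rng.paretovariate(alpha), cap) for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(13)
    print("true mean:", pareto_mean(ALPHA))
    print("exact truncation bias:", truncation_bias(ALPHA, CAP))
    print("truncated Monte Carlo mean:", truncated_mean(ALPHA, CAP, 200_000, rng))
```

The bias is available here only because the toy distribution is fully known; nothing in the simulated sample alone reveals it, which is the concern raised above.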

a (sunny, crisp) day at ICSDS 2025

Posted in pictures, Running, Statistics, Travel, University life on December 19, 2025 by xi'an

My first day at ICSDS 2025 was somewhat hectic, as I had realised late the night before that I was giving a talk! (I had forgotten that I had submitted a title at registration time, and I never received any communication from the organisers, including (or excluding) a request for an abstract.) I thus hastily updated my November talk in Sevilla for my December talk in Sevilla, but paid less attention than needed to the sessions I attended. Wednesday was more peaceful, esp. after a 16K run along the Guadalquivir, and I attended two great Bayesian learning sessions. The first one seemed designed for me! It involved my friend (of 40 years) Ed George on his latest result on proper prior minimaxity and shrinkage, with our late friend Bill Strawderman as a co-author since they worked on the problem prior to Bill’s demise, Charles Margossian on variational inference preserving some symmetries in the target and hence keeping the same statistics, with elliptically symmetric families, and Fletcher Christensen on DIC for some mixed models, with references to our “DIC’s eights” paper (but still picking one version of DIC in the end!).

The second session was on prediction learning (with me as the chair, as I realised one minute before! AI!), with (my friend) Veronika Rockova using AI predictions as a prior predictive and connecting them with Bayesian nonparametrics, Kenyon Ng (who visited me last Spring) on a similar approach using pretrained transformers like TabPFN and martingale posterior inference, Lorenzo Cappello on a generalisation of martingale prediction, and Andrea Ghiglietti on the mathematics of an involved urn system.


The afternoon session was a plenary talk by Daniela Witten in the magnificent building of the Real Fabrica de Tabacos, but the room was unfortunately too small for the audience and I could not enter. Hopefully her talk will have a significant intersection with the CRiSM colloquium she delivers in Warwick in late January. I thus walked around the old town till the following poster session, held in the Real Fabrica courtyard, under the sun. As I got involved in a deep discussion of the relevance of mirror meetings (which I defend!) versus the dangers they pose to principal (parent) conferences (which can be mitigated by the mirror conference participants registering, to some extent, for the principal one), with more to come on the ‘Og and in the ISBA Bulletin!, I did not peruse the available posters, sorry…

And, by the way, the conference organisers also revealed the location of ICSDS 2026, which is Croatia, my first bet! More precisely, the city of Split, which we visited in 2023.

prequential posteriors

Posted in Books, Statistics, University life on December 15, 2025 by xi'an

Data assimilation is a fundamental task in updating forecasting models upon observing new data, with applications ranging from weather prediction to online reinforcement learning. Deep generative forecasting models (DGFMs) have shown excellent performance in these areas, but assimilating data into such models is challenging due to their intractable likelihood functions. This limitation restricts the use of standard Bayesian data assimilation methodologies for DGFMs. To overcome this, we introduce prequential posteriors, based upon a predictive-sequential (prequential) loss function; an approach naturally suited for temporally dependent data which is the focus of forecasting tasks. Since the true data-generating process often lies outside the assumed model class, we adopt an alternative notion of consistency and prove that, under mild conditions, both the prequential loss minimizer and the prequential posterior concentrate around parameters with optimal predictive performance. For scalable inference, we employ easily parallelizable wastefree sequential Monte Carlo (SMC) samplers with preconditioned gradient-based kernels, enabling efficient exploration of high-dimensional parameter spaces such as those in DGFMs. We validate our method on both a synthetic multi-dimensional time series and a real-world meteorological dataset; highlighting its practical utility for data assimilation for complex dynamical systems.

Flying to Sevilla once again!

Posted in pictures, Statistics, Travel, University life, Wines on December 14, 2025 by xi'an

scalable Monte Carlo for Bayesian learning [book review]

Posted in Books, Statistics, University life on September 26, 2025 by xi'an

This book by Paul Fearnhead, Christopher Nemeth, Chris Oates, and Chris Sherlock is part of the IMS Monograph series, published by Cambridge University Press. It covers most recent developments in MCMC methods, namely stochastic gradient MCMC (Chap. 3), non-reversible MCMC (Chap. 4), continuous-time MCMC (Chap. 5), and assessing and improving MCMC (Chap. 6). I find the book remarkable in its attention to rigour and clarity, without falling into overly technical derivations. It is perfectly suited for a graduate course for students with a solid mathematical background. In short, had I considered a new edition of our Monte Carlo Statistical Methods book to incorporate these advances, I could not have done such a good job!

The first chapter provides a quick refresher of the background, from Monte Carlo principles, to Markov chains, SDEs, and the kernel “trick” (which requires a dozen pages of exposition). Nonetheless, it contains side remarks of true interest, including some suggestions I had not previously seen, such as an unusual introduction of the HMC algorithm as an underdamped Langevin diffusion. Chapter 2 extends this recap by covering reversible MCMC algorithms and the attached optimal scalings. This is done in a particularly friendly presentation that I intend to use in my own course. The HMC section is probably the best coverage I have seen on the topic, including most naturally the leapfrog steps.

Chapter 3 gets into stochastic gradient MCMC as an approximate MCMC, with nice arguments and formal convergence bounds. Again quite efficiently, if focussing almost solely on Gaussian settings (but including a neural network example). Similarly, Chapter 4 provides intuitive (if informal) arguments on the worth of non-reversible algorithms that are well-suited to a textbook of this level. This chapter introduces a PDMP sampler like the discrete bouncy particle sampler.

Chapter 5 is a (nicely) monstrous coverage of continuous-time MCMC samplers that reaches very recent advances on PDMPs. The focus is on expressing them as limits, in order to derive mixing rates without extreme mathematical steps. (The chapter even includes a mention of the coordinate sampler that my PhD student Wu Changye derived in 2018!) Again a chapter I plan to use when teaching MCMC methods, if possibly skipping some of the 66 pages.

Chapter 6 completes the monograph with a presentation of convergence assessment tools and diagnostics, exploiting the kernel trick, as well as convergence bounds that reflect very recent research in that domain. The concluding section on optimal weights and optimal thinning will presumably be new to most readers. (Making me wonder if a link can be found with our importance Markov chain construct.)

[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]