Archive for DIC

a (sunny, crisp) day at ICSDS 2025

Posted in pictures, Running, Statistics, Travel, University life on December 19, 2025 by xi'an

My first day at ICSDS 2025 was somewhat hectic, as I realised late the night before that I was giving a talk! I had forgotten I had submitted a title at registration time and never received any communication from the organisers, including (or excluding) a request for an abstract. I thus hastily updated my November talk in Sevilla for my December talk in Sevilla!, but paid less attention than needed to the sessions I attended. Wednesday was more peaceful, esp. after a 16K run along the Guadalquivir, and I engaged in two great Bayesian learning sessions. The first one seemed designed for me!, involving my friend (of 40 years) Ed George on his latest result on proper prior minimaxity and shrinkage, with our late friend Bill Strawderman as a co-author since they worked on the problem prior to Bill’s demise, Charles Margossian on variational inference preserving some symmetries in the target and hence keeping the same statistics, with elliptically symmetric families, and Fletcher Christensen on DIC for some mixed models, with references to our “DIC’s eights” paper (but still picking one version of DIC in the end!).

The second session was on prediction learning (with me as the chair, as I realised one minute before! AI!), with (my friend) Veronika Rockova using AI predictions as a prior predictive and connecting them with Bayesian nonparametrics, Kenyon Ng (who visited me last Spring) on a similar approach using pretrained transformers like TabPFN and martingale posterior inference, Lorenzo Cappello on a generalisation of martingale prediction, and Andrea Ghiglietti on the mathematics of an involved urn system.


The afternoon session was a plenary talk by Daniela Witten in the magnificent building of the Real Fabrica de Tabacos, but the room was unfortunately too small for the audience and I could not enter. Hopefully her talk will have a significant intersection with the CRiSM colloquium she delivers in Warwick late January. I thus walked around the old town till the following poster session, held in the Real Fabrica courtyard, under the sun. As I got involved in a deep discussion of the relevance of mirror meetings (which I defend!) versus the dangers for principal (parent) conferences (which can be mitigated by the mirror conference participants registering, to some extent, for the principal one), with more to come on the ‘Og and in the ISBA Bulletin!, I did not peruse the available posters, sorry…

And, by the way, the conference organisers also revealed the location of ICSDS 2026, which is Croatia, my first bet! More precisely, the city of Split, which we visited in 2023.

model uncertainty and missing data: an objective BAyesian perspective

Posted in Books, Statistics, Travel, University life on September 16, 2025 by xi'an

My Spanish and objective Bayesian friends Gonzalo García-Donato, María Eugenia Castellanos, Stefano Cabras, Alicia Quirós, and Anabel Forte wrote a fairly exciting paper in BA that is open to discussion (for a few more days), to be discussed on 05 November (4:00 PM UTC | 11:00 AM EST | 5:00 PM CET).

The interplay between missing data and model uncertainty—two classic statistical problems—leads to primary questions that we formally address from an objective Bayesian perspective. For the general regression problem, we discuss the probabilistic justification of Rubin’s rules applied to the usual components of Bayesian variable selection, arguing that prior predictive marginals should be central to the pursued methodology. In the regression settings, we explore the conditions of prior distributions that make the missing data mechanism ignorable, provided that it is missing at random or completely at random. Moreover, when comparing multiple linear models, we provide a complete methodology for dealing with special cases, such as variable selection or uncertainty regarding model errors. In numerous simulation experiments, we demonstrate that our method outperforms or equals others, in consistently producing results close to those obtained using the full dataset. In general, the difference increases with the percentage of missing data and the correlation between the variables used for imputation.

The so-called Rubin’s identity is simply the representation of the posterior probability of a model γ given the observed data x⁰, p(γ|x⁰), as the integral of the posterior probability of that model given both observed and latent data, p(γ|x⁰, x¹), against the marginal (predictive) distribution of the latent x¹ given the observed x⁰. Since this marginal involves the probabilities p(γ|x⁰), this representation is not directly useful for a numerical implementation.
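For readers who prefer symbols, the identity described above can be written as follows. This is a sketch in notation not taken from the paper: x⁰ and x¹ are written as x^0 and x^1, the sum over γ' runs over the models under comparison, and the second line is just the law of total probability applied to the predictive of the latent data.

```latex
\begin{align*}
p(\gamma \mid x^0) &= \int p(\gamma \mid x^0, x^1)\, p(x^1 \mid x^0)\, \mathrm{d}x^1, \\
p(x^1 \mid x^0) &= \sum_{\gamma'} p(x^1 \mid x^0, \gamma')\, p(\gamma' \mid x^0).
\end{align*}
```

The second line makes the circularity explicit: the weights p(γ'|x⁰) needed to simulate x¹ from its predictive are the very quantities the first line is meant to deliver.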

In this paper, missingness relates to some entries of either the covariates or the response variate. Which is less common but more realistic, especially if some covariates do not contribute to the response. (The missingness mechanism does not matter if the data is missing at random, à la Rubin.) The computational solution (p9) is rather standard, simulating the missing variables given the observed variables. In my opinion, the elephant in the room is the super-delicate selection of a prior distribution on the missing covariates, as methinks this impacts in a considerable manner the actual value of the Bayes factor, hence the selection of the surviving model. (As a side remark, we are credited, via Celeux et al. (2006), with having “extended DIC for missing data models or when missing data were present”, but our point was instead to stress the arbitrariness of the very definition of DIC in such contexts.)

“The standard Bayesian method for addressing the absence of prior information uses improper distributions. In estimation problems (the model is fixed), the impropriety of priors does not imply any additional difficulty as long as the posterior is proper” (p9)

The authors point out the well-known difficulty with improper priors but still resort to improper priors on the parameters shared by all models—which I dispute as being adequate, despite the arguments put forward on p15, right Haar measure or not—, while sticking to proper priors on the model-dependent parameters. Which unsurprisingly become Zellner’s g-priors. Or rather g’-priors, although the discussion seems to resolve into the (model-free) factor g’ being equal to 1 as for the g-priors. Again a strong term in the derivation of the Bayes factor.

Bayesian Inference: Theory, Methods, Computations [book review]

Posted in Statistics on November 12, 2024 by xi'an

Bayesian Inference: Theory, Methods, Computations by Silvelyn Zwanzig and Rauf Ahmad, both from Uppsala University, is a recent book published by Chapman & Hall / CRC Press. About 300p long (plus appendices), it covers the core aspects of Bayesian inference, namely the decision theoretic motivations, its asymptotic validation, the specifics of estimation and testing, and the computational approximations (MC, MCMC, ABC, VB), with entries on prior specification and Normal linear models. And some R codes. It is (and feels like) constructed from Master and PhD courses (at Uppsala University), with a rigorous mathematical presentation and many examples, some related to biostatistics. Drawings from the first author’s daughter are included in most chapters, to this reviewer’s bemusement. From a further personal viewpoint, the book also reads rather close to my (Bayesian) choice of a Bayesian textbook, which proves rather accurate since several chapters are inspired by my own Bayesian Choice, as acknowledged therein, as well as by the more recent Statistical Decision Theory: Estimation, Testing, and Selection by Liese & Miescke (2008) and Introduction to the Theory of Statistical Inference by Liero & Zwanzig (2011). Witness, for instance, an example of prior construction for capture-recapture experiments on lizards as analysed by my PhD student Dupuis (1995) [with a curious switch to the authors on p.263] and also included in The Bayesian Choice (with drawing 2.9 incorrect in that the lizards there have marks on their backs, instead of the code adopted by the ecologists, namely cutting one specific phalange for each capture).

Other minor quandaries: The usual issue of quoting the wrong edition for creating a method, as when citing Jeffreys (1946) for inventing non-informative priors [p.53], failing to point out the parameterisation invariance of intrinsic losses [p.95], considering that Bayes factors are only relevant for obtaining evidence against the null hypothesis [p.216], recommending BIC and DIC (!) [pp.232-6], advocating sampling importance resampling (SIR) for approximate sampling from the target (omitting infinite variance issues) [p.253], defining annealing as using “several trial distributions” [p.261], and a mistake in ABC-MCMC [p.274], since the case when the simulated data is too far from the actual data should lead to a repetition of the current value rather than a pure rejection.
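To make the ABC-MCMC remark concrete, here is a minimal sketch of my own, not the book's code. It assumes a toy model (a N(θ,1) sample summarised by its mean, a N(0,10²) prior, a symmetric random-walk proposal) and hypothetical names such as abc_mcmc, eps and step. The point is only that a proposal whose pseudo-data falls too far from the observations makes the chain repeat its current value rather than drop the iteration.

```python
import math
import random


def abc_mcmc(y_obs, n_iter=5000, eps=0.2, step=0.5, seed=0):
    """Toy ABC-MCMC for the mean of a N(theta, 1) sample (illustrative only)."""
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = sum(y_obs) / n  # summary statistic: sample mean

    def log_prior(theta):
        # N(0, 10^2) prior, up to an additive constant
        return -0.5 * (theta / 10.0) ** 2

    theta = s_obs  # start in a region compatible with the data
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        # mean of n pseudo-observations from N(prop, 1), simulated directly
        s_sim = rng.gauss(prop, 1.0 / math.sqrt(n))
        close = abs(s_sim - s_obs) <= eps
        log_ratio = log_prior(prop) - log_prior(theta)
        if close and rng.random() < math.exp(min(0.0, log_ratio)):
            theta = prop
        # accepted or not, the chain records a value:
        # a rejection repeats the current theta instead of skipping the step
        chain.append(theta)
    return chain
```

Calling abc_mcmc([1.2, 0.8, 1.1, 0.9, 1.0]) returns a chain of exactly n_iter values, with runs of identical consecutive entries wherever the simulated summary was too far from the observed one. Recording only the accepted proposals would instead target a different distribution.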

All in all, a reasonable textbook with some recent input, but still lacking in originality, if I may subjectively say so.

[Disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear in my Books Review section in CHANCE.]

venISBA¹

Posted in Books, pictures, Running, Statistics, Travel, University life on July 3, 2024 by xi'an

As in previous days, I had an early morn run over the Liberty bridge, with the sunrise as a reward, plus a short swim in the local Sant’Alvise pool, as I managed to register as a member this time!, then leading to a hurried breakfast (sad!) not to miss the opening ceremony. My first choice of session was for probabilistic numerics: the first talk was [not Sylvia’s but Lewis Fry’s] Richardson extrapolation by Chris Oates (& coauthors) that causes acceleration in the approximation of functional values (a Taylor expansion at core). Probabilistically numerised via Gaussian processes. Then solving linear systems by Jon Cockayne (et al, 2021), with acceleration of iterative methods like conjugate gradient again via Gaussian processing, requiring some knowledge about the condition number of the matrix involved in the linear equations. Interestingly realising the UQ is poor. And Masha Naslidnyk on maximum mean discrepancy likelihood free inference. Since the MMD cannot be computed in closed form, it is approximated via an RKHS kernel, with a proof that the resulting upper bound converges at an optimal rate, thus achieving better precision with fewer evaluations. MMD has been used in several instances in the ABC literature as well as for generalised Bayesian inference (e.g. by Pierre Alquier or Rito Dutta and their coauthors). I thus missed other interesting parallel sessions, like Data Integration organised by David Rossell.

Also, I chaired the Foundation lecture of my long time friend Kerrie Mengersen on the future of Bayesian analysis, going through many of the projects she drove over the past years (decades!) of modelling via Bayesian statistics in a variety of actual settings, solving challenges of correcting data, reducing dimension, producing complex interfaces with data providers and users. While we had to stop for the following session, it felt like the discussion could have gone on forever! (And a great line of Hakuna my data glimpsed from a passing slide!)

The second multiple session I attended was on Optimal transport and Bayesian learning, with Long Nguyen (who kindly invited me to and even more kindly hosted me in Ho Chi Minh City last summer) proposing a Wasserstein dendrogram to cluster in mixture models. Which seems to depend much on the parameterisation and the distance, if I understood his presentation. DIC made an appearance but this meant the Dendrogram Information Criterion! Then Hugo Lavenant talked about merging of opinions as how quickly two different priors come to agree when the data size increases. In a non-parametric setting, using a completely random measure framework, with two different measures and optimal transport distances. And then Ricardo Baptista considered intractable posteriors to build sort of a normalising flow (as indicated later). Assuming availability of joint samples from the prior × predictive density and using a triangular transform that favours the marginal × posterior decomposition.

Congrats to our PhD student Emma Kopp who won the best applied presentation at BAYSM on Sunday!!!

 

statistical modeling with R [book review]

Posted in Books, Statistics on June 10, 2023 by xi'an

Statistical Modeling with R (A dual frequentist and Bayesian approach for life scientists) is a recent book written by Pablo Inchausti, from Uruguay. In a highly personal and congenial style (witness the preface), with references to (fiction) books that enticed me to buy them. The book was sent to me by the JASA book editor for review and I went through the whole of it during my flight back from Jeddah. [Disclaimer about potential self-plagiarism: this post or a likely edited version of it will eventually appear in JASA. If not CHANCE, for once.]

The very first sentence (after the preface) quotes my late friend Steve Fienberg, which is definitely starting on the right foot. The exposition of the motivations for writing the book is quite convincing, with more emphasis than usual put on the notion and limitations of modeling. The discourse is overall inspirational and contains many relevant remarks and links that make it worth reading it as a whole. While heavily connected with a few R packages and functions like fitdist, fitdistrplus, brms (a front for Stan), glm, glmer, the book is wisely bypassing the perilous reef of recalling R basics. Similarly for the foundations of probability and statistics. While lacking in formal definitions, in my opinion, it reads well enough to somehow compensate for this very lack. I also appreciate the coherent and thorough continuation of the parallel description of Bayesian and non-Bayesian analyses, an attempt that all too often quickly disappears in other books. (As an aside, note that hardly anyone claims to be a frequentist, except maybe Deborah Mayo.) A new model is almost invariably backed by a new dataset, if with a few being somewhat inappropriate, as in the mammal sleep patterns of Chapter 5. Or in Fig. 6.1.

Given that the main motivation for the book (when compared with references like BDA) is heavily towards the practical implementation of statistical modelling via R packages, it is inevitable that a large fraction of Statistical Modeling with R is spent on the analysis of R outputs, even though it sometimes feels a wee bit too heavy for yours truly. The R screen-copies are however produced in moderate quantity and size, even though the variations in typography/fonts (at least on my copy?!) may prove confusing. Obviously the high (explosive?) distinction between regression models may eventually prove challenging for the novice reader. The specific issue of prior input (or “defining priors”) is briefly addressed in a non-chapter (p.323), although mentions are made throughout preceding chapters. I note the nice appearance of hierarchical models and experimental designs towards the end, but would have appreciated some discussions on missing topics such as time series, causality, connections with machine learning, non-parametrics, model misspecification. As an aside, I appreciated being reminded about the apocryphal nature of Ockham’s much cited quote “Pluralitas non est ponenda sine necessitate”.

Typo Jeffries found in Fig. 2.1, along with a rather sketchy representation of the history of both frequentist and Bayesian statistics. And Jon Wakefield’s book (with related purpose of presenting both versions of parametric inference) was mistakenly entered as Wakenfield’s in the bibliography file. Some repetitions occur. I do not like the use of the equivalence symbol ≈ for proportionality. And I found two occurrences of the unavoidable “the the” typo (p.174 and p.422). I also had trouble with some sentences like “long-run, hypothetical distribution of parameter estimates known as the sampling distribution” (p.27), “maximum likelihood estimates [being] sufficient” (p.28), “Jeffreys’ (1939) conjugate priors” [which were introduced by Raiffa and Schlaifer] (p.35), “A posteriori tests in frequentist models” (p.130), “exponential families [having] limited practical implications for non-statisticians” (p.190), “choice of priors being correct” (p.339), or calling MCMC sample terms “estimates” (p.42), and issues with some repetitions, missing indices for acronyms, packages, datasets, but did not bemoan the lack of homework sections (beyond suggesting new datasets for analysis).

A problematic MCMC entry is found when calibrating the choice of the Metropolis-Hastings proposal towards avoiding negative values “that will generate an error when calculating the log-likelihood” (p.43) since it suggests proposed values should not exceed the support of the posterior (and indicates a poor coding of the log-likelihood!). I also find the motivation for the full conditional decomposition behind the Gibbs sampler (p.47) unnecessarily confusing. (And automatically having a Metropolis-Hastings step within Gibbs as on Fig. 3.9 brings another magnitude of confusion.) The Bayes factor section is very terse. The derivation of the Kullback-Leibler representation (7.3) as an expected log likelihood ratio seems to be missing a reference measure. Of course, seeing a detailed coverage of DIC (Section 7.4) did not suit me either, even though the issue with mixtures was alluded to (with no detail whatsoever). The Nelder presentation of the generalised linear models felt somewhat antiquated, since the addition of the scale factor a(φ) sounds over-parameterized.
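The point about proposals leaving the support can be illustrated with a short sketch. It is my own toy example, not code from the book: the target is an assumed Gamma(3,2) density on the positive half-line, and log_target and random_walk_mh are hypothetical names. When the log-target returns minus infinity outside the support, a negative proposal is simply rejected, with no error and no need to tune the proposal to avoid it.

```python
import math
import random


def log_target(theta, shape=3.0, rate=2.0):
    """Unnormalised log-density of a Gamma(shape, rate) target on (0, inf)."""
    if theta <= 0.0:
        return -math.inf  # outside the support: zero density, not an error
    return (shape - 1.0) * math.log(theta) - rate * theta


def random_walk_mh(n_iter=20000, step=1.0, theta0=1.0, seed=1):
    """Random-walk Metropolis-Hastings with an unconstrained Gaussian proposal."""
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # may well be negative
        lp_prop = log_target(prop)
        diff = lp_prop - lp  # equals -inf when prop is outside the support
        # accept with probability min(1, exp(diff)), which is 0 for diff = -inf
        if diff >= 0.0 or rng.random() < math.exp(diff):
            theta, lp = prop, lp_prop
        chain.append(theta)  # a rejection repeats the current value
    return chain
```

Running random_walk_mh() yields a chain that never leaves the positive half-line even though the Gaussian proposal regularly steps below zero, and whose average should sit near the Gamma(3,2) mean of 1.5.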

But those are minor quibbles in relation to a book that should attract curious minds of various background knowledge and expertise in statistics, as well as work nicely to support an enthusiastic teacher of statistical modelling. I thus recommend this book most enthusiastically.