PyMC

Research Services

Probabilistic Programming in Python

About us

PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

Website
https://docs.pymc.io
Industry
Research Services
Company size
11-50 employees
Type
Nonprofit
Founded
2010

Updates

  • PyMC reposted this

    How do you evaluate a rookie hitter with 50 plate appearances? It's the same problem as forecasting sales for a new store, or estimating a drug effect in a small clinic. Unbalanced grouped data is everywhere. Here's a baseball analytics example, one of several case studies we'll build in our London Bayesian modeling workshop next week. Same 2023 MLB season, two models. The left panel gives every player their own independent estimate. But look at the left side of the plot, where the low-AB players live. Their point estimates are all over the place. Some rookies look like future stars, others look like they should be sent back to the minors, and their intervals are wide enough that either call could easily flip. You are not looking at skill. You are looking at noise amplified by small sample sizes. The right panel is a partial pooling model, where the green dots are the new estimates. The gray x's show where the independent model had them. Now look at the left side: the green dots cluster much more tightly than the gray x's. The extreme estimates get pulled back toward the center. Nate Eaton: 53 AB, .105 in the independent model, .228 after partial pooling. Korey Lee: 65 AB, .101 independent, .227 partial. Both had a handful of bad at-bats and the independent model treated them like they had revealed their true talent. The partial-pooling model knows better. It learns from the whole league, shrinks the noisy extremes back toward the context, and gives you a calibrated estimate instead of a coin flip. Everyday players on the right side barely move. 600 plate appearances do not need borrowing. The model is smart enough to leave them alone. This is one of the most useful methods in applied statistics and it has nothing to do with priors or philosophy. It is a tool for making sound inferences from unbalanced, grouped data. Once you have partial pooling in your toolkit you stop fitting independent models to nested data. 2.5 days in London, June 8–10. 
Hierarchical models are Session 4. We build the full workflow live: partial pooling, varying intercepts and slopes, non-centered parameterisation, and prediction for new groups. EDIT: ONLY A FEW SEATS REMAINING! 👉 https://dub.sh/ANFk8VH Code LONDON10 gets you 10% off. Monday is day one. Don't sit this one out!
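The shrinkage described in this post can be illustrated with a much simpler stand-in than the workshop's hierarchical model: a closed-form Beta-Binomial update in which every player shares one league-wide prior. This is only a sketch under stated assumptions. The hit and at-bat counts are made up, and `prior_strength` (the number of pseudo at-bats the prior is worth) is a hypothetical fixed value here, whereas a full hierarchical model would learn it from the data.

```python
def partial_pool(hits, at_bats, prior_strength):
    """Shrink each raw batting average toward the league average.

    prior_strength is an assumed number of pseudo at-bats; a real
    hierarchical model would estimate it instead of fixing it.
    """
    league_avg = sum(hits) / sum(at_bats)
    alpha = league_avg * prior_strength          # prior pseudo-hits
    beta = (1.0 - league_avg) * prior_strength   # prior pseudo-outs
    pooled = [(alpha + h) / (alpha + beta + n) for h, n in zip(hits, at_bats)]
    return league_avg, pooled


# Hypothetical data: two low-AB players and two everyday players.
hits = [6, 7, 170, 150]
at_bats = [53, 65, 600, 580]

league_avg, pooled = partial_pool(hits, at_bats, prior_strength=300)
for h, n, p in zip(hits, at_bats, pooled):
    print(f"AB={n:4d}  independent={h / n:.3f}  partially pooled={p:.3f}")
```

The low-AB players move a long way toward the league average, while the players with roughly 600 at-bats barely move, because their own data outweigh the shared prior.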

  • PyMC reposted this

    While searching for content on Bayesian inference, I accidentally found out there's a gym exercise called "Bayesian cable curls" lol How does this even make sense? Do you start with a prior belief that you're strong, then update it after the first set? If anyone knows the origin of this, let me know! In the meanwhile, maybe PyMC should just add a pymc-fitness package?

  • PyMC reposted this

    A trial reports p = 0.03. Is it credible? Under the Reverse-Bayes framework, a finding generally needs p ≈ 0.005 to achieve intrinsic credibility at the 5% level. By that standard, p = 0.03 isn’t close. But “lower the threshold” isn’t the interesting part. Reverse-Bayes asks a different question: Given the estimate and its uncertainty, what prior belief would make this result convincing? That flips the usual Bayesian debate. Instead of arguing about whose prior is correct, you solve for the prior that makes the finding credible, and then ask whether that prior is plausible.

    The key tool is the Critical Prior Interval (CPI), developed by Robert Matthews (Aston University), who also reviewed and contributed edits to this post. His foundational work on the framework is the subject of the full write-up. For significant findings, it yields a scepticism limit: the minimum effect size you'd already need to believe for the data to be convincing. For nonsignificant findings, it yields an advocacy limit: the largest effect size the data can support even under the most generous interpretation.

    A few examples:
    → GREAT trial (1992): OR = 0.47, p ≈ 0.04. Sounds impressive. But the scepticism limit implies you’d need to already believe in a 90% mortality reduction to find it credible. Later meta-analysis estimated OR ≈ 0.83.
    → ORBITA trial (2018): stents vs sham for stable angina, p = 0.20. Reported as “no benefit.” But the advocacy limit was +115 seconds: the trial was uninformative, not definitive.
    → RECOVERY trial (2020): corticosteroids for COVID-19. Both significant and credible. To dismiss the result, a sceptic would need to believe mortality reductions couldn’t exceed 16%.

    Same p-value framework. Radically different evidential weight. The Critical Prior Interval makes that difference visible, turning a single p-value into a statement about the evidence.
Full write-up with derivations, four trial examples, and connections to Bayes factors and meta-analysis: https://lnkd.in/gDcmSvBA
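For readers who want to see the scepticism limit as arithmetic, here is a minimal sketch. It assumes a normal likelihood and a sceptical normal prior centred on no effect, and solves for the prior 95% half-width at which the posterior 95% interval just touches zero. With a 95% interval (L, U) that gives SL = (U - L)^2 / (4 * sqrt(U * L)). The function names and the example intervals are hypothetical and are not taken from the trials mentioned in the post.

```python
import math


def scepticism_limit(lower, upper):
    """Scepticism limit for a significant finding on an additive scale.

    (lower, upper) is the 95% interval of the estimate; it must exclude 0.
    """
    if lower * upper <= 0:
        raise ValueError("interval must exclude zero (a significant finding)")
    return (upper - lower) ** 2 / (4.0 * math.sqrt(lower * upper))


def critical_prior_interval_ratio(lower, upper):
    """Critical prior interval for a ratio measure (e.g. an odds ratio).

    Works on the log scale, so the interval must exclude 1.
    """
    s = scepticism_limit(math.log(lower), math.log(upper))
    return math.exp(-s), math.exp(s)


# Hypothetical mean difference with 95% interval (0.5, 4.5).
print(round(scepticism_limit(0.5, 4.5), 3))

# Hypothetical odds ratio with 95% interval (0.25, 0.95).
print(tuple(round(x, 3) for x in critical_prior_interval_ratio(0.25, 0.95)))
```

Reading the output: a sceptic whose prior 95% interval for the effect lies inside the critical prior interval would not be convinced by the data, so the finding is credible only to someone who already allows effects beyond that limit.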

  • PyMC reposted this

    I am really excited to share that I will be a co-mentor with the amazing Jesse Grabowski for Google Summer of Code (GSoC) 2026! If you are a student who is interested in Bayesian State Space models, we encourage you to check out our “Scalable Online Bayesian State Space Model” project, which involves combining mathematical optimization techniques (marginalization, structured linear algebra, Cholesky-based covariance representations) with computational improvements such as GPU acceleration and distributed execution within the PyMC ecosystem. Learn more here: https://lnkd.in/gMRP5Mij This initiative offers a hands‑on chance to dive deep into cutting‑edge Bayesian methods while building skills that are highly sought after in industry. By contributing to the PyMC open‑source community, students will gain valuable experience in collaborative software development, version control, and community engagement.

  • PyMC reposted this

    What if you could download Bayesian modeling skills into your brain, directly from the creator of PyMC himself? Christopher Fonnesbeck took what he learned in more than 20 years of Bayesian modeling and put it into an AI skill you can just install. Pass rates for difficult models jump from 60% to 93%. The hard tasks see the biggest gains: stochastic volatility goes from 0% to 67%, horseshoe regression from 33% to 100%. Runtime drops from 19 minutes to under 3. But that was just one martial arts program. We wanted to build the whole training construct. Decision-Hub is an open registry for agentic data science skills. Every skill ships with automated evals and security grading. Your agent asks the hub what it needs to do, and like Morpheus, it downloads the right skill straight into your agent's brain. Oh, and you can also add your own skills if you are the maintainer of a PyData repo. Unfortunately, no one has uploaded a Kung Fu skill... yet. Benchmark: https://lnkd.in/ePQZUXgB PyMC skill: https://lnkd.in/exmaH2hX Decision Hub: https://hub.decision.ai

  • PyMC reposted this

    As promised last week, I just ported the second post I wrote on polynomial regression in the Bambi examples gallery over to my blog. This post really uses polynomial regression as a thin veil to actually do some linear algebra and opine about overfitting and extrapolation. https://lnkd.in/eD8aCzKD #BayesianStatistics #MachineLearning #DataScience #Python

  • PyMC reposted this

    With the Milan 2026 Olympics underway, I revisited a model I built to predict the 2020 Olympics medal counts. Using PyMC, I built a Bayesian hierarchical time-series model with three steps: 🥇: Team strength over time modeled with a momentum random walk 🥈: A Bernoulli hurdle to determine which teams will win at least one medal 🥉: A Dirichlet-multinomial distribution to allocate the medals among the winners I was hoping to complete this before the games started... but this proved more complex than I had anticipated! Despite a limited dataset I think I have been able to capture some of the underlying dynamics and show why the range of credible outcomes is quite wide. You may find the dataset and notebooks linked in the article. https://lnkd.in/gd6Baah3
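The three steps listed in this post can be read as a generative story, which the sketch below simulates forward with the Python standard library. It is not the author's PyMC model. The reading of "momentum random walk" as a random walk whose increments are themselves autocorrelated, the logistic link for the gate, the concentration scaling, and every parameter value are assumptions made for illustration. One simplification to note: here a team that passes the gate can still draw zero medals in the allocation step, which a strict hurdle model would rule out.

```python
import math
import random


def momentum_random_walk(n_games, phi, sigma, rng):
    """Latent strength whose step-to-step change (velocity) is persistent."""
    strength, velocity, path = 0.0, 0.0, []
    for _ in range(n_games):
        velocity = phi * velocity + rng.gauss(0.0, sigma)
        strength += velocity
        path.append(strength)
    return path


def allocate_medals(strengths, total_medals, rng):
    """Bernoulli gate, then a Dirichlet-multinomial split among teams that pass."""
    passed = [rng.random() < 1.0 / (1.0 + math.exp(-s)) for s in strengths]
    winners = [i for i, ok in enumerate(passed) if ok]
    medals = [0] * len(strengths)
    if not winners:
        return passed, medals
    # A Dirichlet draw is a set of independent Gamma draws, normalised.
    gammas = [rng.gammavariate(5.0 * math.exp(strengths[i]), 1.0) for i in winners]
    total = sum(gammas)
    shares = [g / total for g in gammas]
    # Multinomial allocation: each medal goes to a winner with prob. = share.
    for i in rng.choices(winners, weights=shares, k=total_medals):
        medals[i] += 1
    return passed, medals


rng = random.Random(2026)
# Six hypothetical teams, each with eight past games of latent strength.
latest = [momentum_random_walk(8, phi=0.6, sigma=0.3, rng=rng)[-1] for _ in range(6)]
passed, medals = allocate_medals(latest, total_medals=30, rng=rng)
print(passed, medals)
```

Running the simulation many times with different seeds gives a feel for why the range of credible outcomes is wide even when the latent strengths are fixed.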

  • PyMC reposted this

    Bayesian statistics has had a long, close connection with theories of rational choice and stochastic optimization. However, in modern probabilistic programming languages, most of the tooling is geared to building and fitting models. If you want to make decisions with those models afterward, you are mostly on your own. I wrote about some neat tooling in the PyMC ecosystem (PyTensor) that makes it easy to take a statistical model and convert it into a tool for optimal decision making. You'll also learn a bit about symbolic computing along the way. Check it out: https://lnkd.in/g-VMiDGZ
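The general idea of turning a fitted model into a decision can be shown without any symbolic tooling: average a utility function over posterior predictive draws and pick the action with the highest expected utility. The sketch below does this by brute-force grid search and does not use PyTensor, so it is a plain stand-in for the approach in the linked post, not a reproduction of it. The ordering problem, the prices, and the simulated "posterior" demand draws are all hypothetical.

```python
import random


def expected_profit(order_qty, demand_draws, price, unit_cost):
    """Average profit over predictive demand draws for one order quantity."""
    total = sum(price * min(order_qty, d) - unit_cost * order_qty for d in demand_draws)
    return total / len(demand_draws)


def best_order(demand_draws, price, unit_cost, candidates):
    """Return the candidate order quantity with the highest expected profit."""
    return max(candidates, key=lambda q: expected_profit(q, demand_draws, price, unit_cost))


rng = random.Random(0)
# Stand-in for posterior predictive draws of demand from a fitted model.
demand_draws = [max(0.0, rng.gauss(50.0, 15.0)) for _ in range(2000)]

print(best_order(demand_draws, price=7.0, unit_cost=3.0, candidates=range(0, 101)))
```

Grid search is enough for a one-dimensional action. A symbolic or gradient-based optimiser becomes more attractive when the action space is large or continuous.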
