Jirka Borovec Borda

Hi there 👋 I'm Jirka Borovec

Machine Learning & Data Science researcher with a Ph.D. in Medical Imaging and years of R&D and consulting experience. I solve real-world problems by crafting state-of-the-art algorithms and turning them into robust, community-driven Python libraries 🐍. Passionate about open source, reproducible research, and scalable ML infrastructure.

🛠️ Developer Track

Create & maintain several open-source Python packages used by thousands of developers
Contributed code, CI/CD pipelines, issue reports & reviews across the ML ecosystem
Strong focus on testing, automation, and developer experience — from pre-commit hooks to GitHub Actions
Achieved top-tier Kaggle rankings across notebooks, competitions, and datasets by applying practical ML skills to real-world challenges and sharing my findings

🧑‍🏫 Manager Track

Built and led a team to deliver a scalable video-analysis platform from prototype to production
Director of Open Source at Lightning AI — led the OSS team for 3+ years, driving feature roadmaps, release cycles, and cross-team coordination across PyTorch Lightning, TorchMetrics, and the broader Lightning ecosystem. Mentored contributors, scaled community engagement, and ensured quality across 10+ active repositories
LinkedIn Learning certified in Leadership Foundations, Leadership: Practical Skills, and Leading Your Team Through Change

🎓 Academic Track

Ph.D. in Medical Imaging — Czech Technical University in Prague
15+ journal articles & 20+ conference papers (ISBI, ICIP, ACCV, MICCAI workshops)
Reviewer for IEEE TMI, TCIA and major international conferences
Co-organized the ANHIR challenge on histological image registration

🚀 Open Source & Projects

Long-term open-source contributor and maintainer. My work spans ML frameworks, developer tooling, and computer vision — always aiming to make research more reproducible and engineering more enjoyable.

Active projects I maintain:

👁️ supervision The go-to Python toolkit for plugging any detection or segmentation model into real-world CV pipelines. Unlike framework-specific tools, it works with YOLO, Transformers, or any custom model out of the box — providing a unified API for tracking, filtering, annotating, and chaining operations that would otherwise require glue code.
🎯 RF-DETR A new take on real-time object detection that brings transformer accuracy to YOLO-level speeds. Stands out by matching or beating state-of-the-art on COCO while being straightforward to fine-tune on custom datasets — no complex anchor tuning or NMS hacks needed.
♻️ pyDeprecate Born from the pain of managing API changes in large libraries like PyTorch Lightning. A zero-dependency tool that lets library authors deprecate, rename, and redirect functions or classes with automatic call forwarding — so users get clear migration warnings instead of silent breakage.
🗄️ cachier Unlike functools.lru_cache, cachier persists results across sessions and even across machines. Ideal for caching expensive computations like API calls or data processing — supports MongoDB and file-based backends with built-in staleness handling, so cached results stay fresh without manual invalidation.
📈 pyRepoStats Fills the gap between git log and full analytics platforms by generating quick contribution stats that include issues and PR activity. Built for maintainers who want a lightweight health check on their projects without setting up dashboards.
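
The call-forwarding idea described for pyDeprecate above can be illustrated with a small standard-library sketch. This is not pyDeprecate's actual API: the helper `deprecated_alias`, its parameters `since` and `removed_in`, and the functions `compute_sum` and `old_sum` are hypothetical names chosen for illustration only.

```python
import functools
import warnings


def deprecated_alias(target, since, removed_in):
    """Hypothetical helper: forward calls from an old function to `target` with a warning."""

    def decorator(old_func):
        @functools.wraps(old_func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"`{old_func.__name__}` is deprecated since v{since} and will be removed in "
                f"v{removed_in}; use `{target.__name__}` instead.",
                category=FutureWarning,
                stacklevel=2,
            )
            # The old body is never executed; the call is redirected to the new function.
            return target(*args, **kwargs)

        return wrapper

    return decorator


def compute_sum(a, b=0):
    """New, supported function."""
    return a + b


@deprecated_alias(target=compute_sum, since="0.2", removed_in="0.4")
def old_sum(a, b=0):
    pass  # body intentionally empty; calls are forwarded to `compute_sum`
```

Calling `old_sum(1, b=2)` returns the result of `compute_sum(1, b=2)` and emits a `FutureWarning` that names the replacement, so existing callers keep working while being told how to migrate.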

Emeritus maintainer — projects I co-created and still partially supervise:

⚡ PyTorch Lightning The most widely adopted framework for scaling PyTorch — used by thousands of teams from academic labs to Fortune 500 companies. Eliminates training loop boilerplate and lets the same code run on a laptop GPU or a 10,000-GPU cluster without changes, bridging the gap between research prototypes and production systems.
📏 TorchMetrics The standard metrics library for the PyTorch ecosystem, solving the surprisingly hard problem of computing correct metrics in distributed training. Ships 100+ metrics for classification, regression, NLP, and retrieval — all with automatic accumulation and device synchronization that just works across multi-GPU setups.
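
To show why metric computation across devices is harder than it looks, here is a minimal pure-Python sketch of state accumulation. It is not the TorchMetrics API: the class `RunningAccuracy` and its `merge` method are hypothetical stand-ins, and the two "workers" are simulated in a single process.

```python
from dataclasses import dataclass


@dataclass
class RunningAccuracy:
    """Hypothetical stand-in for a stateful metric; not the TorchMetrics API."""

    correct: int = 0
    total: int = 0

    def update(self, preds, targets):
        # Accumulate raw counts rather than per-batch ratios.
        self.correct += sum(int(p == t) for p, t in zip(preds, targets))
        self.total += len(targets)

    def merge(self, other):
        # Simulates synchronizing state from another worker by summing counts.
        self.correct += other.correct
        self.total += other.total

    def compute(self):
        return self.correct / self.total if self.total else 0.0


# Two simulated workers that saw different numbers of samples.
worker_a, worker_b = RunningAccuracy(), RunningAccuracy()
worker_a.update([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct
worker_b.update([0], [1])  # 0 of 1 correct

# Averaging per-worker ratios weights both workers equally, which is wrong here.
naive = (worker_a.compute() + worker_b.compute()) / 2

# Summing the underlying counts first gives the accuracy over all 5 samples.
worker_a.merge(worker_b)
exact = worker_a.compute()
```

With uneven sample counts the two results differ (0.375 for the mean of ratios versus 0.6 for the merged counts), which is why a stateful metric keeps counts and only divides at the end.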

Past core maintainer projects:

🛠️ Lightning Utilities The shared foundation that keeps all Lightning projects consistent and maintainable. Extracts common patterns — packaging helpers, testing utilities, CLI tooling, and CI/CD workflows — into one place so that fixes and improvements propagate across the entire ecosystem automatically.
🔩 Lightning Bolts A community-driven collection of reference implementations — VAEs, GANs, SimCLR, and more — built on PyTorch Lightning. Designed to give researchers battle-tested baselines they can reproduce in one command and extend for their own experiments.
⚡ Lightning Flash Made transfer learning as simple as a few lines of code across 15+ tasks — image classification, object detection, text classification, tabular data, and more. Built on PyTorch Lightning, it let practitioners go from idea to baseline in minutes instead of hours.
🌩️ Lightning Thunder A source-to-source compiler for PyTorch that delivers up to 40% faster training and inference through kernel fusion, operator optimization, and GPU memory management. Unlike opaque compilers, Thunder provides a transparent, Pythonic IR that developers can inspect and customize — with composable plugins for distributed training, quantization, and CUDA Graphs.
📚 Lightning Tutorials The official tutorial collection powering the PyTorch Lightning documentation. Uses a script-based format instead of heavy notebooks — automatically converting to executable notebooks with full reproducibility tracking, CI-tested across CPU, GPU, and TPU to ensure every example actually runs.
🔄 Ecosystem CI The safety net for the entire Lightning ecosystem — automatically runs downstream test suites against every nightly build and release candidate. Catches breaking changes before they ship, ensuring that hundreds of dependent projects don't break on upgrade day.
🧠 LitGPT An opinionated, hackable codebase for working with 20+ LLMs — GPT, Llama, Mistral, and more. Unlike heavyweight frameworks, LitGPT uses plain PyTorch with no abstraction layers, making it easy to modify any part of the training pipeline while still getting optimized performance out of the box.

Past research projects:

🖼️ pyImSegm A complete image segmentation pipeline developed during Ph.D. research, combining superpixels, graph cuts, and region growing for medical imaging. Used in multiple published studies on histological tissue analysis and designed to be reproducible from raw data to final results.
📊 BIRL The benchmarking engine behind the ANHIR grand challenge at ISBI, which brought together teams worldwide to compare image registration methods on histological data. Automates the full pipeline from running registration to evaluating alignment accuracy using expert-annotated landmarks.

Notable contributions to other projects: ultralytics/YOLOv5, DIPY and more...

📊 GitHub Stats

🏅 Kaggle Stats

💖 Support & Consulting

If you find my open-source work useful, consider sponsoring me 💚 I'm also available for consulting & contract work in ML, MLOps, and Python engineering — see SUPPORT.md for details.
