Stars
This is a repo with links to everything you'd ever want to learn about data engineering
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
Free MLOps course from DataTalks.Club
Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
Python interactive dashboards for learning data science
An introduction to network analysis and applied graph theory using Python and NetworkX
Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook
Open source book about making Python packages.
A tutorial for setting up an SQL code generator with the OpenAI API
Scikit-learn compatible decision trees beyond those offered in scikit-learn
⭕️ Data Engineering for Data Scientists
Repository for Data Engineering Course
Integrated tool for model development and validation
Using SimPy to create OB patient flow models
Decomposing Global AUC into Cluster-Level Contributions for Localized Model Diagnostics
OB simulation metamodeling research project
IPython notebooks, R Markdown docs, data, and other material from tutorial blog posts on hselab.org
NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey
Komodo Health's submission to the 2022 ACIC Causality Data Challenge
An exploration of first baseman receiving for the 2023 SMT Data Challenge
A personal reference for common actions in different languages and frameworks