I'm a senior at Pomona College, majoring in Mathematics & Statistics with a minor in Data Science.

I'm interested in Bayesian methods, probabilistic modeling, and applied statistics, with projects spanning healthcare, biomechanics, and baseball analytics.

Senior Thesis (2025–26): Bayesian Joint Modeling of Pain and Depression
- Developing a hierarchical Bayesian framework to jointly model chronic pain and depression outcomes in transgender healthcare.
- Comparing empirical Bayes (EB) and full Bayes approaches, including an EB+ correction.
- Running simulation studies of posterior contraction in small subgroups and building decision-focused calibration curves for the MCID (minimal clinically important difference).
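
The following toy illustration shows the empirical Bayes side of that comparison. It is not the thesis model. It assumes a simple normal-normal setup in which each subgroup mean `y_j` has a known standard error `se_j`, and the subgroup effects share a normal prior whose mean and variance are estimated from the data. The function name `eb_shrink` and the example numbers are hypothetical. The sketch shows why small subgroups (large `se_j`) get pulled harder toward the pooled mean.

```python
from statistics import fmean

def eb_shrink(means, ses):
    """Normal-normal empirical Bayes for subgroup means.

    Model (assumed for this sketch): y_j ~ N(theta_j, se_j^2), theta_j ~ N(mu, tau^2).
    mu and tau^2 are estimated from the data and then plugged in as if known.
    """
    if len(means) < 2 or len(means) != len(ses):
        raise ValueError("need at least two subgroups with matching standard errors")
    mu_hat = fmean(means)  # unweighted grand mean, kept simple for the sketch
    # Moment estimate: spread of observed means minus average sampling variance, floored at 0
    spread = sum((y - mu_hat) ** 2 for y in means) / (len(means) - 1)
    tau2_hat = max(0.0, spread - fmean(se ** 2 for se in ses))
    shrunk = []
    for y, se in zip(means, ses):
        total = se ** 2 + tau2_hat
        b = se ** 2 / total if total > 0 else 1.0  # shrinkage weight toward mu_hat
        shrunk.append(b * mu_hat + (1 - b) * y)
    return mu_hat, tau2_hat, shrunk

# Hypothetical subgroup means and standard errors (the last two are small, noisy subgroups)
mu_hat, tau2_hat, shrunk = eb_shrink([2.0, 3.0, 5.0, 6.0], [0.5, 0.5, 2.0, 2.0])
```

A full Bayes treatment instead puts priors on `mu` and `tau^2` and integrates over them. Its posterior intervals therefore carry the uncertainty in `tau^2` that this plug-in step ignores. That gap matters most for small subgroups.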

Data Science Capstone (2025–26): Cloud-Based Baseball Analytics Infrastructure
- Designing a cloud-based architecture to ingest and persist tracking API data using PostgreSQL and Python ETL pipelines.
- Building automated statistical reporting workflows (Python, SQL, CI) to deliver reproducible summaries of pitch-level data.
- Prototyping a Streamlit app for interactive data visualization and model exploration.
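
The following sketch shows the ingest-and-persist step as an idempotent upsert, so re-running a batch updates rows instead of duplicating them. It makes several assumptions. It uses `sqlite3` as a local stand-in for PostgreSQL; the `INSERT ... ON CONFLICT ... DO UPDATE` form is accepted by both, though psycopg uses `%(name)s` placeholders rather than `:name`. The `pitches` table, its columns, and the nested API record shape handled by `transform` are all hypothetical and are not the capstone's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pitches (
    pitch_id      TEXT PRIMARY KEY,
    game_id       TEXT NOT NULL,
    pitch_type    TEXT,
    release_speed REAL
)
"""

# Upsert keyed on pitch_id; `excluded` refers to the row that failed to insert
UPSERT = """
INSERT INTO pitches (pitch_id, game_id, pitch_type, release_speed)
VALUES (:pitch_id, :game_id, :pitch_type, :release_speed)
ON CONFLICT (pitch_id) DO UPDATE SET
    game_id = excluded.game_id,
    pitch_type = excluded.pitch_type,
    release_speed = excluded.release_speed
"""

def transform(raw):
    """Flatten one hypothetical nested API record into a flat row dict."""
    pitch = raw.get("pitch", {})
    return {
        "pitch_id": raw["id"],
        "game_id": raw["game"]["id"],
        "pitch_type": pitch.get("type"),
        "release_speed": pitch.get("speed_mph"),
    }

def load(conn, raw_records):
    """Transform and upsert a batch inside one transaction; returns rows processed."""
    rows = [transform(r) for r in raw_records]
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
batch = [
    {"id": "p1", "game": {"id": "g1"}, "pitch": {"type": "FF", "speed_mph": 91.2}},
    {"id": "p2", "game": {"id": "g1"}, "pitch": {"type": "SL", "speed_mph": 82.5}},
]
load(conn, batch)
load(conn, batch)  # re-running the same batch leaves two rows, not four
```

Keying the upsert on a stable record ID is what makes scheduled or retried ETL runs safe to repeat.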

Quantitative Analyst Associate, Philadelphia Phillies (Summer 2025)
- Directed independent research on the Automated Ball-Strike Challenge System (AAA level).
- Built SQL + Python pipelines (BigQuery, nested CTEs, window functions) to compute per-pitch challenge run values and estimate opportunity cost ($L$).
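
The following sketch shows the nested-CTE-plus-window-function pattern named above. It does not reproduce the actual challenge run value or opportunity-cost calculation. It assumes a hypothetical `challenges(game_id, team, pitch_seq, run_value)` table with made-up values, and it uses `sqlite3` (SQLite 3.25+ supports window functions) in place of BigQuery. The `OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN ...)` syntax shown is standard SQL that BigQuery also accepts.

```python
import sqlite3

QUERY = """
WITH ordered AS (
    -- number each team's challenges within a game in pitch order
    SELECT game_id, team, pitch_seq, run_value,
           ROW_NUMBER() OVER (PARTITION BY game_id, team ORDER BY pitch_seq) AS challenge_num
    FROM challenges
),
running AS (
    -- cumulative run value of a team's challenges so far in the game
    SELECT *,
           SUM(run_value) OVER (
               PARTITION BY game_id, team ORDER BY pitch_seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cum_run_value
    FROM ordered
)
SELECT game_id, team, pitch_seq, challenge_num, run_value, cum_run_value
FROM running
ORDER BY game_id, team, pitch_seq
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE challenges (game_id TEXT, team TEXT, pitch_seq INTEGER, run_value REAL)")
# Hypothetical illustrative rows, not real challenge data
conn.executemany(
    "INSERT INTO challenges VALUES (?, ?, ?, ?)",
    [("g1", "HOME", 12, 0.15), ("g1", "HOME", 57, -0.05),
     ("g1", "AWAY", 30, 0.30), ("g1", "HOME", 88, 0.20)],
)
results = conn.execute(QUERY).fetchall()
```

Splitting the query into CTEs keeps each window computation readable. It also lets later stages filter or join on `challenge_num` without repeating the partition logic.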

Biomechanical Drivers of Pitch Velocity (2024)
- Analyzed Driveline OpenBiomechanics and TrackMan datasets to identify biomechanical predictors of velocity.
- Built nonlinear models (XGBoost) tuned via cross-validation, achieving roughly 2–3 mph prediction error.
- Applied feature importance analysis to study energy transfer through the pitching kinetic chain.
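
The following sketch shows how cross-validation selects a hyperparameter. It swaps in a technique so the example runs with the standard library only: a one-feature k-nearest-neighbors regressor stands in for XGBoost, and its neighbor count stands in for the tuned hyperparameters. The function names, the fold scheme, and the synthetic data are assumptions for illustration. None of it comes from the OpenBiomechanics or TrackMan analysis.

```python
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Average target of the n_neighbors training points closest to x (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(xs, ys, n_neighbors, k=5, seed=0):
    """Out-of-fold RMSE: each fold is predicted by a model fit on the other folds."""
    sq_err, count = 0.0, 0
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for i in fold:
            err = ys[i] - knn_predict(tx, ty, xs[i], n_neighbors)
            sq_err += err * err
            count += 1
    return math.sqrt(sq_err / count)

def tune(xs, ys, grid, k=5, seed=0):
    """Return the grid value with the lowest CV RMSE, plus all scores."""
    scores = {h: cv_rmse(xs, ys, h, k, seed) for h in grid}
    return min(scores, key=scores.get), scores
```

Using the same `seed` for every grid value means each candidate is scored on identical folds. As a result, differences in RMSE reflect the hyperparameter rather than the split.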

Pomona-Pitzer Baseball Analytics (2023–present)
- Co-Director of Analytics & Data Engineering.
- Developed Stuff+/Pitching+/Location+ models, opponent scouting pipelines, and automated workflows to support NCAA DIII decision-making.

- Programming: Python, R, SQL, Bash
- Statistical & ML Methods: Bayesian inference, probabilistic modeling, calibration, cross-validation, feature engineering
- Libraries: XGBoost, scikit-learn, tidymodels, tidyverse, pandas, NumPy, Matplotlib, SHAP
- Databases & Workflow: Google BigQuery, PostgreSQL, Git/GitHub, LaTeX