A modular ML pipeline for automated stellar spectral classification from LAMOST DR5 spectra, built around physically motivated features, dimensionality reduction, and model interpretability.
43 019 spectra × 183 spectroscopic features → XGBoost → 87% balanced accuracy (ROC-AUC 0.964)
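For readers unfamiliar with the headline metric: balanced accuracy is the mean of per-class recall, so rare spectral classes count as much as common ones. The sketch below is a minimal pure-Python illustration. The labels are toy values invented for this example, not AstroSpectro output, and the function name `balanced_accuracy` is my own choice.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, deliberately imbalanced labels (illustrative only).
y_true = ["G"] * 6 + ["K"] * 2 + ["M"] * 2
y_pred = ["G"] * 6 + ["K", "G"] + ["G", "G"]

score = balanced_accuracy(y_true, y_pred)
# Plain accuracy, for comparison: it is inflated by the majority class.
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.2f}, plain accuracy = {plain:.2f}")
```

On imbalanced survey data the two numbers can differ substantially, which is presumably why the balanced variant is the one reported here.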
What makes it science, not just ML:
- Features derived from real spectral physics: line equivalent widths, Ca II H&K, Balmer series, Mg b triplet, color indices
- SHAP analysis reveals that metallicity indicators dominate classification over temperature proxies, which challenges classical MK assumptions
- Cross-matched with Gaia DR3 (GSP-Phot: T_eff, log g, [M/H], distance) for astrophysical validation
- Dimensionality reduction suite: PCA · UMAP · t-SNE · Autoencoder
- Fully reproducible: W&B tracked, 85+ runs logged
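To make the first bullet concrete, here is a minimal sketch of one such feature, the equivalent width W = ∫ (1 − F/F_c) dλ, estimated with the trapezoidal rule. It is not the pipeline's actual implementation. The function name, the synthetic Gaussian line near Hβ, and the flat continuum are all assumptions made for illustration.

```python
import math


def equivalent_width(wavelengths, flux, continuum):
    """Trapezoidal estimate of W = integral of (1 - F/F_c) d(lambda)."""
    depth = [1.0 - f / c for f, c in zip(flux, continuum)]
    width = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        width += 0.5 * (depth[i] + depth[i - 1]) * step
    return width


# Synthetic Gaussian absorption line near H-beta (4861 Å); values are illustrative.
center, sigma, line_depth = 4861.0, 1.0, 0.5
wavelengths = [center - 10.0 + 0.05 * i for i in range(401)]
continuum = [1.0] * len(wavelengths)  # assumed flat, already normalized
flux = [
    1.0 - line_depth * math.exp(-0.5 * ((w - center) / sigma) ** 2)
    for w in wavelengths
]

ew = equivalent_width(wavelengths, flux, continuum)
# Closed form for a Gaussian profile, used here as a sanity check.
analytic = line_depth * sigma * math.sqrt(2.0 * math.pi)
print(f"numeric EW = {ew:.4f} Å, analytic EW = {analytic:.4f} Å")
```

Real spectra would additionally need continuum fitting and a choice of integration window per line, both of which are left out of this sketch.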
Scientific Computing
Machine Learning & Interpretability
Dev & Docs
| Domain | Focus |
|---|---|
| Stellar spectroscopy | Spectral classification, line physics, MK system |
| Interpretable ML | SHAP, feature attribution, physically-grounded models |
| Dimensionality reduction | PCA, UMAP, t-SNE, autoencoders on astronomical data |
| Anomaly detection | HDBSCAN clustering, outlier identification in survey data |
| Survey astrophysics | LAMOST DR5, Gaia DR3 cross-matching, large spectral datasets |
I'm an undergraduate physics student working toward graduate research at the intersection of astrophysics, scientific machine learning, and model interpretability.
My approach: treat ML not as a black box, but as a scientific instrument, one that should explain why it classifies, not just how well.
Central finding of AstroSpectro: SHAP values reveal that metallicity indicators (Ca II H&K, Mg b) contribute more to classification than classical temperature proxies (Balmer lines) in LAMOST DR5, a result that challenges MK classification assumptions and opens questions about feature-space physics.
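For intuition about what a SHAP ranking measures, the sketch below computes exact Shapley values by brute force for a toy linear score. It does not use the `shap` library and does not reproduce the AstroSpectro result. The feature names, weights, and input values are hypothetical and chosen only to show how an attribution ranking is read.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical feature names and made-up weights, for illustration only.
features = ["ca_ii_hk_ew", "mg_b_ew", "h_beta_ew"]
weights = [1.5, 1.0, 0.5]


def toy_score(v):
    return sum(w * f for w, f in zip(weights, v))


x = [3.0, 2.0, 2.0]          # one made-up star
baseline = [1.0, 1.0, 1.0]   # made-up reference (e.g. a sample mean)

phi = shapley_values(toy_score, x, baseline)
ranking = sorted(zip(features, phi), key=lambda pair: abs(pair[1]), reverse=True)
for name, contribution in ranking:
    print(f"{name}: {contribution:+.2f}")
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the property that makes Shapley-based rankings comparable across features. Brute-force enumeration scales exponentially with the number of features, so it is only practical for toy cases like this one.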
Targeting M.Sc. applications in astrophysics / astroinformatics around November 2026, for a September 2027 start.
Building tools that make AI scientifically accountable, one spectrum at a time.