I'm a Master's student at Korea University's NLP&AI Lab, advised by Dr. Heuiseok Lim. I work at the intersection of LLM evaluation and mechanistic interpretability to make models measurable, transparent, and trustworthy.
- LLM Evaluation
  - Ability decomposition and benchmark auditing (mixture-of-abilities, contamination checks, robustness sweeps)
  - Reproducible pipelines, unified metrics, longitudinal tracking, and leaderboard design
  - Evaluation that correlates with user-perceived capability and downstream utility
- Mechanistic Interpretability
  - Discovering circuits and features via sparse autoencoders, probing, attribution, and targeted patching/ablations
  - Causal tracing and intervention studies to identify the mechanisms behind reasoning and coding
  - Model-edit-aware analyses to understand when changes help or harm capabilities
- AI Safety and Reliability
  - Auditing models for harmful behaviors and failure modes (e.g., deception, bias, adversarial vulnerability)
  - Continuous knowledge editing with retrieval for time-evolving domains (e.g., law)
If you're working on evaluation, interpretability, or AI safety, I'm happy to connect.