I am a first-year M.S. student in Computer Science at UC San Diego, advised by Prof. Hao Zhang. I hold a B.S. from ShanghaiTech University, where I was advised by Prof. Kewei Tu. My research lies at the intersection of Natural Language Processing and Machine Learning Systems. I am particularly passionate about designing efficient architectures for Long-Context Modeling and exploring the frontiers of World Models, with the goal of bridging system efficiency and model capability.
Currently, I focus on scalable training and inference for generative models. I am the lead author of FlashMHF (under review), which proposes a novel Multi-Head FFN architecture backed by IO-aware Triton/CUDA kernels. I am also a core contributor to FastVideo at Hao AI Lab, where I work on integrating new models and optimizing kernel implementations to accelerate video generation systems.
Looking ahead, I aim to extend FlashMHF to broader LLM backbones and to explore World Models more deeply within the FastVideo framework. I am also actively investigating retrieval-based methods and Continual Learning to address the challenges of long-context understanding in foundation models.
Technical Focus: NLP · Triton/CUDA · LLM Architecture · Video Generation