@thu-ml, Tsinghua University
Beijing, China
14:01 (UTC+09:00)
https://bingrui-li.github.io/
@bingruili_
@bingruil.bsky.social
Stars
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
QLoRA: Efficient Finetuning of Quantized LLMs
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Structured state space sequence models
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Fast and Easy Infinite Neural Networks in Python
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
OLMoE: Open Mixture-of-Experts Language Models
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
Understanding Training Dynamics of Deep ReLU Networks
Assignments for Berkeley CS 285: Deep Reinforcement Learning (Fall 2020)
A package of distributionally robust optimization (DRO) methods, implemented via cvxpy and PyTorch.
A library for unit scaling in PyTorch
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
A Python package providing a benchmark with various specified distribution-shift patterns.
[ICLR 2025] How Does Critical Batch Size Scale in Pre-training?
Code for "A Regularized Newton Method for Nonconvex Optimization with Global and Local Complexity Guarantees"
Paper accepted to NAACL.