- University of Maryland
- College Park
- lichang-chen.github.io
Stars
Automated tool for running Python programs in a streamlined manner
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
This repo mainly contains the coding problems from the CS234 (Spring 2024) assignments
Stanford CS234: Reinforcement Learning assignments and practices
This is a Phi Family of SLMs book for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
Using PPO, I am attempting to solve the cartpole environment
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
The LLM Training Puzzles by Sasha Rush
A version of verl to support diverse tool use
LeetCode Training and Evaluation Dataset
LM engine is a library for pretraining/finetuning LLMs
What would you do with 1000 H100s...
Puzzles for learning Triton
Recipes to train the self-rewarding reasoning LLMs.
Solve puzzles. Improve your PyTorch.
https://huyenchip.com/ml-interviews-book/
Codebase for Iterative DPO Using Rule-based Rewards
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
A bibliography and survey of the papers surrounding o1
Recipes to train reward model for RLHF.
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
ODIN: Disentangled Reward Mitigates Hacking in RLHF (ICML 2024)
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.