Pinned
- sail-sg/oat (Public): 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
- sail-sg/understand-r1-zero (Public): Understanding R1-Zero-Like Training: A Critical Perspective
- mosecorg/mosec (Public): A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
- spiral-rl/spiral (Public): SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
- sail-sg/Precision-RL (Public): Defeating the Training-Inference Mismatch via FP16