- ZKJ.com
- Beijing, China
- https://blog.parsing.nl
- https://orcid.org/0000-0002-3838-640X
Highlights
- Pro
Stars
Scripts and doc for https://www.dolthub.com/repositories/chenditc/investment_data
An efficient prompt optimization method that uses zeroth-order optimization to tune the prompts for black-box LLMs.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.
My learning notes for ML SYS.
FlashMLA: Efficient Multi-head Latent Attention Kernels
A very simple GRPO implementation for reproducing R1-like LLM thinking.
Fully open data curation for reasoning models
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
A financial agent for investment research
Pretraining and inference code for a large-scale depth-recurrent language model
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Fully open reproduction of DeepSeek-R1
Minimal reproduction of DeepSeek R1-Zero
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Code implementation of synthetic continued pretraining
SGLang is a high-performance serving framework for large language models and multimodal models.
Free, simple, fast interactive diagrams for any GitHub repository