Stars
Train the smallest LM you can that fits in 16MB. Best model wins!
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
High-Performance Text Deduplication Toolkit
Tooling for exact and MinHash deduplication of large-scale text datasets
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
🚀🚀 "Large Model": train a small 64M-parameter GPT entirely from scratch in just 2 hours! 🌏
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
Train transformer language models with reinforcement learning.
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
AI memory OS for LLM and Agent systems (moltbot, clawdbot, openclaw), enabling persistent Skill memory for cross-task skill reuse and evolution.
Fast Multimodal Semantic Deduplication & Filtering
A PyTorch native platform for training generative AI models
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
🔥 The Web Data API for AI - Power AI agents with clean web data
🛏 An HTML to Markdown converter written in JavaScript