Train the smallest LM you can that fits in 16MB. Best model wins!

Python 4,706 3,099 Updated Apr 9, 2026

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 86,799 10,019 Updated Apr 9, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 353,268 71,295 Updated Apr 9, 2026

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 111,700 18,636 Updated Apr 9, 2026

The LLM Evaluation Framework

Python 14,653 1,341 Updated Apr 9, 2026

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,346 90 Updated Apr 6, 2026
Python 342 18 Updated May 24, 2025

Abseil Common Libraries (C++)

C++ 17,179 2,999 Updated Apr 9, 2026

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

Python 680 83 Updated Apr 8, 2026
Python 4 1 Updated Nov 20, 2025

A lightweight suffix-sorting library

C 404 92 Updated Mar 25, 2020

High-Performance Text Deduplication Toolkit

C++ 62 3 Updated Aug 25, 2025

Tooling for exact and MinHash deduplication of large-scale text datasets

Rust 78 8 Updated Mar 24, 2026

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 1,644 168 Updated Apr 9, 2026

PyTorch-native post-training at scale

Python 666 97 Updated Apr 9, 2026

Fast State-of-the-Art Static Embeddings

Python 2,021 120 Updated Apr 3, 2026

🚀🚀 Train a small 64M-parameter GPT entirely from scratch in just 2 hours! 🌏

Python 46,274 5,695 Updated Apr 9, 2026

[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"

Python 688 60 Updated Mar 16, 2025

Train transformer language models with reinforcement learning.

Python 17,988 2,632 Updated Apr 9, 2026

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,981 252 Updated Apr 8, 2026

AI memory OS for LLM and Agent systems (moltbot, clawdbot, openclaw), enabling persistent Skill memory for cross-task skill reuse and evolution.

Python 8,242 716 Updated Apr 9, 2026

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,419 1,328 Updated Jul 9, 2025

Fast Multimodal Semantic Deduplication & Filtering

Python 909 56 Updated Jan 20, 2026

A PyTorch native platform for training generative AI models

Python 5,219 778 Updated Apr 9, 2026

[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Python 187 16 Updated Jun 25, 2025

🔥 The Web Data API for AI - Power AI agents with clean web data

TypeScript 106,489 6,902 Updated Apr 9, 2026

🛏 An HTML to Markdown converter written in JavaScript

HTML 11,029 975 Updated Apr 3, 2026