Stars
Trace origins, shared sources, and contamination risk
Aligning Model-Generated Rubrics with Human Standards
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
Claw-Eval is an evaluation harness for LLMs as agents. All tasks are verified by humans.
BullshitBench measures whether AI models challenge nonsensical prompts instead of confidently answering them, created by Peter Gostev.
A holistic benchmark for LLM abstention
A lightweight, AI-native training framework for large language models. Designed for fast iteration, reproducible experiments, and modular configuration across SFT, RLVR, and evaluation workflows.
Edit Banana: A framework for converting statistical formats into editable ones.
Claude Code skill that removes signs of AI-generated writing from text
Elevate your AI research writing, no more tedious polishing ✨
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent will be an AI research agent with full horsepowe…
Official Implementation of "ToolSafe: Enhancing Tool Invocation Safety of LLM-based Agents via Proactive Step-level Guardrail and Feedback"
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.
Summary of knowledge points, assignments, and other materials for master's courses at the Peking University School of Software and Microelectronics (integrated circuit major courses)
[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"
Tools for OpenDataArena: Fair, Open, and Transparent Arena for Data
Training Sparse Autoencoders on Language Models
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
All Cursor AI's official download links for both the latest and older versions, making it easy for you to update, downgrade, and choose any version. 🚀
An open-source AI agent that brings the power of Gemini directly into your terminal.
📝 Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
A free, non-AI Python grammar checker 📝✅
Scalable data preprocessing and curation toolkit for LLMs