Stars
A Library for Advanced Deep Time Series Models for General Time Series Analysis.
Open-source framework for test-time compute scaling of LLMs. Includes a visual debugger for inspecting reasoning traces and an endpoint that provides an OpenAI-compatible API.
A framework for easily running and evaluating your TSAD algorithm.
Public-facing codebase accompanying: "Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL"
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
Proof of Thought: LLM-based reasoning using Z3 theorem proving with multiple backend support (SMT2 and JSON DSL)
Specula: A framework for finding deep bugs in system code using TLA+
Full Autonomy Stack for Mecanum Wheel Platform
Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
A Slurm cluster using docker-compose
Enable three-way bidirectional sync among Overleaf, GitHub, and your local machine for AI-assisted writing. Also share paper-writing Claude skills across multiple paper projects.
Graph Self-Supervised Learning Toolkit
Time cost estimator for distributed LLM training [HiPC 2025]
Grammars of Formal Uncertainty [NeurIPS 2025] - Adapted for LEAN. Unofficial Implementation. Potentially unstable.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Lean 4 programming language and theorem prover
A benchmark for LLMs on complicated tasks in the terminal
[NeurIPS 2025] Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks