- New York, New York
Stars
Official CLI and Python SDK for Prime Intellect - access GPU compute, remote sandboxes, RL environments, and distributed training infrastructure for AI development at scale.
Self-learning data agent that grounds its answers in 6 layers of context. Inspired by OpenAI's in-house implementation.
Qwen3-Coder is the code version of Qwen3, the large language model series developed by the Qwen team.
Generic building-block toolbox for training neural networks with adaptive and recursive execution. It provides reusable components to control iteration, stopping, and unrolling during training, ena…
PaperBanana: Automating Academic Illustration For AI Scientists
From Word to World: Can Large Language Models be Implicit Text-based World Models?
Staging area for a public release of Theorizer
A tool to use the Ai2 Open Coding Agents Soft-Verified Efficient Repository Agents (SERA) model with Claude Code
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A framework for testing and evaluating AI agents across various task domains, designed for misalignment interpretability research.
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
Dual-control RL environment for incident response training with adversarial evidence, OpenEnv-compatible, plus evaluation tooling and datasets.
A Python library for LLM-based evaluation using weighted rubrics.
Run, deploy and monitor CLI agents in secure cloud sandboxes.
Generate High-Quality Synthetics, Train, Measure, and Evaluate in a Single Pipeline
Inspect: A framework for large language model evaluations
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks.
Harness for running and evaluating AI agents against RL environments
Anthropic's original performance take-home, now open for you to try!
This is an AI implementation (not official) of the DreamGym framework from the paper "Scaling Agent Learning via Experience Synthesis" (arXiv:2511.03773).