ai-code
Can Language Models Solve Olympiad Programming?
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
AlphaVerus: Formally Verified Code Generation through Self-Improving Translation and Treefinement
A generative AI extension for JupyterLab
Repository for the paper "Large Language Model-Based Agents for Software Engineering: A Survey". Continuously updated.
CLEVER: Code Lean Evaluation for Verified End-to-end Reasoning
Verina (Verifiable Code Generation Arena) is a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions.
DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols
DafnyBench: A Benchmark for Formal Software Verification
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Code for "Clause2Inv: A Generate-Combine-Check Framework for Loop Invariant Inference" at ISSTA 2025
Write formal proofs in natural language and LaTeX.
💫 Toolkit to help you get started with Spec-Driven Development
Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.
[EMNLP 2024] CodeJudge: Evaluating Code Generation with Large Language Models