
# Advanced Code Generation Verifiers - Data Collection Project

## Quick Start - Create a New Problem

The fastest way to get started is using our CLI tool. First, set up your environment:

```bash

uv sync

uv run python setup_precommit.py

```

Then create a new problem with the interactive CLI:

```bash

uv run python -m scripts.create_problem

```

The CLI will:

- Prompt you to select a programming language

- Ask for your problem name

- Get your git username automatically

- Create all required files with proper templates

- Generate the correct directory structure with today's date

- Set up branch naming as `<language>/<name>` (e.g., `python/graph-algorithms`)

**That's it!** Your problem structure is ready - just fill in the templates.

## Branch Naming
Branch names must follow the format: `<language>/<name>`

**Examples:**

- `python/graph-algorithms` - Topic-based grouping

- `javascript/add-sorting-problems` - Feature description

- `cpp/batch-1` - Batch identifier

- `java/advanced-data-structures` - Subject area

This allows you to group multiple related problems in a single PR.

## Project Background

This project collects challenging, human-reviewed prompts and test implementations for
advanced coding subjects. The goal is to create graduate-level problems that challenge advanced
AI agents across 11 programming languages.

For detailed methodology, problem creation guidelines, and difficulty assessment criteria, see
**[INSTRUCTIONS.md](INSTRUCTIONS.md)**.

## Required Directory Structure

Each submission must follow this exact format:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json     # Problem metadata and configuration
├── problem.txt       # Main problem statement
├── background.txt    # Domain-specific context (if needed)
├── src/
│   └── main.py       # Reference solution (entrypoint)
├── test.py           # Comprehensive test suite
└── README.md         # Problem-specific documentation

```

## Required Files Overview

| File | Purpose | Requirements |
|------|---------|--------------|
| `metadata.json` | Problem metadata | Difficulty, labels, test coverage, dependencies |
| `problem.txt` | Problem statement | Clear description, examples, constraints |
| `background.txt` | Domain context | Definitions needed (can be empty) |
| `src/main.*` | Reference solution | Fully executable, documented, complete |
| `test.*` | Test suite | 10-20+ test cases, uses standard frameworks |
| `README.md` | Instructions | How to run, test commands, dependencies |

### metadata.json Example

```json

"name": "problem-name",

"difficulty": "Medium|Hard|Expert",

"subtask": "Code Genera on|Code Edi ng",

"subject_labels": ["Array", "Hash Table", "Algorithm"],

"test_coverage": {

"line_coverage": "X%",

"branch_coverage": "Y%"

},

"test_filename": "test.py", // (op onal), will default to test.[ext]

"entrypoint": "main.py", // (op onal), will default to main.[ext]


"setup_command": "pip install --upgrade pip", // (op onal) this can be any command that
needs to happen before execu on of the main code happens, note: we will install dependencies

"run_command": "python src/main.py", // (op onal), this is the priority of tes ng:
test_coverage_command -> test_command -> run_command

"test_coverage_command": "python -m pytest --cov=src.main --cov-branch --cov-report=term-


missing test.py",

"dependencies": {

"numpy": ">=1.26.0"

},

"test_dependencies": {

"pytest": ">=8.4.0"

},

"references": {

"problem_source_repo": "h ps://github.com/...",

"problem_source_file": "h ps://github.com/..."

```
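
Note that the `//` annotations above are for illustration only; an actual metadata.json must be plain JSON. As a quick local sanity check before running the validator, here is a minimal sketch (the required-field list is an assumption based on this example, not the validator's actual schema; `scripts.validate_submission` remains the authoritative check):

```python
import json
from pathlib import Path

# Assumed field list for illustration; the real schema lives in scripts/validate_submission.
REQUIRED_FIELDS = ("name", "difficulty", "subtask", "subject_labels", "test_coverage")

def missing_metadata_fields(problem_dir: str) -> list[str]:
    """Return required fields absent from the problem's metadata.json."""
    meta = json.loads(Path(problem_dir, "metadata.json").read_text())
    return [field for field in REQUIRED_FIELDS if field not in meta]

if __name__ == "__main__":
    print(missing_metadata_fields("languages/Python/2025-06-09-stytarenko-two-sum"))
```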

### Problem-specific README.md Template

Each problem needs its own README with:

```markdown

# Problem Name

## How to Run

uv run python src/main.py

## How to Run Tests with Coverage

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py


```

## Problem Requirements

### Must Include

- **Graduate-level difficulty** that challenges advanced AI agents

- **Clear specifications** with unambiguous input/output

- **Multi-step reasoning** requiring problem decomposition

- **Comprehensive testing** with 10-20+ test cases covering edge cases

- **Complete working solution** with proper documentation

### Must Not Include

- Multiple-choice questions

- Proof problems

- Surface-level problems easily solved by AI

- Ambiguous problem statements

- Problems from prohibited sources (see below)

## Difficulty Levels

Test your problem against AI agents:

- **Medium**: Required agent (Nova Premier) fails; additional agent (GPT/Claude) succeeds

- **Hard**: Both required and additional agents fail with mild issues

- **Expert**: Both agents fail with severe issues (see the sketch below)
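
To make the rubric concrete, here is a hedged sketch of the mapping from agent outcomes to a difficulty label (the boolean encoding of results is an assumption for illustration; the actual pipeline is described under "Automated Evaluation Pipeline"):

```python
# Illustrative mapping only; not part of the repository's tooling.
def assess_difficulty(required_agent_failed: bool,
                      additional_agent_failed: bool,
                      failures_severe: bool) -> str | None:
    """Map agent outcomes to a stated difficulty per the rubric above."""
    if required_agent_failed and additional_agent_failed:
        return "Expert" if failures_severe else "Hard"
    if required_agent_failed:
        return "Medium"  # additional agent (GPT/Claude) succeeded
    return None  # required agent passed: problem is too easy to submit

print(assess_difficulty(True, False, False))  # -> Medium
```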

## Validation & Testing


### Automated Validation

```bash

# Validate only your changes (recommended during development)

uv run python -m scripts.validate_submission --changed

# Validate all submissions

uv run python -m scripts.validate_submission

# Validate specific problem

uv run python -m scripts.validate_submission --problem two-sum

```

### Manual Testing

```bash

# Navigate to your problem directory

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

# Test your solution

uv run python -m pytest --cov=src.main --cov-branch test.py

# Verify solution runs

uv run python src/main.py

```

### Pre-commit Hooks

The setup script automatically configures pre-commit hooks that will:

- Validate branch names (must be `<language>/<name>`; see the sketch below)

- Check submission structure

- Format Python code

- Run validation before each commit
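
For intuition, a minimal sketch of the kind of branch-name check such a hook might perform (the exact pattern is an assumption; the real hook is configured by `setup_precommit.py`):

```python
import re
import subprocess

# Assumed pattern for <language>/<name>; the real hook's rules may differ.
BRANCH_PATTERN = re.compile(r"^[a-z]+/[a-z0-9][a-z0-9-]*$")

def current_branch() -> str:
    """Ask git for the currently checked-out branch name."""
    return subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

branch = current_branch()
if not BRANCH_PATTERN.fullmatch(branch):
    raise SystemExit(f"Invalid branch name: {branch!r} (expected <language>/<name>)")
```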

## Automated Evaluation Pipeline

When you submit a pull request, your problems undergo comprehensive automated evaluation
across three validation systems:

### **Requirements Validation**

Verifies that your problem meets all quality standards:

- **Metadata Completeness**: All required fields, proper file structure, command specifications

- **Content Quality**: Clear I/O specifications, comprehensive testing, proper dependencies

- **Problem Characteristics**: Graduate-level complexity, multi-step reasoning, unambiguous language

- **Test Standards**: Framework compliance, adequate coverage, comprehensive cases

- **Format Compliance**: Natural language, proper entry points, no prohibited formats

### **Model Difficulty Validation**

Tests your problem against multiple AI models to ensure appropriate difficulty:

- **Challenge Level**: Validates that advanced AI agents struggle with your problem

- **Difficulty Correlation**: Confirms stated difficulty matches actual agent performance

- **Multi-Agent Testing**: Requires at least 2 models to fail for proper challenge level

- **Solution Verification**: Tests that valid solutions can be generated and executed

### **Execution Validation**

Ensures your code and tests work correctly in a clean environment:

- **Code Execution**: Runs your solution against provided test cases

- **Test Suite Validation**: Verifies all tests pass with your reference solution

- **Coverage Analysis**: Confirms test coverage meets minimum requirements

- **Environment Setup**: Tests dependency installation and setup commands

- **Cross-Platform Compatibility**: Validates execution across different environments

### Evaluation Results

After evaluation completes, you'll receive detailed feedback on:

- **PASSED**: All validations successful - problem ready for review

- **WARNINGS**: Minor issues that don't block acceptance

- **FAILED**: Critical issues requiring fixes before approval

Results are automatically posted as PR comments with specific guidance for any required
improvements.

## Execution Environments

All submissions are tested in clean, isolated Docker containers with the following specifications:

### **System Resources**

- **Timeout**: 120 seconds per execution

- **Memory Limit**: 512MB

- **CPU Limit**: 1.0 core

### **Supported Languages & Environments**

| Language | Docker Image | Package Manager | Test Framework | Coverage Tool |
|----------|--------------|-----------------|----------------|---------------|
| **Python** | `python:3.11-slim` | pip | pytest | pytest-cov |
| **Java** | `amazoncorretto:17-alpine-jdk` | Maven | JUnit 5 | JaCoCo |
| **JavaScript** | `node:18-alpine` | npm | Jest | Jest Coverage |
| **TypeScript** | `node:18-alpine` | npm | Jest | Jest Coverage |
| **Rust** | `rust:1.75` | Cargo | Built-in | Built-in |
| **Go** | `golang:1.21-alpine` | go mod | Built-in | Built-in |
| **C** | `gcc:latest` | apk | Built-in | gcov |
| **C++** | `gcc:latest` | apk | Built-in | gcov |
| **Ruby** | `ruby:3.2-alpine` | gem | RSpec/Minitest | SimpleCov |
| **PHP** | `php:8.2-cli-alpine` | Composer | PHPUnit | PHPUnit Coverage |
| **COBOL** | `esolang/cobol` | Built-in | Built-in | Manual |

### **Command Execution Order**

1. **Setup Commands**: Dependency installation, environment configuration

2. **Test Execution**: Run your test suite with coverage (if available)

3. **Result Parsing**: Extract test results, coverage metrics, and execution logs

### **Language-Specific Considerations**

The executor copies all files from your problem directory to maintain complete context.
However, some languages have specific requirements:

**Java**:

- Use `entrypoint` and `test_filename` in metadata.json to specify correct class names

- Example: `"entrypoint": "StringUtils.java"`, `"test_filename": "TestStringUtils.java"`

**Go**:

- All `.go` files are automatically placed in the same package (root directory) during execution

- This allows test files to access functions from main files regardless of original directory structure

- Your tests and main code will be in the same package during execution

**General**: All auxiliary files (like `test_with_coverage.rb`, configuration files, etc.) are
automatically available during execution.
### **Dependency Management**

Each language automatically handles dependencies specified in `metadata.json`:

```json

"dependencies": {

"package-name": ">=1.0.0"

},

"test_dependencies": {

"tes ng-framework": ">=2.0.0"

```

**Auto-generated dependency files** (see the sketch after this list for the Python case):

- **Python**: `requirements.txt`

- **Java**: `pom.xml` with Maven configuration

- **JavaScript/TypeScript**: `package.json` with npm dependencies

- **Rust**: `Cargo.toml` with workspace setup

- **Go**: `go.mod` with module dependencies

- **Ruby**: `Gemfile` with bundler

- **PHP**: `composer.json` with Composer
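
As an illustration of the Python case, a hedged sketch of deriving `requirements.txt` from the metadata (the executor's real generation logic isn't shown in this document, so treat the merging of `dependencies` and `test_dependencies` as an assumption):

```python
import json
from pathlib import Path

def write_requirements(problem_dir: str) -> None:
    """Render metadata.json dependency specs as pip requirement lines."""
    meta = json.loads(Path(problem_dir, "metadata.json").read_text())
    # Assumption: runtime and test dependencies end up in one requirements file.
    deps = {**meta.get("dependencies", {}), **meta.get("test_dependencies", {})}
    lines = [f"{name}{spec}" for name, spec in deps.items()]  # e.g. "numpy>=1.26.0"
    Path(problem_dir, "requirements.txt").write_text("\n".join(lines) + "\n")
```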

### **Custom Commands**

You can override default behavior using metadata fields:

```json

{
  "setup_command": "custom setup script",
  "run_command": "custom execution command",
  "test_command": "custom test command",
  "test_coverage_command": "custom coverage command"
}

```
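
These fields feed the testing priority noted earlier (`test_coverage_command` -> `test_command` -> `run_command`). A minimal sketch of that fallback, assuming the executor simply runs the first command present:

```python
# Assumed selection logic based on the stated priority; illustrative only.
def select_command(meta: dict) -> str | None:
    """Pick the highest-priority command defined in the metadata."""
    for key in ("test_coverage_command", "test_command", "run_command"):
        if meta.get(key):
            return meta[key]
    return None

meta = {"run_command": "python src/main.py", "test_command": "python -m pytest test.py"}
print(select_command(meta))  # -> "python -m pytest test.py"
```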

### **Environment Variables**

Language-specific environment variables are automatically set:

- **Python**: `PYTHONPATH`, `PYTHONUNBUFFERED`, `PYTHONDONTWRITEBYTECODE`

- **Java**: `JAVA_HOME`, `MAVEN_OPTS`, `MAVEN_CONFIG`

- **Node.js**: `NODE_ENV`, `NPM_CONFIG_CACHE`, `NPM_CONFIG_PREFIX`

- **Rust**: `CARGO_HOME`, `CARGO_TARGET_DIR`, `RUSTUP_HOME`

- **Go**: `GOPATH`, `GOPROXY`, `GOMOD`

### **Testing Guidelines**

- Use standard testing frameworks for your language

- Include 10-20+ comprehensive test cases

- Aim for >80% code coverage where possible

- Test edge cases and error conditions

- Ensure tests pass with your reference solution (see the parametrized example below)
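
One way to keep a large suite readable is pytest's parametrization, sketched below against a hypothetical `two_sum` solution (the function is illustrative, not a repository API):

```python
import pytest

# Hypothetical solution under test, for illustration only.
def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    """Return indices of two numbers summing to target, or None."""
    seen: dict[int, int] = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return seen[target - n], i
        seen[n] = i
    return None

@pytest.mark.parametrize("nums,target,expected", [
    ([2, 7, 11, 15], 9, (0, 1)),   # basic case
    ([3, 3], 6, (0, 1)),           # duplicate values
    ([1, 2], 10, None),            # no solution
    ([], 0, None),                 # empty input edge case
])
def test_two_sum(nums, target, expected):
    assert two_sum(nums, target) == expected
```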

## Prohibited Sources

Do not use problems from:

- ML Engineering: huggingface/transformers.js, huggingface/transformers, pytorch/extension-cpp, jax-ml/jax

- Scientific Coding: certik/theoretical-physics, gszauer/GamePhysicsCookbook, sxs-collaboration/spectre

## Submission Checklist

Before submitting, verify:

- [ ] Used CLI tool or followed exact directory structure

- [ ] Branch name follows `<language>/<name>` format

- [ ] All required files present and complete

- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] Solution is executable and passes all tests

- [ ] Test suite has 10-20+ comprehensive test cases

- [ ] Problem-specific README includes run/test commands

- [ ] metadata.json properly filled out

- [ ] No prohibited sources used

## Contributing

For detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).

For comprehensive project methodology and problem creation guidelines, see
[INSTRUCTIONS.md](INSTRUCTIONS.md).

## Support

Questions? Check the sample problem at `languages/Python/2025-06-09-stytarenko-two-sum/`
for the definitive format example.

`JUST AN EXAMPLE OF THE RIGHT STRUCTURE, NOT A REFERENCE IN TERMS OF COMPLEXITY!`


# FAQ

- How do I check what's wrong with my PR?

- [Loom](https://www.loom.com/share/0c35b0374cbd41c29aa875aa263cd835?sid=debfcf75-b45e-44f3-9144-7dc5626376a8)

# Contributing to Rainforest Coding Verifiers

## Quick Start - Recommended Approach

The fastest and most reliable way to contribute is using our CLI tool:

```bash

# Set up your environment

uv sync

uv run python setup_precommit.py

# Create a new problem interactively

uv run python -m scripts.create_problem

```

The CLI will handle all the setup for you, including proper file structure and branch naming.

## Branch Naming Requirements

Branch names **must** follow the format: `<language>/<name>`

**Examples:**
- `python/graph-algorithms` - Topic-based grouping

- `javascript/add-sorting-problems` - Feature description

- `cpp/batch-1` - Batch identifier

- `java/advanced-data-structures` - Subject area

This allows you to group multiple related problems in a single PR.

## Required Directory Structure

Each submission must follow this **exact** format:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json     # Problem metadata and configuration
├── problem.txt       # Main problem statement
├── background.txt    # Domain-specific context (if needed)
├── src/
│   └── main.py       # Reference solution (entrypoint)
├── test.py           # Comprehensive test suite
└── README.md         # Problem-specific documentation

```

### Directory Naming Format

```

YYYY-MM-DD-username-problem-name

```

**Examples:**
- `2024-01-15-john-two-sum`

- `2024-01-15-sarah-binary-search`

- `2024-01-16-mike-reverse-linked-list`

- `2024-01-15-john-fibonacci-recursive` (same day, different problem)

**Components:**

- **Date**: Use the date when you start working on the problem (YYYY-MM-DD format)

- **Username**: Your GitHub username (lowercase, replace spaces/special chars with hyphens)

- **Name**: Brief, descriptive name using hyphens instead of spaces (see the sketch below)
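
A small sketch of assembling such a name, assuming simple hyphen slugification (the CLI tool's actual normalization may differ):

```python
import re
from datetime import date

def _slug(text: str) -> str:
    """Lowercase and replace runs of spaces/special chars with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def problem_dir_name(username: str, problem_name: str, day: date | None = None) -> str:
    """Build the YYYY-MM-DD-username-problem-name directory name."""
    return f"{(day or date.today()).isoformat()}-{_slug(username)}-{_slug(problem_name)}"

print(problem_dir_name("John Doe", "Two Sum", date(2024, 1, 15)))
# -> 2024-01-15-john-doe-two-sum
```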

## Required Files

| File | Purpose | Requirements |
|------|---------|--------------|
| `metadata.json` | Problem metadata | Difficulty, labels, test coverage, dependencies |
| `problem.txt` | Problem statement | Clear description, examples, constraints |
| `background.txt` | Domain context | Definitions needed (can be empty) |
| `src/main.*` | Reference solution | Fully executable, documented, complete |
| `test.*` | Test suite | 10-20+ test cases, uses standard frameworks |
| `README.md` | Instructions | How to run, test commands, dependencies |

## Problem Requirements

### Must Include

- **Graduate-level difficulty** that challenges advanced AI agents

- **Clear specifications** with unambiguous input/output

- **Multi-step reasoning** requiring problem decomposition

- **Comprehensive testing** with 10-20+ test cases covering edge cases

- **Complete working solution** with proper documentation

### Must Not Include

- Multiple-choice questions

- Proof problems

- Surface-level problems easily solved by AI

- Ambiguous problem statements

- Problems from prohibited sources

## Validation & Testing

### Before Submitting

Run validation on your changes:

```bash

# Validate only your changes (recommended during development)

uv run python -m scripts.validate_submission --changed

# Test your solution

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

uv run python -m pytest --cov=src.main --cov-branch test.py

uv run python src/main.py

```

### Pre-commit Hooks

The setup script automatically configures hooks that will:

- Validate branch names (must be `<language>/<name>`)

- Check submission structure

- Format code

- Run validation before each commit


## Submission Workflow

### 1. Setup (One-time)

```bash

uv sync

uv run python setup_precommit.py

```

### 2. Create Problem (Recommended)

```bash

uv run python -m scripts.create_problem

```

### 3. Alternative: Manual Creation

If not using the CLI tool, ensure you:

1. Create branch with `<language>/<name>` format

2. Follow the exact directory structure above

3. Include all required files

4. Run validation before committing

### 4. Submit Pull Request

- Title: `Add [Language]: [Problem Name]`

- Include a brief description of the problem and approach

- Ensure all validation passes

## Final Checklist

Before submitting, verify:


- [ ] Used CLI tool or followed exact directory structure

- [ ] Branch name follows `<language>/<name>` format

- [ ] All required files present and complete

- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] Solution is executable and passes all tests

- [ ] Test suite has 10-20+ comprehensive test cases

- [ ] Problem-specific README includes run/test commands

- [ ] metadata.json properly filled out

- [ ] No prohibited sources used

- [ ] Validation passes: `uv run python -m scripts.validate_submission --changed`

## Questions?

Check the sample problem at `languages/Python/2025-06-09-stytarenko-two-sum/` for the
definitive format example.

# AGENTS.md - AI Agent Guide for Problem Submission

## Quick Start for AI Agents

This guide provides step-by-step instructions for AI agents to successfully submit coding
problems to this repository.

## Repository Structure Overview

```

rainforest-coding-verifiers/
├── README.md                     # Main documentation
├── INSTRUCTIONS.md               # Detailed problem creation guidelines
├── AGENTS.md                     # This file - agent-specific instructions
├── scripts/
│   ├── create_problem.py         # CLI tool for problem creation
│   └── validate_submission.py    # Validation tool
├── languages/                    # All problems organized by language
│   ├── Python/
│   ├── JavaScript/
│   ├── Java/
│   └── [other languages]/
└── evaluation/                   # Validation systems (read-only)

```

## Step-by-Step Submission Workflow

### 1. Environment Setup

```bash

# Clone and setup

git clone <repository-url>

cd rainforest-coding-verifiers

uv sync

uv run python setup_precommit.py

```

### 2. Branch Creation

Create a branch following the **exact format**: `<language>/<name>`

**Valid Examples:**
- `python/graph-algorithms`

- `javascript/sorting-optimization`

- `java/concurrent-data-structures`

- `cpp/memory-management`

**Commands:**

```bash

git checkout -b python/your-problem-name

```

### 3. Problem Creation Using CLI (Recommended)

```bash

uv run python -m scripts.create_problem

```

The CLI will:

- Prompt for language selection

- Ask for problem name

- Auto-generate directory with date prefix

- Create all required template files

- Set up proper structure

### 4. Manual Problem Creation (Alternative)

If CLI is unavailable, create this **exact structure**:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json
├── problem.txt
├── background.txt
├── src/
│   └── main.{ext}
├── test.{ext}
└── README.md

```

## Required Files and Templates

### metadata.json Template

```json

"name": "problem-name",

"difficulty": "Medium|Hard|Expert",

"subtask": "Code Genera on|Code Edi ng",

"subject_labels": ["Array", "Hash Table", "Algorithm"],

"test_coverage": {

"line_coverage": "85.5%",

"branch_coverage": "90.2%"

},

"setup_command": "pip install --upgrade pip",

"run_command": "python src/main.py",

"test_command": "python -m pytest test.py",

"test_coverage_command": "python -m pytest --cov=src.main --cov-branch --cov-report=term-


missing test.py",

"dependencies": {

"numpy": ">=1.26.0"

},

"test_dependencies": {
"pytest": ">=8.4.0",

"pytest-cov": ">=4.0.0"

},

"references": {

"problem_source_repo": "h ps://github.com/...",

"problem_source_file": "h ps://github.com/..."

```

### problem.txt Template

```

[Clear problem statement with:]

- Problem description

- Input format specification

- Output format specification

- Constraints

- Examples with expected outputs

- Edge cases to consider

```

### background.txt Template

```

[Domain-specific context, definitions, or mathematical background]

[Can be empty if no special context needed]

```

### src/main.py Template (Python example)

```python
#!/usr/bin/env python3
"""
Problem: [Problem Name]
Author: [Your name]
Date: [Date]

[Brief description of solution approach]
"""

def main_function(input_param):
    """
    Main solution function.

    Args:
        input_param: [Description]

    Returns:
        [Description of return value]
    """
    # Implementation here
    pass

if __name__ == "__main__":
    # Example usage or input handling
    pass

```

### test.py Template (Python example)

```python
import pytest

from src.main import main_function

class TestMainFunction:
    """Comprehensive test suite with 10-20+ test cases."""

    def test_basic_cases(self):
        """Test basic functionality."""
        assert main_function(input1) == expected1
        assert main_function(input2) == expected2

    def test_edge_cases(self):
        """Test edge cases."""
        # Empty inputs, boundary values, etc.
        pass

    def test_error_conditions(self):
        """Test error handling."""
        # Invalid inputs, exceptions, etc.
        pass

```

### Problem-specific README.md Template

```markdown

# [Problem Name]

## How to Run

uv run python src/main.py


## How to Run Tests

uv run python -m pytest test.py

## How to Run Tests with Coverage

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py

## Dependencies

[List any special dependencies or setup requirements]

```

## Quality Requirements Checklist

Your problem MUST meet these requirements:

### Problem Characteristics

- [ ] **Graduate-level difficulty** that challenges advanced AI agents

- [ ] **Multi-step reasoning** requiring problem decomposition

- [ ] **Clear specifications** with unambiguous input/output

- [ ] **NOT multiple-choice** format

- [ ] **NOT a proof problem**

- [ ] **NOT surface-level** or trivial

### Technical Requirements

- [ ] **Complete working solution** in src/ directory

- [ ] **10-20+ comprehensive test cases** covering edge cases

- [ ] **Test coverage ≥70% line, ≥80% branch** (reported in metadata.json)

- [ ] **All tests pass** with your reference solution

- [ ] **Proper dependencies** specified in metadata.json

- [ ] **Working run/test commands** specified


### File Requirements

- [ ] All required files present and non-empty

- [ ] metadata.json properly formatted with all fields

- [ ] Problem statement clear and complete

- [ ] Solution code documented and executable

## Validation Commands

Before submitting, run these validation checks:

```bash

# Validate your changes (run from repo root)

uv run python -m scripts.validate_submission --changed

# Test your specific problem

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

uv run python -m pytest --cov=src.main --cov-branch test.py

uv run python src/main.py

# Check coverage meets requirements

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py

```

## PR Submission Process

### 1. Commit Your Changes

```bash

git add .
git commit -m "Add [language]: [problem-name] - [brief description]"

git push origin <language>/<name>

```

### 2. Create Pull Request

- **Title**: `Add [Language]: [Problem Name] - [Brief Description]`

- **Description**: Include problem difficulty, subject areas, and brief overview

- **Base branch**: `main`

### 3. PR Description Template

```markdown

## Problem Summary

- **Language**: [Programming Language]

- **Difficulty**: [Medium/Hard/Expert]

- **Subject Areas**: [List key topics]

- **Problem Type**: [Code Generation/Code Editing]

## Problem Description

[Brief 2-3 sentence description of what the problem asks]

## Key Challenges

- [Challenge 1]

- [Challenge 2]

- [Challenge 3]

## Validation Checklist

- [ ] All required files present

- [ ] Solution passes all tests

- [ ] Test coverage ≥70% line, ≥80% branch


- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] metadata.json complete and valid

```

## Common Pitfalls to Avoid

### Directory Structure

- Missing date prefix in directory name

- Incorrect file extensions for language

- Missing required files

- Correct example: `languages/Python/2025-01-15-agent-problem-name/`

### Problem Quality

- Too easy (solvable in 1-2 steps)

- Multiple choice format

- Vague or ambiguous requirements

- Instead, require multi-step reasoning and careful implementation

### Testing

- Only 2-3 basic test cases

- No edge case coverage

- Tests that don't actually validate the solution

- Instead, write 10-20+ comprehensive tests with edge cases

## Automated Evaluation

After PR submission, your problem will be automatically evaluated across three systems:

1. **Requirements Validation**: Checks problem quality, structure, and completeness

2. **Model Difficulty Validation**: Tests against AI models to verify appropriate difficulty

3. **Execution Validation**: Ensures code runs correctly in a clean environment

Results will be posted as PR comments with specific feedback for any required improvements.

## Troubleshooting

### Common Issues and Solutions

**Issue**: Branch name validation fails

- **Solution**: Ensure branch follows exact format `<language>/<name>`

**Issue**: Coverage validation fails

- **Solution**: Add more comprehensive tests, especially edge cases

**Issue**: Problem too easy (models pass)

- **Solution**: Increase complexity, add more constraints, require deeper reasoning

**Issue**: Tests fail in clean environment

- **Solution**: Check dependencies in metadata.json, verify setup commands

**Issue**: Metadata validation fails

- **Solution**: Ensure all required fields present, proper JSON formatting

## Reference Example

See the reference implementation at:

`languages/Python/2025-06-09-stytarenko-two-sum/`

This shows the exact structure and format expected (though complexity should be higher for
your submissions).

## Success Criteria

Your PR will be approved when:

- All automated evaluations pass

- Problem demonstrates appropriate difficulty for AI agents

- Code quality and documentation meet standards

- Test coverage and comprehensiveness are adequate

- Problem fits project goals and requirements

Follow this guide systematically, and your submission should pass all validation checks
successfully.
