
# Advanced Code Generation Verifiers - Data Collection Project

## Quick Start - Create a New Problem

The fastest way to get started is using our CLI tool. First, set up your environment:

```bash

uv sync

uv run python setup_precommit.py

```

Then create a new problem with the interactive CLI:

```bash

uv run python -m scripts.create_problem

```

The CLI will:

- Prompt you to select a programming language

- Ask for your problem name

- Get your git username automatically

- Create all required files with proper templates

- Generate the correct directory structure with today's date

- Set up branch naming as `<language>/<name>` (e.g., `python/graph-algorithms`)

**That's it!** Your problem structure is ready - just fill in the templates.

## Branch Naming
Branch names must follow the format: `<language>/<name>`

**Examples:**

- `python/graph-algorithms` - Topic-based grouping

- `javascript/add-sorting-problems` - Feature description

- `cpp/batch-1` - Batch identifier

- `java/advanced-data-structures` - Subject area

This allows you to group multiple related problems in a single PR.

## Project Background

This project collects challenging, human-reviewed prompts and test implementations for
advanced coding subjects. The goal is to create graduate-level problems that challenge advanced
AI agents across 11 programming languages.

For detailed methodology, problem creation guidelines, and difficulty assessment criteria, see
**[INSTRUCTIONS.md](INSTRUCTIONS.md)**.

## Required Directory Structure

Each submission must follow this exact format:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json     # Problem metadata and configuration
├── problem.txt       # Main problem statement
├── background.txt    # Domain-specific context (if needed)
├── src/
│   └── main.py       # Reference solution (entrypoint)
├── test.py           # Comprehensive test suite
└── README.md         # Problem-specific documentation

```

## Required Files Overview

| File | Purpose | Requirements |
|------|---------|--------------|
| `metadata.json` | Problem metadata | Difficulty, labels, test coverage, dependencies |
| `problem.txt` | Problem statement | Clear description, examples, constraints |
| `background.txt` | Domain context | Definitions needed (can be empty) |
| `src/main.*` | Reference solution | Fully executable, documented, complete |
| `test.*` | Test suite | 10-20+ test cases, uses standard frameworks |
| `README.md` | Instructions | How to run, test commands, dependencies |

### metadata.json Example

```json

"name": "problem-name",

"difficulty": "Medium|Hard|Expert",

"subtask": "Code Genera on|Code Edi ng",

"subject_labels": ["Array", "Hash Table", "Algorithm"],

"test_coverage": {

"line_coverage": "X%",

"branch_coverage": "Y%"

},

"test_filename": "test.py", // (op onal), will default to test.[ext]

"entrypoint": "main.py", // (op onal), will default to main.[ext]


"setup_command": "pip install --upgrade pip", // (op onal) this can be any command that
needs to happen before execu on of the main code happens, note: we will install dependencies

"run_command": "python src/main.py", // (op onal), this is the priority of tes ng:
test_coverage_command -> test_command -> run_command

"test_coverage_command": "python -m pytest --cov=src.main --cov-branch --cov-report=term-


missing test.py",

"dependencies": {

"numpy": ">=1.26.0"

},

"test_dependencies": {

"pytest": ">=8.4.0"

},

"references": {

"problem_source_repo": "h ps://github.com/...",

"problem_source_file": "h ps://github.com/..."

```
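
Note that the `//` annotations above are for illustration only; an actual metadata.json must be plain JSON. As a quick local sanity check before running the validator, here is a minimal sketch (the required-field list is an assumption based on this example, not the validator's actual schema; `scripts.validate_submission` remains the authoritative check):

```python
import json
from pathlib import Path

# Assumed field list for illustration; the real schema lives in scripts/validate_submission.
REQUIRED_FIELDS = ("name", "difficulty", "subtask", "subject_labels", "test_coverage")

def missing_metadata_fields(problem_dir: str) -> list[str]:
    """Return required fields absent from the problem's metadata.json."""
    meta = json.loads(Path(problem_dir, "metadata.json").read_text())
    return [field for field in REQUIRED_FIELDS if field not in meta]

if __name__ == "__main__":
    print(missing_metadata_fields("languages/Python/2025-06-09-stytarenko-two-sum"))
```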

### Problem-specific README.md Template

Each problem needs its own README with:

```markdown

# Problem Name

## How to Run

uv run python src/main.py

## How to Run Tests with Coverage

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py


```

## Problem Requirements

### Must Include

- **Graduate-level difficulty** that challenges advanced AI agents

- **Clear specifications** with unambiguous input/output

- **Multi-step reasoning** requiring problem decomposition

- **Comprehensive testing** with 10-20+ test cases covering edge cases

- **Complete working solution** with proper documentation

### Must Not Include

- Multiple-choice questions

- Proof problems

- Surface-level problems easily solved by AI

- Ambiguous problem statements

- Problems from prohibited sources (see below)

## Difficulty Levels

Test your problem against AI agents:

- **Medium**: Required agent (Nova Premier) fails; additional agent (GPT/Claude) succeeds

- **Hard**: Both required and additional agents fail with mild issues

- **Expert**: Both agents fail with severe issues (see the sketch below)
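
To make the rubric concrete, here is a hedged sketch of the mapping from agent outcomes to a difficulty label (the boolean encoding of results is an assumption for illustration; the actual pipeline is described under "Automated Evaluation Pipeline"):

```python
# Illustrative mapping only; not part of the repository's tooling.
def assess_difficulty(required_agent_failed: bool,
                      additional_agent_failed: bool,
                      failures_severe: bool) -> str | None:
    """Map agent outcomes to a stated difficulty per the rubric above."""
    if required_agent_failed and additional_agent_failed:
        return "Expert" if failures_severe else "Hard"
    if required_agent_failed:
        return "Medium"  # additional agent (GPT/Claude) succeeded
    return None  # required agent passed: problem is too easy to submit

print(assess_difficulty(True, False, False))  # -> Medium
```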

## Validation & Testing


### Automated Validation

```bash

# Validate only your changes (recommended during development)

uv run python -m scripts.validate_submission --changed

# Validate all submissions

uv run python -m scripts.validate_submission

# Validate specific problem

uv run python -m scripts.validate_submission --problem two-sum

```

### Manual Testing

```bash

# Navigate to your problem directory

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

# Test your solution

uv run python -m pytest --cov=src.main --cov-branch test.py

# Verify solution runs

uv run python src/main.py

```

### Pre-commit Hooks

The setup script automatically configures pre-commit hooks that will:

- Validate branch names (must be `<language>/<name>`; see the sketch below)

- Check submission structure

- Format Python code

- Run validation before each commit
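
For intuition, a minimal sketch of the kind of branch-name check such a hook might perform (the exact pattern is an assumption; the real hook is configured by `setup_precommit.py`):

```python
import re
import subprocess

# Assumed pattern for <language>/<name>; the real hook's rules may differ.
BRANCH_PATTERN = re.compile(r"^[a-z]+/[a-z0-9][a-z0-9-]*$")

def current_branch() -> str:
    """Ask git for the currently checked-out branch name."""
    return subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

branch = current_branch()
if not BRANCH_PATTERN.fullmatch(branch):
    raise SystemExit(f"Invalid branch name: {branch!r} (expected <language>/<name>)")
```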

## Automated Evaluation Pipeline

When you submit a pull request, your problems undergo comprehensive automated evaluation
across three validation systems:

### **Requirements Validation**

Verifies that your problem meets all quality standards:

- **Metadata Completeness**: All required fields, proper file structure, command specifications

- **Content Quality**: Clear I/O specifications, comprehensive testing, proper dependencies

- **Problem Characteristics**: Graduate-level complexity, multi-step reasoning, unambiguous language

- **Test Standards**: Framework compliance, adequate coverage, comprehensive cases

- **Format Compliance**: Natural language, proper entry points, no prohibited formats

### **Model Difficulty Validation**

Tests your problem against multiple AI models to ensure appropriate difficulty:

- **Challenge Level**: Validates that advanced AI agents struggle with your problem

- **Difficulty Correlation**: Confirms stated difficulty matches actual agent performance

- **Multi-Agent Testing**: Requires at least 2 models to fail for proper challenge level

- **Solution Verification**: Tests that valid solutions can be generated and executed

### **Execution Validation**

Ensures your code and tests work correctly in a clean environment:

- **Code Execution**: Runs your solution against provided test cases

- **Test Suite Validation**: Verifies all tests pass with your reference solution

- **Coverage Analysis**: Confirms test coverage meets minimum requirements

- **Environment Setup**: Tests dependency installation and setup commands

- **Cross-Platform Compatibility**: Validates execution across different environments

### Evaluation Results

After evaluation completes, you'll receive detailed feedback on:

- **PASSED**: All validations successful - problem ready for review

- **WARNINGS**: Minor issues that don't block acceptance

- **FAILED**: Critical issues requiring fixes before approval

Results are automatically posted as PR comments with specific guidance for any required
improvements.

## Execution Environments

All submissions are tested in clean, isolated Docker containers with the following specifications:

### **System Resources**

- **Timeout**: 120 seconds per execution

- **Memory Limit**: 512MB

- **CPU Limit**: 1.0 core

### **Supported Languages & Environments**

| Language | Docker Image | Package Manager | Test Framework | Coverage Tool |
|----------|--------------|-----------------|----------------|---------------|
| **Python** | `python:3.11-slim` | pip | pytest | pytest-cov |
| **Java** | `amazoncorretto:17-alpine-jdk` | Maven | JUnit 5 | JaCoCo |
| **JavaScript** | `node:18-alpine` | npm | Jest | Jest Coverage |
| **TypeScript** | `node:18-alpine` | npm | Jest | Jest Coverage |
| **Rust** | `rust:1.75` | Cargo | Built-in | Built-in |
| **Go** | `golang:1.21-alpine` | go mod | Built-in | Built-in |
| **C** | `gcc:latest` | apk | Built-in | gcov |
| **C++** | `gcc:latest` | apk | Built-in | gcov |
| **Ruby** | `ruby:3.2-alpine` | gem | RSpec/Minitest | SimpleCov |
| **PHP** | `php:8.2-cli-alpine` | Composer | PHPUnit | PHPUnit Coverage |
| **COBOL** | `esolang/cobol` | Built-in | Built-in | Manual |

### **Command Execution Order**

1. **Setup Commands**: Dependency installation, environment configuration

2. **Test Execution**: Run your test suite with coverage (if available)

3. **Result Parsing**: Extract test results, coverage metrics, and execution logs

### **Language-Specific Considerations**

The executor copies all files from your problem directory to maintain complete context.
However, some languages have specific requirements:

**Java**:

- Use `entrypoint` and `test_filename` in metadata.json to specify correct class names

- Example: `"entrypoint": "StringUtils.java"`, `"test_filename": "TestStringUtils.java"`

**Go**:

- All `.go` files are automatically placed in the same package (root directory) during execution

- This allows test files to access functions from main files regardless of original directory structure

- Your tests and main code will be in the same package during execution

**General**: All auxiliary files (like `test_with_coverage.rb`, configuration files, etc.) are
automatically available during execution.
### **Dependency Management**

Each language automatically handles dependencies specified in `metadata.json`:

```json

"dependencies": {

"package-name": ">=1.0.0"

},

"test_dependencies": {

"tes ng-framework": ">=2.0.0"

```

**Auto-generated dependency files** (see the sketch after this list for the Python case):

- **Python**: `requirements.txt`

- **Java**: `pom.xml` with Maven configuration

- **JavaScript/TypeScript**: `package.json` with npm dependencies

- **Rust**: `Cargo.toml` with workspace setup

- **Go**: `go.mod` with module dependencies

- **Ruby**: `Gemfile` with bundler

- **PHP**: `composer.json` with Composer
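
As an illustration of the Python case, a hedged sketch of deriving `requirements.txt` from the metadata (the executor's real generation logic isn't shown in this document, so treat the merging of `dependencies` and `test_dependencies` as an assumption):

```python
import json
from pathlib import Path

def write_requirements(problem_dir: str) -> None:
    """Render metadata.json dependency specs as pip requirement lines."""
    meta = json.loads(Path(problem_dir, "metadata.json").read_text())
    # Assumption: runtime and test dependencies end up in one requirements file.
    deps = {**meta.get("dependencies", {}), **meta.get("test_dependencies", {})}
    lines = [f"{name}{spec}" for name, spec in deps.items()]  # e.g. "numpy>=1.26.0"
    Path(problem_dir, "requirements.txt").write_text("\n".join(lines) + "\n")
```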

### **Custom Commands**

You can override default behavior using metadata fields:

```json

{
  "setup_command": "custom setup script",
  "run_command": "custom execution command",
  "test_command": "custom test command",
  "test_coverage_command": "custom coverage command"
}

```
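
These fields feed the testing priority noted earlier (`test_coverage_command` -> `test_command` -> `run_command`). A minimal sketch of that fallback, assuming the executor simply runs the first command present:

```python
# Assumed selection logic based on the stated priority; illustrative only.
def select_command(meta: dict) -> str | None:
    """Pick the highest-priority command defined in the metadata."""
    for key in ("test_coverage_command", "test_command", "run_command"):
        if meta.get(key):
            return meta[key]
    return None

meta = {"run_command": "python src/main.py", "test_command": "python -m pytest test.py"}
print(select_command(meta))  # -> "python -m pytest test.py"
```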

### **Environment Variables**

Language-specific environment variables are automatically set:

- **Python**: `PYTHONPATH`, `PYTHONUNBUFFERED`, `PYTHONDONTWRITEBYTECODE`

- **Java**: `JAVA_HOME`, `MAVEN_OPTS`, `MAVEN_CONFIG`

- **Node.js**: `NODE_ENV`, `NPM_CONFIG_CACHE`, `NPM_CONFIG_PREFIX`

- **Rust**: `CARGO_HOME`, `CARGO_TARGET_DIR`, `RUSTUP_HOME`

- **Go**: `GOPATH`, `GOPROXY`, `GOMOD`

### **Testing Guidelines**

- Use standard testing frameworks for your language

- Include 10-20+ comprehensive test cases

- Aim for >80% code coverage where possible

- Test edge cases and error conditions

- Ensure tests pass with your reference solution (see the parametrized example below)
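
One way to keep a large suite readable is pytest's parametrization, sketched below against a hypothetical `two_sum` solution (the function is illustrative, not a repository API):

```python
import pytest

# Hypothetical solution under test, for illustration only.
def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    """Return indices of two numbers summing to target, or None."""
    seen: dict[int, int] = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return seen[target - n], i
        seen[n] = i
    return None

@pytest.mark.parametrize("nums,target,expected", [
    ([2, 7, 11, 15], 9, (0, 1)),   # basic case
    ([3, 3], 6, (0, 1)),           # duplicate values
    ([1, 2], 10, None),            # no solution
    ([], 0, None),                 # empty input edge case
])
def test_two_sum(nums, target, expected):
    assert two_sum(nums, target) == expected
```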

## Prohibited Sources

Do not use problems from:

- ML Engineering: huggingface/transformers.js, huggingface/transformers, pytorch/extension-cpp, jax-ml/jax

- Scientific Coding: certik/theoretical-physics, gszauer/GamePhysicsCookbook, sxs-collaboration/spectre

## Submission Checklist

Before submitting, verify:

- [ ] Used CLI tool or followed exact directory structure

- [ ] Branch name follows `<language>/<name>` format

- [ ] All required files present and complete

- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] Solution is executable and passes all tests

- [ ] Test suite has 10-20+ comprehensive test cases

- [ ] Problem-specific README includes run/test commands

- [ ] metadata.json properly filled out

- [ ] No prohibited sources used

## Contributing

For detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).

For comprehensive project methodology and problem creation guidelines, see
[INSTRUCTIONS.md](INSTRUCTIONS.md).

## Support

Questions? Check the sample problem at `languages/Python/2025-06-09-stytarenko-two-sum/`
for the definitive format example.

`JUST AN EXAMPLE OF THE RIGHT STRUCTURE, NOT A REFERENCE IN TERMS OF COMPLEXITY!`


# FAQ

- How do I check what's wrong with my PR?

- [Loom](https://www.loom.com/share/0c35b0374cbd41c29aa875aa263cd835?sid=debfcf75-b45e-44f3-9144-7dc5626376a8)

# Contributing to Rainforest Coding Verifiers

## Quick Start - Recommended Approach

The fastest and most reliable way to contribute is using our CLI tool:

```bash

# Set up your environment

uv sync

uv run python setup_precommit.py

# Create a new problem interactively

uv run python -m scripts.create_problem

```

The CLI will handle all the setup for you, including proper file structure and branch naming.

## Branch Naming Requirements

Branch names **must** follow the format: `<language>/<name>`

**Examples:**
- `python/graph-algorithms` - Topic-based grouping

- `javascript/add-sorting-problems` - Feature description

- `cpp/batch-1` - Batch identifier

- `java/advanced-data-structures` - Subject area

This allows you to group multiple related problems in a single PR.

## Required Directory Structure

Each submission must follow this **exact** format:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json     # Problem metadata and configuration
├── problem.txt       # Main problem statement
├── background.txt    # Domain-specific context (if needed)
├── src/
│   └── main.py       # Reference solution (entrypoint)
├── test.py           # Comprehensive test suite
└── README.md         # Problem-specific documentation

```

### Directory Naming Format

```

YYYY-MM-DD-username-problem-name

```

**Examples:**
- `2024-01-15-john-two-sum`

- `2024-01-15-sarah-binary-search`

- `2024-01-16-mike-reverse-linked-list`

- `2024-01-15-john-fibonacci-recursive` (same day, different problem)

**Components:**

- **Date**: Use the date when you start working on the problem (YYYY-MM-DD format)

- **Username**: Your GitHub username (lowercase, replace spaces/special chars with hyphens)

- **Name**: Brief, descriptive name using hyphens instead of spaces (see the sketch below)
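
A small sketch of assembling such a name, assuming simple hyphen slugification (the CLI tool's actual normalization may differ):

```python
import re
from datetime import date

def _slug(text: str) -> str:
    """Lowercase and replace runs of spaces/special chars with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def problem_dir_name(username: str, problem_name: str, day: date | None = None) -> str:
    """Build the YYYY-MM-DD-username-problem-name directory name."""
    return f"{(day or date.today()).isoformat()}-{_slug(username)}-{_slug(problem_name)}"

print(problem_dir_name("John Doe", "Two Sum", date(2024, 1, 15)))
# -> 2024-01-15-john-doe-two-sum
```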

## Required Files

| File | Purpose | Requirements |
|------|---------|--------------|
| `metadata.json` | Problem metadata | Difficulty, labels, test coverage, dependencies |
| `problem.txt` | Problem statement | Clear description, examples, constraints |
| `background.txt` | Domain context | Definitions needed (can be empty) |
| `src/main.*` | Reference solution | Fully executable, documented, complete |
| `test.*` | Test suite | 10-20+ test cases, uses standard frameworks |
| `README.md` | Instructions | How to run, test commands, dependencies |

## Problem Requirements

### Must Include

- **Graduate-level difficulty** that challenges advanced AI agents

- **Clear specifications** with unambiguous input/output

- **Multi-step reasoning** requiring problem decomposition

- **Comprehensive testing** with 10-20+ test cases covering edge cases

- **Complete working solution** with proper documentation

### Must Not Include

- Multiple-choice questions

- Proof problems

- Surface-level problems easily solved by AI

- Ambiguous problem statements

- Problems from prohibited sources

## Validation & Testing

### Before Submitting

Run validation on your changes:

```bash

# Validate only your changes (recommended during development)

uv run python -m scripts.validate_submission --changed

# Test your solution

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

uv run python -m pytest --cov=src.main --cov-branch test.py

uv run python src/main.py

```

### Pre-commit Hooks

The setup script automatically configures hooks that will:

- Validate branch names (must be `<language>/<name>`)

- Check submission structure

- Format code

- Run validation before each commit


## Submission Workflow

### 1. Setup (One-time)

```bash

uv sync

uv run python setup_precommit.py

```

### 2. Create Problem (Recommended)

```bash

uv run python -m scripts.create_problem

```

### 3. Alternative: Manual Creation

If not using the CLI tool, ensure you:

1. Create branch with `<language>/<name>` format

2. Follow the exact directory structure above

3. Include all required files

4. Run validation before committing

### 4. Submit Pull Request

- Title: `Add [Language]: [Problem Name]`

- Include a brief description of the problem and approach

- Ensure all validation passes

## Final Checklist

Before submitting, verify:


- [ ] Used CLI tool or followed exact directory structure

- [ ] Branch name follows `<language>/<name>` format

- [ ] All required files present and complete

- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] Solution is executable and passes all tests

- [ ] Test suite has 10-20+ comprehensive test cases

- [ ] Problem-specific README includes run/test commands

- [ ] metadata.json properly filled out

- [ ] No prohibited sources used

- [ ] Validation passes: `uv run python -m scripts.validate_submission --changed`

## Questions?

Check the sample problem at `languages/Python/2025-06-09-stytarenko-two-sum/` for the
definitive format example.

# AGENTS.md - AI Agent Guide for Problem Submission

## Quick Start for AI Agents

This guide provides step-by-step instructions for AI agents to successfully submit coding
problems to this repository.

## Repository Structure Overview

```

rainforest-coding-verifiers/
├── README.md                     # Main documentation
├── INSTRUCTIONS.md               # Detailed problem creation guidelines
├── AGENTS.md                     # This file - agent-specific instructions
├── scripts/
│   ├── create_problem.py         # CLI tool for problem creation
│   └── validate_submission.py    # Validation tool
├── languages/                    # All problems organized by language
│   ├── Python/
│   ├── JavaScript/
│   ├── Java/
│   └── [other languages]/
└── evaluation/                   # Validation systems (read-only)

```

## Step-by-Step Submission Workflow

### 1. Environment Setup

```bash

# Clone and setup

git clone <repository-url>

cd rainforest-coding-verifiers

uv sync

uv run python setup_precommit.py

```

### 2. Branch Creation

Create a branch following the **exact format**: `<language>/<name>`

**Valid Examples:**
- `python/graph-algorithms`

- `javascript/sorting-optimization`

- `java/concurrent-data-structures`

- `cpp/memory-management`

**Commands:**

```bash

git checkout -b python/your-problem-name

```

### 3. Problem Creation Using CLI (Recommended)

```bash

uv run python -m scripts.create_problem

```

The CLI will:

- Prompt for language selection

- Ask for problem name

- Auto-generate directory with date prefix

- Create all required template files

- Set up proper structure

### 4. Manual Problem Creation (Alternative)

If CLI is unavailable, create this **exact structure**:

```

languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/
├── metadata.json
├── problem.txt
├── background.txt
├── src/
│   └── main.{ext}
├── test.{ext}
└── README.md

```

## Required Files and Templates

### metadata.json Template

```json

"name": "problem-name",

"difficulty": "Medium|Hard|Expert",

"subtask": "Code Genera on|Code Edi ng",

"subject_labels": ["Array", "Hash Table", "Algorithm"],

"test_coverage": {

"line_coverage": "85.5%",

"branch_coverage": "90.2%"

},

"setup_command": "pip install --upgrade pip",

"run_command": "python src/main.py",

"test_command": "python -m pytest test.py",

"test_coverage_command": "python -m pytest --cov=src.main --cov-branch --cov-report=term-


missing test.py",

"dependencies": {

"numpy": ">=1.26.0"

},

"test_dependencies": {
"pytest": ">=8.4.0",

"pytest-cov": ">=4.0.0"

},

"references": {

"problem_source_repo": "h ps://github.com/...",

"problem_source_file": "h ps://github.com/..."

```

### problem.txt Template

```

[Clear problem statement with:]

- Problem description

- Input format specification

- Output format specification

- Constraints

- Examples with expected outputs

- Edge cases to consider

```

### background.txt Template

```

[Domain-specific context, definitions, or mathematical background]

[Can be empty if no special context needed]

```

### src/main.py Template (Python example)

```python
#!/usr/bin/env python3
"""
Problem: [Problem Name]
Author: [Your name]
Date: [Date]

[Brief description of solution approach]
"""

def main_function(input_param):
    """
    Main solution function.

    Args:
        input_param: [Description]

    Returns:
        [Description of return value]
    """
    # Implementation here
    pass

if __name__ == "__main__":
    # Example usage or input handling
    pass

```

### test.py Template (Python example)

```python
import pytest

from src.main import main_function

class TestMainFunction:
    """Comprehensive test suite with 10-20+ test cases."""

    def test_basic_cases(self):
        """Test basic functionality."""
        assert main_function(input1) == expected1
        assert main_function(input2) == expected2

    def test_edge_cases(self):
        """Test edge cases."""
        # Empty inputs, boundary values, etc.
        pass

    def test_error_conditions(self):
        """Test error handling."""
        # Invalid inputs, exceptions, etc.
        pass

```

### Problem-specific README.md Template

```markdown

# [Problem Name]

## How to Run

uv run python src/main.py


## How to Run Tests

uv run python -m pytest test.py

## How to Run Tests with Coverage

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py

## Dependencies

[List any special dependencies or setup requirements]

```

## Quality Requirements Checklist

Your problem MUST meet these requirements:

### Problem Characteristics

- [ ] **Graduate-level difficulty** that challenges advanced AI agents

- [ ] **Multi-step reasoning** requiring problem decomposition

- [ ] **Clear specifications** with unambiguous input/output

- [ ] **NOT multiple-choice** format

- [ ] **NOT a proof problem**

- [ ] **NOT surface-level** or trivial

### Technical Requirements

- [ ] **Complete working solution** in src/ directory

- [ ] **10-20+ comprehensive test cases** covering edge cases

- [ ] **Test coverage ≥70% line, ≥80% branch** (reported in metadata.json)

- [ ] **All tests pass** with your reference solution

- [ ] **Proper dependencies** specified in metadata.json

- [ ] **Working run/test commands** specified


### File Requirements

- [ ] All required files present and non-empty

- [ ] metadata.json properly formatted with all fields

- [ ] Problem statement clear and complete

- [ ] Solution code documented and executable

## Validation Commands

Before submitting, run these validation checks:

```bash

# Validate your changes (run from repo root)

uv run python -m scripts.validate_submission --changed

# Test your specific problem

cd languages/{LANGUAGE}/{YYYY-MM-DD-username-problem-name}/

uv run python -m pytest --cov=src.main --cov-branch test.py

uv run python src/main.py

# Check coverage meets requirements

uv run python -m pytest --cov=src.main --cov-branch --cov-report=term-missing test.py

```

## PR Submission Process

### 1. Commit Your Changes

```bash

git add .
git commit -m "Add [language]: [problem-name] - [brief description]"

git push origin <language>/<name>

```

### 2. Create Pull Request

- **Title**: `Add [Language]: [Problem Name] - [Brief Description]`

- **Description**: Include problem difficulty, subject areas, and brief overview

- **Base branch**: `main`

### 3. PR Description Template

```markdown

## Problem Summary

- **Language**: [Programming Language]

- **Difficulty**: [Medium/Hard/Expert]

- **Subject Areas**: [List key topics]

- **Problem Type**: [Code Generation/Code Editing]

## Problem Description

[Brief 2-3 sentence description of what the problem asks]

## Key Challenges

- [Challenge 1]

- [Challenge 2]

- [Challenge 3]

## Validation Checklist

- [ ] All required files present

- [ ] Solution passes all tests

- [ ] Test coverage ≥70% line, ≥80% branch


- [ ] Problem tested against AI agents for appropriate difficulty

- [ ] metadata.json complete and valid

```

## Common Pitfalls to Avoid

### Directory Structure

- Missing date prefix in directory name

- Incorrect file extensions for language

- Missing required files

- Correct example: `languages/Python/2025-01-15-agent-problem-name/`

### Problem Quality

- Too easy (solvable in 1-2 steps)

- Multiple choice format

- Vague or ambiguous requirements

- Instead, require multi-step reasoning and careful implementation

### Testing

- Only 2-3 basic test cases

- No edge case coverage

- Tests that don't actually validate the solution

- Instead, write 10-20+ comprehensive tests with edge cases

## Automated Evaluation

After PR submission, your problem will be automatically evaluated across three systems:

1. **Requirements Validation**: Checks problem quality, structure, and completeness

2. **Model Difficulty Validation**: Tests against AI models to verify appropriate difficulty

3. **Execution Validation**: Ensures code runs correctly in a clean environment

Results will be posted as PR comments with specific feedback for any required improvements.

## Troubleshooting

### Common Issues and Solutions

**Issue**: Branch name validation fails

- **Solution**: Ensure branch follows exact format `<language>/<name>`

**Issue**: Coverage validation fails

- **Solution**: Add more comprehensive tests, especially edge cases

**Issue**: Problem too easy (models pass)

- **Solution**: Increase complexity, add more constraints, require deeper reasoning

**Issue**: Tests fail in clean environment

- **Solution**: Check dependencies in metadata.json, verify setup commands

**Issue**: Metadata validation fails

- **Solution**: Ensure all required fields present, proper JSON formatting

## Reference Example

See the reference implementation at:

`languages/Python/2025-06-09-stytarenko-two-sum/`

This shows the exact structure and format expected (though complexity should be higher for
your submissions).

## Success Criteria

Your PR will be approved when:

- All automated evaluations pass

- Problem demonstrates appropriate difficulty for AI agents

- Code quality and documentation meet standards

- Test coverage and comprehensiveness are adequate

- Problem fits project goals and requirements

Follow this guide systematically, and your submission should pass all validation checks
successfully.
