gskill

License: MIT · PRs welcome

Automatically learns repository-specific skills for coding agents using evolutionary search.

Given a GitHub repository, gskill produces a .claude/skills/{repo}/SKILL.md file containing optimized instructions that improve an agent's resolve rate on that repo's issues. It implements the pipeline described in the GEPA blog post, which reported resolve-rate improvements from 24% to 93% on some repositories.

How it works

  1. Loads verifiable software engineering tasks from SWE-smith for the target repository
  2. Generates an initial skill via static analysis of the repo (README, config files) plus gpt-5.2
  3. Uses GEPA's optimize_anything to iteratively refine the skill through evolutionary search
  4. Each candidate skill is evaluated by running mini-SWE-agent on training tasks inside Docker and checking whether the FAIL_TO_PASS tests pass
  5. Writes the best-scoring skill to disk
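The optimization in steps 3–5 can be sketched as a greedy evolutionary loop. This is a minimal sketch, not the actual implementation: `evaluate` and `mutate` stand in for the Docker-based mini-SWE-agent evaluator and GEPA's reflective mutation, and all names here are hypothetical.

```python
def optimize(seed_skill, evaluate, mutate, max_evals=150):
    """Greedy evolutionary search: propose a mutated skill and keep it
    only if it scores higher on the training tasks.

    `evaluate` maps a skill string to a score (e.g. fraction of
    FAIL_TO_PASS test suites that pass); `mutate` proposes a new
    candidate from the current best (in GEPA, via LLM reflection).
    """
    best = seed_skill
    best_score = evaluate(best)
    evals = 1  # the seed evaluation counts against the budget
    while evals < max_evals:
        candidate = mutate(best)
        candidate_score = evaluate(candidate)
        evals += 1
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

The real pipeline delegates this loop to GEPA's optimize_anything; the sketch only shows the shape of the search.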

Requirements

  • Python 3.13+
  • uv
  • Docker (for running SWE-smith task environments)
  • OPENAI_API_KEY set in your environment (for initial skill generation and GEPA reflection)
  • GSKILL_AGENT_MODEL (optional) — LiteLLM model string for mini-SWE-agent (default: openai/gpt-5.2)

Installation

git clone https://github.com/your-org/gskill
cd gskill
uv sync

Usage

Run the full pipeline

uv run python main.py run https://github.com/pallets/jinja

This will:

  • Load SWE-smith tasks for pallets/jinja
  • Generate an initial skill
  • Run up to 150 mini evaluations to optimize the skill
  • Write the result to .claude/skills/jinja/SKILL.md

run only works for repositories that have task instances in SWE-bench/SWE-smith. If a GitHub repository exists but is not covered by that dataset, gskill will fail with an unsupported-repo message.
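The coverage check amounts to filtering the dataset's task instances by repository. A sketch, assuming the instances are dicts with a `repo` field (the field name and function name are assumptions; the real filtering lives in src/tasks.py):

```python
def supported_tasks(instances, repo):
    """Return the SWE-smith task instances for `repo`, or exit with an
    unsupported-repo message if the dataset has none for it."""
    matches = [t for t in instances if t.get("repo") == repo]
    if not matches:
        raise SystemExit(f"unsupported repo: no SWE-smith task instances for {repo}")
    return matches
```

In practice `instances` would come from loading the SWE-bench/SWE-smith dataset, e.g. via the Hugging Face `datasets` library.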

Common options

# Custom evaluation budget (more evals = better skill, slower run)
uv run python main.py run https://github.com/pallets/jinja --max-evals 300

# Custom output directory
uv run python main.py run https://github.com/pallets/jinja --output-dir ~/skills

# Skip static analysis, start from an empty seed
uv run python main.py run https://github.com/pallets/jinja --no-initial-skill

# Use a different model for the coding agent
uv run python main.py run https://github.com/pallets/jinja --agent-model openai/gpt-5-mini

# Use a local OpenAI-compatible server (e.g. one listening on localhost:11434)
OPENAI_BASE_URL=http://localhost:11434/v1 \
  uv run python main.py run https://github.com/pallets/jinja --agent-model openai/gpt-oss-120b

You can also set the agent model via the GSKILL_AGENT_MODEL environment variable instead of passing --agent-model every time.
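The precedence is flag over environment variable over default. A sketch of that resolution (the helper name is hypothetical; the default comes from the Requirements section):

```python
import os

def resolve_agent_model(cli_value=None):
    """Pick the mini-SWE-agent model: --agent-model beats
    GSKILL_AGENT_MODEL, which beats the documented default."""
    return cli_value or os.environ.get("GSKILL_AGENT_MODEL") or "openai/gpt-5.2"
```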

Preview available tasks

# Show the first 10 SWE-smith tasks for a repo
uv run python main.py tasks pallets/jinja

# Show more
uv run python main.py tasks pallets/jinja --limit 25

Discover supported repositories

# List the first 50 supported repos
uv run python main.py repos

# Filter supported repos by substring
uv run python main.py repos --filter fast

Help

uv run python main.py --help
uv run python main.py run --help
uv run python main.py tasks --help

Output

The optimized skill is written to:

.claude/skills/{repo}/SKILL.md
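Here {repo} is the repository name without the owner. Deriving the path from a GitHub URL looks roughly like this (the helper name is hypothetical):

```python
from pathlib import Path

def skill_path(repo_url, output_dir=".claude/skills"):
    """Map a GitHub URL to its SKILL.md location, e.g.
    https://github.com/pallets/jinja -> .claude/skills/jinja/SKILL.md"""
    name = repo_url.rstrip("/").split("/")[-1]
    return Path(output_dir) / name / "SKILL.md"
```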

To use it with Claude Code, add the skill path to your project's .claude/settings.json or reference it from your CLAUDE.md.

Task runner

A Taskfile.yml provides shortcuts for common operations (requires Task):

task sync                # uv sync
task lint                # ruff check
task format              # ruff format
task test                # pytest
task run -- owner/repo   # gskill run (pass args via CLI_ARGS)
task tasks               # gskill tasks (pass args via CLI_ARGS)

Project structure

gskill/
├── main.py              # CLI entry point (typer)
├── src/
│   ├── pipeline.py      # Top-level orchestration
│   ├── tasks.py         # SWE-smith dataset loading & splitting
│   ├── evaluator.py     # mini runner + pass/fail evaluation
│   └── skill.py         # Initial skill generation (gpt-5.2) + file I/O
├── Taskfile.yml         # Task runner shortcuts
└── pyproject.toml
