SWE-Spot/swespot
🔎 SWE-Spot

Evaluation

Environment Setup:

uv sync

In general, evaluation takes four steps:

  • Obtain the benchmark datasets for the four tasks.
  • Set the LLM API information in the mini-swe-agent config files. You can either query an existing endpoint or host one yourself.
  • For each benchmark dataset, run mini-swe-agent on the instances to generate the trajectories.
  • Run the corresponding evaluation harness to score the answers parsed from the trajectories.

Benchmark datasets filtered with the knowledge-cutoff protocol (after 2020-12-31):

❯ ls eval/data
fea  qa  sbv  tdd

mini-swe-agent config files for each task:

❯ ls eval | grep yaml
fea_host.yaml
qa_host.yaml
sbv_host.yaml
tdd_host.yaml

Update the LLM API information in these config files before running an evaluation.
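Before launching a run, it can help to confirm that the dataset directories and config files listed above are all in place. The function below is a minimal sketch of such a check: `check_eval_layout` is a hypothetical helper, not part of this repository, and it only assumes the `eval/data/<task>` and `eval/<task>_host.yaml` names shown in the listings.

```shell
# Hypothetical pre-flight check; not part of the repository.
# Verifies that each task has a dataset directory and a config file.
check_eval_layout() {
  local root
  local missing
  local task
  root="${1:-.}"   # repository root, defaults to the current directory
  missing=0
  for task in fea qa sbv tdd; do
    if [ ! -d "$root/eval/data/$task" ]; then
      echo "missing dataset directory: eval/data/$task" >&2
      missing=1
    fi
    if [ ! -f "$root/eval/${task}_host.yaml" ]; then
      echo "missing config file: eval/${task}_host.yaml" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the repository root, `check_eval_layout . && echo "layout looks complete"` reports any missing piece on stderr and returns a non-zero status if something is absent. It does not check the contents of the config files.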

Evaluation scripts for the four tasks:

❯ ls eval | grep sh
sbv.sh # SWE-Bench-Verified
tdd.sh # TDD-Bench-Verified
fea.sh # FEA-Bench
qa.sh # SWE-QA

Read each script to see which arguments it takes through environment variables, for example:

VERSION=0 WORKERS=6 MS=qwen34i CONFIG=eval/sbv_host.yaml REPO=django HASH=e13b714 eval/sbv.sh
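The command above passes its arguments as `NAME=value` prefixes, which the shell places in the environment of that one command. The sketch below illustrates the pattern with a hypothetical stand-in script. It is not the contents of `eval/sbv.sh`, and the defaults and the required-variable check are assumptions made for illustration.

```shell
# Hypothetical stand-in for an evaluation script; the real eval/*.sh may differ.
script=$(mktemp)
cat > "$script" <<'EOF'
# ${VAR:?msg} aborts with msg when VAR is unset or empty.
: "${CONFIG:?set CONFIG to a mini-swe-agent config file}"
# ${VAR:-default} falls back to an assumed default when VAR is unset or empty.
echo "version=${VERSION:-0} workers=${WORKERS:-1} config=${CONFIG}"
EOF

# The prefixed variables are exported to this single invocation.
VERSION=0 WORKERS=6 CONFIG=eval/sbv_host.yaml sh "$script"
# prints: version=0 workers=6 config=eval/sbv_host.yaml
```

Which variables each real script reads, and whether it supplies defaults, can only be confirmed by reading the script itself.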

SFT

We use ms-swift for SFT in our experiments, but you can use other libraries instead.

To use ms-swift, it is recommended to:

An example script for training the Django expert model: train/mix_django.sh

  • Training uses the 4-unit RCX dataset. We sample 2k instances per unit, for 8k training instances in total; see the --dataset argument in the script.
  • The dataset is available at https://huggingface.co/datasets/swespot/sft-v0 . Clone it locally, then set the environment variable DATA_DIR in the script to the path of the clone: export DATA_DIR=/path/to/swespot-sft-v0-hf-repo .
  • You can train expert models for other repositories in the same way.
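Since the training script depends on `DATA_DIR` pointing at the cloned dataset, a small guard can catch a wrong path before training starts. This is a sketch only: `require_data_dir` is a hypothetical helper, not part of the repository or of ms-swift, and the clone command in the comment assumes network access (git-lfs may be needed for large files).

```shell
# Clone the dataset once, for example:
#   git clone https://huggingface.co/datasets/swespot/sft-v0 /path/to/swespot-sft-v0-hf-repo

# Hypothetical guard; not part of the repository.
# Fails early when DATA_DIR is unset or does not point to a directory.
require_data_dir() {
  if [ -z "${DATA_DIR:-}" ]; then
    echo "DATA_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$DATA_DIR" ]; then
    echo "DATA_DIR does not point to a directory: $DATA_DIR" >&2
    return 1
  fi
}
```

One option is to call `require_data_dir || exit 1` right after the `export DATA_DIR=...` line in the training script, so a mistyped path stops the run before any training work begins.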

Example trained models for the seven repositories selected in the paper are available on Hugging Face, such as https://huggingface.co/swespot/django-sft-v0 .
