EvalPlus v0.1.5
🚀 HumanEval+[mini] -- 47x smaller yet as effective as HumanEval+
- Add `--mini` to `evalplus.evaluate ...` to use a minimal, best-quality set of extra tests and accelerate evaluation! `HumanEval+[mini]` (avg 16.5 tests) is 47x smaller than `HumanEval+` (avg 774.8 tests).
- This is achieved via test-suite reduction -- we run a set covering algorithm that preserves the same coverage (coverage analysis), mutant-killings (mutation analysis) and sample-killings (pass-fail status of each sample-test pair).
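
To illustrate the idea behind the reduction, here is a minimal sketch of a greedy set cover over test requirements. It is not the EvalPlus implementation: the function name `greedy_reduce`, the requirement labels, and the choice of the greedy heuristic are all assumptions made for illustration. Each test maps to the set of things it achieves (covered branches, killed mutants, killed samples), and tests are picked until every requirement met by the full suite is still met.

```python
from typing import Dict, List, Set


def greedy_reduce(test_to_reqs: Dict[str, Set[str]]) -> List[str]:
    """Pick a small subset of tests that satisfies every requirement
    satisfied by the full suite (greedy set cover, an approximation)."""
    # Everything the full suite achieves: covered branches, killed mutants, ...
    remaining: Set[str] = set().union(*test_to_reqs.values())
    selected: List[str] = []
    while remaining:
        # Choose the test that satisfies the most still-unmet requirements.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(test_to_reqs),
                   key=lambda t: len(test_to_reqs[t] & remaining))
        selected.append(best)
        remaining -= test_to_reqs[best]
    return selected


# Hypothetical data: requirement labels are made up for this example.
tests = {
    "t1": {"branch:a", "mutant:1"},
    "t2": {"branch:a", "branch:b", "mutant:1", "sample:7"},
    "t3": {"mutant:2"},
    "t4": {"branch:b"},
}
reduced = greedy_reduce(tests)
print(reduced)  # ['t2', 't3']
```

In this toy suite, two of the four tests are enough to keep the same coverage, mutant kills and sample kills, which is the property the reduced test set is meant to preserve.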
PyPI: https://pypi.org/project/evalplus/0.1.5/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.1.5/images/sha256-01ef3275ab02776e94edd4a436a3cd33babfaaf7a81e7ae44f895c2794f4c104