Lang-comp is a benchmark suite for comparing how effectively AI coding agents work in different programming languages on a set of common programming tasks. The benchmark includes implementations of various algorithms and data structures, as well as a set of test cases to evaluate each implementation.
The main goal of the benchmark is to see how language affordances (e.g. static vs. dynamic typing) and a language's availability in the training data affect token cost.
All source code used in this benchmark was adapted from the Exercism project.
- These benchmarks use the GitHub Copilot harness unless explicitly stated otherwise. Fresh context (system prompt, tool definitions, etc.): 15,200 tokens