Insights: EleutherAI/lm-evaluation-harness
Overview
6 Pull requests merged by 4 people
- fix vllm (#2708, merged Feb 17, 2025)
- arithmetic: set target delimiter to empty string (#2701, merged Feb 14, 2025)
- fix construct_requests kwargs in python tasks (#2700, merged Feb 14, 2025)
- Update README.md (#2694, merged Feb 14, 2025)
- Update remaining references to assistant_prefill in docs to gen_prefix (#2683, merged Feb 14, 2025)
- Set defaults for BLiMP scores (#2692, merged Feb 13, 2025)
6 Pull requests opened by 6 people
- add o3-mini support (#2697, opened Feb 14, 2025)
- Add Task (Financial mmlu ko) (#2699, opened Feb 14, 2025)
- Support SGLang as Potential Backend for Evaluation (#2703, opened Feb 15, 2025)
- feat: Add mmlu-redux and its Spanish translation as generative task definitions (#2705, opened Feb 16, 2025)
- Add AIBE task and utilities (#2712, opened Feb 18, 2025)
- New healthcare benchmark: careqa (#2714, opened Feb 19, 2025)
3 Issues closed by 2 people
- vllm 0.7 support (got an unexpected keyword argument 'worker_use_ray') (#2706, closed Feb 17, 2025)
- Bug: vLLM v0.7.1 throws an error when running lm_eval with TP > 1 (#2704, closed Feb 16, 2025)
- Fix: Inconsistent spacing in arithmetic tasks affects few-shot performance (#2695, closed Feb 14, 2025)
8 Issues opened by 8 people
- How to preprocess a document with the assistance of a tokenizer from a specific Model (#2717, opened Feb 20, 2025)
- Different models on same tasks give the same results when cache is active (#2715, opened Feb 19, 2025)
- Importing a local module in a task included with include_path (#2713, opened Feb 19, 2025)
- [Accuracy gap with official model card due to wrong parsing] (#2707, opened Feb 17, 2025)
- Inconsistent Behavior with max_tokens, Post-Processing, and Cache Options (#2702, opened Feb 15, 2025)
- vLLM CUDA OOM for `loglikelihood`, but not for `generate_until` (#2698, opened Feb 14, 2025)
- Feature request: allow peft revision separate from base model revision (#2696, opened Feb 13, 2025)
- Support Arabic Dataset (#2693, opened Feb 13, 2025)
11 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add o3-mini support (#2685, commented on Feb 13, 2025; 0 new comments)
- lm_eval on squadv2 and meta-llama/Meta-Llama-3.1-8B fails with TypeError: Instance.__init__() got an unexpected keyword argument 'apply_chat_template' (#2537, commented on Feb 14, 2025; 0 new comments)
- [Feature Request] pre-built Docker image support (#1746, commented on Feb 14, 2025; 0 new comments)
- Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion. (#2690, commented on Feb 14, 2025; 0 new comments)
- Issue running Squadv2 in LM-Evaluation-Harness (#2664, commented on Feb 14, 2025; 0 new comments)
- long-time stuck after running loglikelihood requests (#2532, commented on Feb 15, 2025; 0 new comments)
- How to evaluate local model with local-completions? (#2402, commented on Feb 18, 2025; 0 new comments)
- Can't find the dataset lighteval/MATH-Hard in the huggingface (#2618, commented on Feb 19, 2025; 0 new comments)
- Logging (#2203, commented on Feb 19, 2025; 0 new comments)
- add math_verify to some tasks (#2686, commented on Feb 18, 2025; 0 new comments)
- add audio modality (qwen2 audio only) (#2689, commented on Feb 19, 2025; 0 new comments)