Insights: EleutherAI/lm-evaluation-harness
Overview
6 Pull requests merged by 4 people
- fix vllm (#2708, merged Feb 17, 2025)
- arithmetic: set target delimiter to empty string (#2701, merged Feb 14, 2025)
- fix construct_requests kwargs in python tasks (#2700, merged Feb 14, 2025)
- Update README.md (#2694, merged Feb 14, 2025)
- Update remaining references to assistant_prefill in docs to gen_prefix (#2683, merged Feb 14, 2025)
- Set defaults for BLiMP scores (#2692, merged Feb 13, 2025)
6 Pull requests opened by 6 people
- add o3-mini support (#2697, opened Feb 14, 2025)
- Add Task (Financial mmlu ko) (#2699, opened Feb 14, 2025)
- Support SGLang as Potential Backend for Evaluation (#2703, opened Feb 15, 2025)
- feat: Add mmlu-redux and its Spanish translation as generative task definitions (#2705, opened Feb 16, 2025)
- Add AIBE task and utilities (#2712, opened Feb 18, 2025)
- New healthcare benchmark: careqa (#2714, opened Feb 19, 2025)
3 Issues closed by 2 people
- vllm 0.7 support (got an unexpected keyword argument 'worker_use_ray') (#2706, closed Feb 17, 2025)
- Bug: vLLM v0.7.1 throws an error when running lm_eval with TP > 1 (#2704, closed Feb 16, 2025)
- Fix: Inconsistent spacing in arithmetic tasks affects few-shot performance (#2695, closed Feb 14, 2025)
8 Issues opened by 8 people
- How to preprocess a document with the assistance of a tokenizer from a specific Model (#2717, opened Feb 20, 2025)
- Different models on same tasks give the same results when cache is active (#2715, opened Feb 19, 2025)
- Importing a local module in a task included with include_path (#2713, opened Feb 19, 2025)
- [Accuracy gap with official model card due to wrong parsing] (#2707, opened Feb 17, 2025)
- Inconsistent Behavior with max_tokens, Post-Processing, and Cache Options (#2702, opened Feb 15, 2025)
- vLLM CUDA OOM for `loglikelihood`, but not for `generate_until` (#2698, opened Feb 14, 2025)
- Feature request: allow peft revision separate from base model revision (#2696, opened Feb 13, 2025)
- Support Arabic Dataset (#2693, opened Feb 13, 2025)
11 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add o3-mini support (#2685, commented on Feb 13, 2025; 0 new comments)
- lm_eval on squadv2 and meta-llama/Meta-Llama-3.1-8B fails with TypeError: Instance.__init__() got an unexpected keyword argument 'apply_chat_template' (#2537, commented on Feb 14, 2025; 0 new comments)
- [Feature Request] pre-built Docker image support (#1746, commented on Feb 14, 2025; 0 new comments)
- Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion. (#2690, commented on Feb 14, 2025; 0 new comments)
- Issue running Squadv2 in LM-Evaluation-Harness (#2664, commented on Feb 14, 2025; 0 new comments)
- long-time stuck after running loglikelihood requests (#2532, commented on Feb 15, 2025; 0 new comments)
- How to evaluate local model with local-completions? (#2402, commented on Feb 18, 2025; 0 new comments)
- Can't find the dataset lighteval/MATH-Hard in the huggingface (#2618, commented on Feb 19, 2025; 0 new comments)
- Logging (#2203, commented on Feb 19, 2025; 0 new comments)
- add math_verify to some tasks (#2686, commented on Feb 18, 2025; 0 new comments)
- add audio modality (qwen2 audio only) (#2689, commented on Feb 19, 2025; 0 new comments)