Code for evaluating LLMs for the Diké (Dice) leaderboard:
https://huggingface.co/spaces/iproskurina/dike-leaderboard
This repository provides tools to evaluate your model and format the results for submission. Two types of models are supported (the sketch after this list illustrates the difference):
- Instruct models
- Non-instruct (base) models
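The distinction matters at prompt-construction time: instruct models are queried through their chat template, while base models are scored on plain text. A minimal sketch of the difference using the Transformers API; the model name and messages are placeholders, not leaderboard checkpoints:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the instruct model you plan to evaluate.
tokenizer = AutoTokenizer.from_pretrained("your-org/your-instruct-model")

# Instruct models: the prompt is wrapped in the model's chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Base models: the raw text is used as-is.
raw_prompt = "The capital of France is"
```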
Install the base requirements:

```bash
git clone https://github.com/upunaprosk/dice-leaderboard.git
cd dice-leaderboard
pip install -r requirements.txt
pip install datasets colorama logbar
```

Additional dependencies, depending on model type (a quick load check follows the list):
- GPTQ models: `pip install gptqmodel`
- Models compressed with vLLM or stored in vLLM format: `pip install vllm llm-compressor`
- Models quantized with bitsandbytes: `pip install bitsandbytes`
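Before launching a full run, it can be worth checking that the optional backend is actually picked up by Transformers. A rough sanity check, assuming a GPTQ checkpoint (the model ID is the one from the example evaluation below) and that `accelerate` is installed for `device_map="auto"`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPTQ checkpoint reused from the example evaluation below.
model_id = "iproskurina/opt-350m-int4-tb"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# If the quantization backend loaded correctly, the config records the
# quantization method and bit width.
print(model.config.quantization_config)
```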
To evaluate a non-instruct (base) model, install the patched lm-evaluation-harness and run the evaluation script:

```bash
git clone https://github.com/upunaprosk/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
cd ..
chmod +x run_eval.sh
# Example GPTQ model evaluation
bash run_eval.sh iproskurina/opt-350m-int4-tb --gptq
# Direct Python invocation for a GPTQModel checkpoint
python base_model_evaluation.py --model iproskurina/Llama-3.1-8B-gptqmodel-4bit --is_gptqmodel --use_fast_tokenizer --trust_remote_code
```

To evaluate an instruct model, install the patched lm-evaluation-harness in the same way and run the instruct evaluation script:

```bash
git clone https://github.com/upunaprosk/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
cd ..
chmod +x run_eval_instruct.sh
# Example GPTQ instruct model evaluation
bash run_eval_instruct.sh model --gptq
# Direct Python invocation; model_name is a placeholder
python instruct_model_evaluation.py --model model_name --is_gptqmodel --use_fast_tokenizer --trust_remote_code
```
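Both pipelines drive the patched lm-evaluation-harness, which reports per-task metrics in a results JSON. A small sketch for inspecting such a file before submission; the output path here is hypothetical, and the nested `results` layout is the harness's usual format, so adjust both to what your run actually produced:

```python
import json
from pathlib import Path

# Hypothetical output path; point this at the file your run produced.
results_path = Path("results/results.json")
report = json.loads(results_path.read_text())

# lm-evaluation-harness typically nests per-task metrics under "results".
for task, metrics in report.get("results", {}).items():
    for metric, value in metrics.items():
        if isinstance(value, (int, float)):
            print(f"{task:30s} {metric:25s} {value:.4f}")
```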