
KoLLM_Eval🥰

Unified Korean benchmark evaluation code, 2024.10.15 version

Run Logickor, K2-Eval, and LM-Harness evaluations from a single codebase.

🍚Gukbap-Series LLM🍚

Install (required)🤩

First, clone LM-Eval-Harness and install it in editable mode, including the multilingual extras.

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd ./lm-evaluation-harness
pip install -e .
pip install -e ".[multilingual]"

Next, install vLLM.

pip install vllm
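
Optionally, a quick sanity check that both packages import correctly (this check is not part of the original instructions):

python -c "import lm_eval, vllm; print('ok')"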

Lastly, move the files below into place (see the copy example after the layout).

questions.jsonl               # Logickor question set
data_k2-eval-generation.csv   # K2-Eval generation data
korean_eval.sh                # main evaluation script
MTBench/
├── logickor.py
└── k2_eval.py
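
For example, a minimal copy sketch, assuming KoLLM_Eval and lm-evaluation-harness are cloned side by side; the destination path is an assumption, not something the repository states:

# Run from the KoLLM_Eval checkout (assumed layout).
cp questions.jsonl data_k2-eval-generation.csv korean_eval.sh ../lm-evaluation-harness/
cp -r MTBench ../lm-evaluation-harness/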

Implementation🤩

sh korean_eval.sh

You must set your OpenAI API key; it is used by the GPT judge for Logickor and K^2-Eval.

You can run the evaluation on an A100 GPU (for example, on Colab).
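
As an example, a minimal run sketch, assuming korean_eval.sh reads the key from the OPENAI_API_KEY environment variable (how the script actually consumes the key may differ):

export OPENAI_API_KEY="your-key-here"   # assumption: the script picks the key up from the environment
sh korean_eval.sh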

Examples🤩

| Model | Logickor (0-shot) | K^2-Eval | HAERAE (Acc) | CSAT-QA (Acc) | KMMLU (Acc) |
|---|---|---|---|---|---|
| Human-MarkrAI/Gukbap-Gemma2-9B | 8.77 | 4.50 | 62.60 | 43.85 | 46.46 |
| google/gemma-2-9b-it | 8.32 | 4.38 | 64.34 | 47.06 | 42.51 |
| rtzr/ko-gemma-2-9b-it | 8.67 | 4.40 | 64.07 | 48.13 | 44.75 |
| LGAI/EXAONE-3.0-7.8B-Instruct | 8.64 | 4.43 | 77.09 | 34.76 | 35.23 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 6.03 | 3.51 | 70.94 | 38.50 | 41.99 |

Evaluator: GPT-4-1106-preview. Logickor is scored on [0, 10] and K^2-Eval on [0, 5].

References🌠

Logickor
LM-Harness
K2-Eval
