Skip to content

bezir/MMLU-pro-TR

 
 

Repository files navigation

MMLU-Pro-TR

|🤗 Dataset | [🏆Leaderboard(SOON!)] | 📖 Original Dataset Paper |

This repository contains the evaluation code for the Turkish-translated version of the paper "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark"

Introduction

We introduce MMLU-Pro-TR, a benchmark that is the translated version of MMLU-Pro, an enhanced benchmark designed to evaluate language understanding models across broader and more challenging tasks. Building on the Massive Multitask Language Understanding (MMLU) dataset, MMLU-Pro integrates more challenging, reasoning-focused questions and increases the answer choices per question from four to ten, significantly raising the difficulty and reducing the chance of success through random guessing. MMLU-Pro comprises over 12,000 rigorously curated questions from academic exams and textbooks, spanning 14 diverse domains including Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Others.

Dataset Creation

Please refer to huggingface 🤗 Dataset and 🤗 Dataset-TR for more details.

Evaluation

To run local inference, modify the model name in the following script and execute it:

sh run.sh

🏆 Mini-Leaderboard

Models Original MMLU Score MMLU Pro Score Drop
Metin/LLaMA-3-8B-Instruct-TR-DPO 49.71 27.00 22.71
ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1 51.75 23.90 27.85
VeriUS/VeriUS-LLM-8b-v0.2 48.81 23.23 25.58
Orbina/Orbita-v0.1 49.51 22.95 26.56
KOCDIGITAL/Kocdigital-LLM-8b-v0.1 47.35 21.83 25.52
meta-llama/Meta-Llama-3-8B-Instruct 49.29 20.93 28.36
NousResearch/Meta-Llama-3-8B 49.29 20.93 28.36
curiositytech/MARS 46.73 20.81 25.92
Trendyol/Trendyol-LLM-7b-chat-v1.8 41.91 18.15 23.76
TURKCELL/Turkcell-LLM-7b-v1 39.03 17.15 21.88
ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 26.56 10.88 15.67

For more details on various models and their accuracy across different subjects, please visit our [Leaderboard(SOON!)].

Citation

BibTeX:

@misc{wang2024mmlupro,
      title={MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark}, 
      author={Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen},
      year={2024},
      eprint={2406.01574},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

MMLU-Pro Turkish Evaluation Benchmark

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 96.6%
  • Shell 3.4%