FlagEval

FlagEval, launched by BAAI in 2023, is a comprehensive evaluation system for large models that covers more than 800 open-source and closed-source models from around the world. It spans more than 40 capability dimensions, including reasoning, mathematical skills, and task-solving abilities, along with five major tasks and four categories of metrics.


Recent Developments

In 2024, FlagEval launched the Colosseum and the Debate Arena. Both platforms are dedicated to model-to-model competition, with the aim of driving continuous improvement.



Popular repositories

  1. FlagEval (Public)

    FlagEval is an evaluation toolkit for large AI foundation models.

    Python, 339 stars, 31 forks

  2. FlagEvalMM (Public)

    A Flexible Framework for Comprehensive Multimodal Model Evaluation

    Python, 95 stars, 22 forks

  3. ChildMandarin (Public)

    A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5

    Shell, 44 stars, 1 fork

  4. CMMU (Public)

    [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

    Python, 24 stars

  5. SeniorTalk (Public)

    A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

    Python, 23 stars, 4 forks

  6. HalluDial (Public)

    Python, 21 stars, 1 fork

