A MNIST-like fashion product database. Benchmark 👇
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Benchmarks of approximate nearest neighbor libraries in Python
Python package for the evaluation of odometry and SLAM
SWE-bench: Can Language Models Resolve Real-world Github Issues?
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Reference implementations of MLPerf® inference benchmarks
Reference implementations of MLPerf® training benchmarks
A machine learning toolkit for log parsing [ICSE'19, DSN'16]
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus, and leaderboard
MTEB: Massive Text Embedding Benchmark
Computational framework for reinforcement learning in traffic control
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Python suite to construct benchmark machine learning datasets from the MIMIC-III 💊 clinical database.
Benchmark for vector databases.
A series of large language models developed by Baichuan Intelligent Technology
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)