Code Repository for: AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
Updated Dec 14, 2025 (Jupyter Notebook)
[OpenPAR] An open-source framework for Pedestrian Attribute Recognition, based on PyTorch
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Dataset management and caching for AI research benchmarks
A framework for benchmarking clustering algorithms – Benchmark results (for version 1 of the Suite)
[BlackboxNLP Workshop @ EMNLP, 2025] CE-Bench: A Contrastive Evaluation Benchmark of LLM Interpretability with Sparse Autoencoders
Estonian Grammatical Error Correction (GEC) test and development corpus that contains L2 learner texts error-annotated in the M2 format.
My Master's thesis project at IIT Kharagpur (May '24 - June '25), carried out at the IPCV Lab, E&ECE, IIT Kharagpur
CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
A Python toolkit for setting up benchmarking datasets using biomedical networks
Benchmark dataset and deep learning method (Hierarchical Interaction Network, HINT) for clinical trial approval probability prediction, published in Cell Patterns 2022.
Crowdsourced geospatial question-answering dataset containing question-query-answer triples.
Test your local LLMs on the AIME problems
A framework for benchmarking clustering algorithms – Benchmark suite, version 1
Datasets to protect Earth's forests and biodiversity
Pathogen Identifier and Strain Tagger datasets
This is the official repository for evaluation on the NoW Benchmark Dataset. The goal of the NoW benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods from a single image under variations in viewing angle, lighting, and common occlusions.
Benchmark Datasets for Time Series Forecasting Preprocessing - NASA HTTP Dataset, WorldCup98 Dataset
Official repository of IDEA-Bench