🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource Paper.
Updated Feb 17, 2026 - Python
Chinese and English NER, English-Chinese machine translation, and Chinese word segmentation datasets.
multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.
Implementation of Very Deep Convolutional Neural Network for Text Classification
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning
Yorùbá language training text for NLP, ASR and TTS tasks
Extracts transcripts and summaries (abstractive and extractive) from the AMI Meeting Corpus.
A collection of datasets for Ukrainian language
A compiled dataset of Turkish writings that promotes creativity, content, composition, grammar, spelling, and punctuation.
WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9,000+ "why" question-answer-rationale triplets.
Comprehensive evaluation framework for Open Information Extraction.
Question Answering System using BiDAF Model on SQuAD v2.0
Library for generating Russian names.
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"
English loanwords in Japanese
Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.
A Python library designed for scraping data from the SCP wiki.
Loads the OpenSubtitles v2018 dataset without loading it all into memory at once. Works well with PyTorch.