nlp-datasets

Here are 58 public repositories matching this topic...

StonyBrookNLP / appworld

🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents, ACL'24 Best Resource Paper.

ai-planning interactive-coding nlp-machine-learning autonomous-agents ai-assistants ai-agents nlp-datasets llm function-calling llm-agents ai-environment tool-usage coding-agents acl-2024 ai-apis

Updated Feb 17, 2026
Python

quincyliang / nlp-public-dataset

Chinese and English NER datasets, English-Chinese machine translation datasets, and a Chinese word segmentation dataset.

machine-learning-dataset nlp-datasets

Updated Feb 3, 2021
Python

hellohaptik / multi-task-NLP

multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

nlp transformers pytorch named-entity-recognition ranking sentence-classification nlp-apis nlp-library sequence-labeling machine-comprehension context-awareness entailment intent-classification nlp-datasets multitask-learning nli-tasks

Updated Nov 21, 2022
Python

cjiang2 / VDCNN

Implementation of Very Deep Convolutional Neural Network for Text Classification

nlp text-classification tensorflow keras convolutional-neural-networks keras-tensorflow vdcnn nlp-datasets

Updated Jun 28, 2022
Python

INK-USC / TriggerNER

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)

information-extraction dataset named-entity-recognition nlp-resources nlp-datasets low-resource sequence-tagging

Updated Jun 15, 2022
Python

INK-USC / CommonGen

A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning

natural-language-processing deep-learning text-generation natural-language-generation nlg-dataset commonsense-reasoning nlp-datasets machine-reasoning language-generation-dataset

Updated Jan 5, 2024
Python

Niger-Volta-LTI / yoruba-text

Yorùbá language training text for NLP, ASR and TTS tasks

nlp natural-language-processing machine-translation tts african-languages yoruba asr diacritization training-dataset nlp-datasets

Updated Mar 3, 2023
Python

gcunhase / AMICorpusXML

Extracts Transcript and Summary (Abstractive and Extractive) from the AMI Meeting Corpus

nlp-datasets meeting-dataset xml-to-story convert-to-cnn-dm-format

Updated Dec 4, 2019
Python

fido-ai / ua-datasets

A collection of datasets for Ukrainian language

nlp natural-language-processing text-classification dataset question-answering nlp-datasets ukrainian-language token-classification

Updated Oct 26, 2025
Python

selimfirat / bilkent-turkish-writings-dataset

A dataset compiling Turkish writings that promote creativity, content, composition, grammar, spelling and punctuation.

nlp turkish dataset pdf-conversion bilkent-university turkish-language creative-writing nlp-datasets

Updated May 26, 2025
Python

matt-seb-ho / WikiWhy

WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000+ "why" question-answer-rationale triplets.

nlp machine-learning artificial-intelligence dataset question-answering nlp-datasets explainable-ai open-domain-qa iclr2023

Updated Dec 7, 2023
Python

gkiril / benchie

Comprehensive evaluation framework for Open Information Extraction.

nlp natural-language-processing information-extraction dataset benchmark-framework natural-language-understanding open-information-extraction nlp-datasets

Updated Jun 21, 2022
Python

ElizaLo / Question-Answering-based-on-SQuAD

Question Answering System using BiDAF Model on SQuAD v2.0

python nlp machine-learning natural-language-processing neural-network python-3-6 question-answering squad nlp-machine-learning bidaf natural-language-understanding nlp-datasets

Updated Sep 2, 2020
Python

cybermatt / russian-names

Library for generating Russian names

text-generation text-processing nlp-datasets

Updated Apr 23, 2019
Python

INK-USC / XCSR

Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

natural-language-understanding commonsense-reasoning nlp-datasets multilingual-models crosslingual-transfer

Updated Oct 26, 2021
Python

JasonShao55 / Chinese_Metaphor_Explanation

An annotated Chinese metaphor dataset

nlp chinese metaphor nlp-datasets

Updated Feb 23, 2024
Python

jamesohortle / loanwords_gairaigo

English loanwords in Japanese

nlp japanese linguistics english phonetics nlp-datasets linguistics-databases

Updated Oct 24, 2024
Python

utahnlp / infotabs-code

Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.

nlp wikipedia svm inference transformer nlp-machine-learning tables semi-structured-data nli nlp-datasets roberta acl2020 infotabs

Updated Dec 7, 2021
Python

JadynHax / scpscraper

A Python library designed for scraping data from the SCP wiki.

python pypi webscraper python3 scp dataset-creation data-collection dataset-generation scp-foundation webscraping nlp-datasets pypi-package training-data-generation nlp-dataset-creation

Updated Oct 20, 2020
Python

MiniXC / opensubtitles-dataloader

Loads OpenSubtitles v2018 dataset without having to load everything into memory at once. Works well with pytorch.

python nlp pytorch dataset dataloader nlp-datasets

Updated Aug 26, 2020
Python
