Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Updated Aug 18, 2022 - HTML
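Chinese and Chinese-English dictionaries (Simplified and Traditional Chinese).
Early Modern Chinese corpus dataset. Keywords: natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese, digital humanities, computational linguistics.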
The University of Pittsburgh English Language Institute Corpus (PELIC) dataset
Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)
Data, metadata, tools, and LDA experiments on a corpus of Sanskrit philosophy texts
This repository contains Python code to create a corpus of 12,215 terms of service documents scraped from TOSDR, intended for legal, privacy, and natural language processing research.
HUMOR dataset for humor research
Article title, authors, date and body extraction dataset.
A Corpus of the Kurdish Folkloric Lyrics
A Text / Speech Summarizer
A corpus of chansons de geste
Materials for the summer course «Del corpus a la interpretación: Estilometría con R» ("From Corpus to Interpretation: Stylometry with R"), Burgos, 2021
Arabic Stories Corpus
Toxic comment classification project built by Qimo Li, Chen He, and Kun Qiu for the course "Introduction to Natural Language Processing in Python" at Brandeis University.
A garden of file formats from a collection of sources, for use as inputs to fuzzing engines.