11th LREC 2018: Miyazaki, Japan
- Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Kôiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga:
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018. European Language Resources Association (ELRA) 2018 - Christos Christodoulopoulos, Arpit Mittal:
Simple Large-scale Relation Extraction from Unstructured Text. - Eva Hajicová, Jirí Mírovský:
Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study. - Fan Xu, Mingwen Wang, Maoxi Li:
Building Parallel Monolingual Gan Chinese Dialects Corpus. - Marjoleine Sloos, Eduard Drenth, Wilbert Heeringa:
The Boarnsterhim Corpus: A Bilingual Frisian-Dutch Panel and Trend Study. - Zdenka Uresová, Eva Fucíková, Eva Hajicová, Jan Hajic:
Creating a Verb Synonym Lexicon Based on a Parallel Corpus. - Amir Hazem, Béatrice Daille:
Word Embedding Approach for Synonym Extraction of Multi-Word Terms. - Kirk Roberts, Yuqi Si, Anshul Gandhi, Elmer V. Bernstam:
A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation. - Thomas Kisler, Florian Schiel:
MOCCA: Measure of Confidence for Corpus Analysis - Automatic Reliability Check of Transcript and Automatic Segmentation. - Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim:
A Recorded Debating Dataset. - Abhik Jana, Pawan Goyal:
Network Features Based Co-hyponymy Detection. - Marcus Klang, Pierre Nugues:
Linking, Searching, and Visualizing Entities in Wikipedia. - Chinho Lin, Hen-Hsen Huang, Hsin-Hsi Chen:
Learning to Map Natural Language Statements into Knowledge Base Representations for Knowledge Base Construction. - Kenji Imamura, Eiichiro Sumita:
Multilingual Parallel Corpus for Global Communication Plan. - Marc Schulder, Michael Wiegand, Josef Ruppenhofer, Stephanie Köser:
Introducing a Lexicon of Verbal Polarity Shifters for English. - Vivi Nastase, Devon Fritz, Anette Frank:
DeModify: A Dataset for Analyzing Contextual Constraints on Modifier Deletion. - Morgan Ulinski, Bob Coyne, Julia Hirschberg:
Evaluating the WordsEye Text-to-Scene System: Imaginative and Realistic Sentences. - Mathias Creutz:
Open Subtitles Paraphrase Corpus for Six Languages. - Eun-Kyung Kim, Key-Sun Choi:
Incorporating Global Contexts into Sentence Embedding for Relational Extraction at the Paragraph Level with Distant Supervision. - Bonan Min, Marjorie Freedman, Roger Bock, Ralph M. Weischedel:
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation. - Austin Baird, Anissa Hamza, Daniel Hardt:
Classifying Sluice Occurrences in Dialogue. - Sushant Kafle, Matt Huenerfauth:
A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts. - Vivian Dos Santos Silva, André Freitas, Siegfried Handschuh:
Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition. - Kyoko Sugisaki, Nicolas Wiedmer, Heiko Hausendorf:
Building a Corpus from Handwritten Picture Postcards: Transcription, Annotation and Part-of-Speech Tagging. - Eric Malmi, Daniele Pighin, Sebastian Krause, Mikhail Kozhevnikov:
Automatic Prediction of Discourse Connectives. - Boyang Li, Beth Cardier, Tong Wang, Florian Metze:
Annotating High-Level Structures of Short Stories and Personal Anecdotes. - Sabyasachi Kamila, Asif Ekbal, Pushpak Bhattacharyya:
Sentence Level Temporality Detection using an Implicit Time-sensed Resource. - Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann:
Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl. - Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, Manfred Pinkal:
MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. - Albert Gatt, Marc Tanti, Adrian Muscat, Patrizia Paggio, Reuben A. Farrugia, Claudia Borg, Kenneth P. Camilleri, Mike Rosner, Lonneke van der Plas:
Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions. - Matthew Shardlow, Nhung T. H. Nguyen, Gareth I. Owen, Claire O'Donovan, Andrew Leach, John McNaught, Steve Turner, Sophia Ananiadou:
A New Corpus to Support Text Mining for the Curation of Metabolites in the ChEBI Database. - Mateusz Lango, Magda Sevcíková, Zdenek Zabokrtský:
Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish). - Natalie Parde, Rodney D. Nielsen:
A Corpus of Metaphor Novelty Scores for Syntactically-Related Word Pairs. - Tetsuaki Nakamura, Daisuke Kawahara:
JFCKB: Japanese Feature Change Knowledge Base. - Orizu Udochukwu, Yulan He:
Content-Based Conflict of Interest Detection on Wikipedia. - Farhad Nooralahzadeh, Lilja Øvrelid, Jan Tore Lønning:
Evaluation of Domain-specific Word Embeddings using Knowledge Resources. - Masahiro Araki, Sayaka Tomimasu, Mikio Nakano, Kazunori Komatani, Shogo Okada, Shinya Fujie, Hiroaki Sugiyama:
Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users' Interest Level. - Oliver Hellwig, Heinrich Hettrich, Ashutosh Modi, Manfred Pinkal:
Multi-layer Annotation of the Rigveda. - Masayuki Asahara, Hiroshi Kanayama, Takaaki Tanaka, Yusuke Miyao, Sumire Uematsu, Shinsuke Mori, Yuji Matsumoto, Mai Omura, Yugo Murawaki:
Universal Dependencies Version 2 for Japanese. - Matteo Negri, Marco Turchi, Rajen Chatterjee, Nicola Bertoldi:
ESCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing. - Tobias Staron, Özge Alaçam, Wolfgang Menzel:
Incorporating Contextual Information for Language-Independent, Dynamic Disambiguation Tasks. - Chenggang Mi, Yating Yang, Lei Wang, Xi Zhou, Tonghai Jiang:
A Neural Network Based Model for Loanword Identification in Uyghur. - Pierre Lison, Jörg Tiedemann, Milen Kouylekov:
OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora. - Xuancheng Ren, Xu Sun, Ji Wen, Bingzhen Wei, Weidong Zhan, Zhiyuan Zhang:
Building an Ellipsis-aware Chinese Dependency Treebank for Web Text. - Pawel Kamocki, Valérie Mapelli, Khalid Choukri:
Data Management Plan (DMP) for Language Data under the New General Data Protection Regulation (GDPR). - Olga Seminck, Pascal Amsili:
A Gold Anaphora Annotation Layer on an Eye Movement Corpus. - Michael Wojatzki, Saif M. Mohammad, Torsten Zesch, Svetlana Kiritchenko:
Quantifying Qualitative Data for Understanding Controversial Issues. - Saif M. Mohammad:
Word Affect Intensities. - Sudipta Kar, Suraj Maharjan, Adrián Pastor López-Monroy, Thamar Solorio:
MPST: A Corpus of Movie Plot Synopses with Tags. - Cyril Goutte, Yunli Wang, FangMing Liao, Zachary Zanussi, Samuel Larkin, Yuri Grinberg:
EuroGames16: Evaluating Change Detection in Online Conversation. - Richard Futrell, Edward Gibson, Harry J. Tily, Idan Blank, Anastasia Vishnevetsky, Steven T. Piantadosi, Evelina Fedorenko:
The Natural Stories Corpus. - Houda Bouamor, Nizar Habash, Mohammad Salameh, Wajdi Zaghouani, Owen Rambow, Dana Abdulrahim, Ossama Obeid, Salam Khalifa, Fadhl Eryani, Alexander Erdmann, Kemal Oflazer:
The MADAR Arabic Dialect Corpus and Lexicon. - YoungGyun Hahm, Jiseong Kim, Sunggoo Kwon, Key-Sun Choi:
Semi-automatic Korean FrameNet Annotation over KAIST Treebank. - Géraldine Damnati, Jérémy Auguste, Alexis Nasr, Delphine Charlet, Johannes Heinecke, Frédéric Béchet:
Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text. - Felipe Soares, Viviane Pereira Moreira, Karin Becker:
A Large Parallel Corpus of Full-Text Scientific Articles. - Arbi Haza Nasution, Yohei Murakami, Toru Ishida:
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages. - Yutong Shao, Rico Sennrich, Bonnie L. Webber, Federico Fancellu:
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method. - Debopam Das, Manfred Stede:
Developing the Bangla RST Discourse Treebank. - Pierre-Alexandre Broux, David Doukhan, Simon Petitrenaud, Sylvain Meignier, Jean Carrive:
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections. - Silvana Hartmann, Monojit Choudhury, Kalika Bali:
An Integrated Representation of Linguistic and Social Functions of Code-Switching. - Mohammed Alsuhaibani, Danushka Bollegala:
Joint Learning of Sense and Word Embeddings. - António Branco:
We Are Depleting Our Research Subject as We Are Investigating It: In Language Technology, more Replication and Diversity Are Needed. - Sven Buechel, Udo Hahn:
Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons. - Harry Bunt, James Pustejovsky, Kiyong Lee:
Towards an ISO Standard for the Annotation of Quantification. - Tingsong Jiang, Jing Liu, Chin-Yew Lin, Zhifang Sui:
Revisiting Distant Supervision for Relation Extraction. - Omar Juárez Gambino, Hiram Calvo, Consuelo Varinia García Mendoza:
Distribution of Emotional Reactions to News Articles in Twitter. - Piotr Banski, Susanne Haaf, Martin Mueller:
Lightweight Grammatical Annotation in the TEI: New Perspectives. - Vuk Batanovic, Milos Cvetanovic, Bosko Nikolic:
Fine-grained Semantic Textual Similarity for Serbian. - Arnaud Ferré, Louise Deléger, Pierre Zweigenbaum, Claire Nédellec:
Combining rule-based and embedding-based approaches to normalize textual entities with an ontology. - Chaya Liebeskind, Ido Dagan, Jonathan Schler:
Automatic Thesaurus Construction for Modern Hebrew. - Edward Newell, Jackie Chi Kit Cheung:
Constructing a Lexicon of Relational Nouns. - Tirthankar Ghosal, Amitra Salam, Swati Tiwary, Asif Ekbal, Pushpak Bhattacharyya:
TAP-DLND 1.0: A Corpus for Document Level Novelty Detection. - Melanie Geiger, Martin Braschler:
Overcoming the Long Tail Problem: A Case Study on CO2-Footprint Estimation of Recipes using Information Retrieval. - Deepak Gupta, Asif Ekbal, Pushpak Bhattacharyya:
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text. - Oliver Adams, Trevor Cohn, Graham Neubig, Hilaria Cruz, Steven Bird, Alexis Michaud:
Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation. - Ivan Habernal, Patrick Pauli, Iryna Gurevych:
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices. - Els Lefever, Iris Hendrickx, Ilja Croijmans, Antal van den Bosch, Asifa Majid:
Discovering the Language of Wine Reviews: A Text Mining Account. - Claus Zinn, Wei Qui, Marie Hinrichs, Emanuel Dima, Alexandr Chernov:
Handling Big Data and Sensitive Data Using EUDAT's Generic Execution Framework and the WebLicht Workflow Engine. - Qianchu Liu, Federico Fancellu, Bonnie L. Webber:
NegPar: A parallel corpus annotated for negation. - Yuki Arase, Jun'ichi Tsujii:
SPADE: Evaluation Dataset for Monolingual Phrase Alignment. - Thorsten Trippel, Claus Zinn:
Lessons Learned: On the Challenges of Migrating a Research Data Repository from a Research Institution to a University Library. - Valérie Mapelli, Victoria Arranz, Hélène Mazo, Pawel Kamocki, Vladimir Popescu:
New directions in ELRA activities. - Alicia Flores Lotz, Klas Ihme, Audrey Charnoz, Pantelis Maroudis, Ivan Dmitriev, Andreas Wendemuth:
Recognizing Behavioral Factors while Driving: A Real-World Multimodal Corpus to Monitor the Driver's Affective State. - Jean-Philippe Goldman, Yves Scherrer, Julie Glikman, Mathieu Avanzi, Christophe Benzitoun, Philippe Boula de Mareüil:
Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French. - Egidio Marsico, Sébastien Flavier, Annemarie Verkerk, Steven Moran:
BDPROTO: A Database of Phonological Inventories from Ancient and Reconstructed Languages. - Yoshihiko Asao, Ryu Iida, Kentaro Torisawa:
Annotating Zero Anaphora for Question Answering. - Rik van Noord, Lasha Abzianidze, Hessel Haagsma, Johan Bos:
Evaluating Scoped Meaning Representations. - Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki, Younes Samih, Randah Alharbi, Mohammed Attia, Walid Magdy, Laura Kallmeyer:
Multi-Dialect Arabic POS Tagging: A CRF Approach. - Thierry Etchegoyhen, Anna Fernández Torné, Andoni Azpeitia, Eva Martínez Garcia, Anna Matamala:
Evaluating Domain Adaptation for Machine Translation Across Scenarios. - Radu Ion, Elena Irimia, Verginica Barbu Mititelu:
Ensemble Romanian Dependency Parsing with Neural Networks. - Jakub Náplava, Milan Straka, Pavel Stranák, Jan Hajic:
Diacritics Restoration Using Neural Networks. - Franciska de Jong, Bente Maegaard, Koenraad De Smedt, Darja Fiser, Dieter Van Uytvanck:
CLARIN: Towards FAIR and Responsible Data Science Using Language Resources. - Tomohiro Sakaguchi, Daisuke Kawahara, Sadao Kurohashi:
Comprehensive Annotation of Various Types of Temporal Information on the Time Axis. - Chao-Chun Hsu, Sheng-Yeh Chen, Chuan-Chun Kuo, Ting-Hao K. Huang, Lun-Wei Ku:
EmotionLines: An Emotion Corpus of Multi-Party Conversations. - Motoki Yatsu, Kenji Araki:
Comparison of Pun Detection Methods Using Japanese Pun Corpus. - Ayla Rigouts Terryn, Véronique Hoste, Els Lefever:
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents. - Francis Bond, Graham Matthews:
Toward An Epic Epigraph Graph. - Adrian Brasoveanu, Giuseppe Rizzo, Philipp Kuntschik, Albert Weichselbraun, Lyndon J. B. Nixon:
Framing Named Entity Linking Error Types. - Siyou Liu, Longyue Wang, Chao-Hong Liu:
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts. - Ali Can Kocabiyikoglu, Laurent Besacier, Olivier Kraif:
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation. - Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomás Mikolov:
Learning Word Vectors for 157 Languages. - Hady ElSahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon S. Hare, Frédérique Laforest, Elena Simperl:
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. - Manuela Sanguinetti, Cristina Bosco, Alberto Lavelli, Alessandro Mazzei, Oronzo Antonelli, Fabio Tamburini:
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies. - Will Roberts, Markus Egg:
A Large Automatically-Acquired All-Words List of Multiword Expressions Scored for Compositionality. - Holger Schwenk, Xian Li:
A Corpus for Multilingual Document Classification in Eight Languages. - Venelin Kovatchev, Toni Martí, Maria Salamó:
ETPC - A Paraphrase Identification Corpus Annotated with Extended Paraphrase Typology and Negation. - Vikas Reddy, Amrith Krishna, Vishnu Dutt Sharma, Prateek Gupta, Vineeth M. R, Pawan Goyal:
Building a Word Segmenter for Sanskrit Overnight. - David R. Traum, Cassidy Henry, Stephanie M. Lukin, Ron Artstein, Felix Gervits, Kimberly A. Pollard, Claire Bonial, Su Lei, Clare R. Voss, Matthew Marge, Cory J. Hayes, Susan G. Hill:
Dialogue Structure Annotation for Multi-Floor Interaction. - Akbar Karimi, Ebrahim Ansari, Bahram Sadeghi Bigham:
Extracting an English-Persian Parallel Corpus from Comparable Corpora. - Christian Hadiwinoto, Hwee Tou Ng:
Upping the Ante: Towards a Better Benchmark for Chinese-to-English Machine Translation. - Joonsuk Park, Claire Cardie:
A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments. - Pierre Godard, Gilles Adda, Martine Adda-Decker, Juan Benjumea, Laurent Besacier, Jamison Cooper-Leavitt, Guy-Noël Kouarata, Lori Lamel, Hélène Maynard, Markus Müller, Annie Rialland, Sebastian Stüker, François Yvon, Marcely Zanon Boito:
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments. - Benoît Sagot:
A multilingual collection of CoNLL-U-compatible morphological lexicons. - Elena Musi, Manfred Stede, Leonard Kriese, Smaranda Muresan, Andrea Rocci:
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations. - Tomás Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin:
Advances in Pre-Training Distributed Word Representations. - Vishwash Batra, Yulan He, George Vogiatzis:
Neural Caption Generation for News Images. - Dirk De Hertog, Piet Desmet:
Contextualized Usage-Based Material Selection. - Tommaso Pasini, Francesco Elia, Roberto Navigli:
Huge Automatically Extracted Training-Sets for Multilingual Word Sense Disambiguation. - Patrick Drouin, Marie-Claude L'Homme, Benoît Robichaud:
Lexical Profiling of Environmental Corpora. - Alexis Conneau, Douwe Kiela:
SentEval: An Evaluation Toolkit for Universal Sentence Representations. - Marcin Wolinski, Elzbieta Hajnicz, Tomasz Bartosiak:
A New Version of the Składnica Treebank of Polish Harmonised with the Walenty Valency Dictionary. - Marcos García-Salido, Marcos García, Milka Villayandre-Llamazares, Margarita Alonso Ramos:
A Lexical Tool for Academic Writing in Spanish based on Expert and Novice Corpora. - Winston Wu, Nidhi Vyas, David Yarowsky:
Creating a Translation Matrix of the Bible's Names Across 591 Languages. - Christopher Cieri, Mark Liberman, Stephanie M. Strassel, Denise DiPersio, Jonathan Wright, Andrea Mazzucchi:
From 'Solved Problems' to New Challenges: A Report on LDC Activities. - Drahomira Herrmannova, Petr Knoth, Robert M. Patton:
Analyzing Citation-Distance Networks for Evaluating Publication Impact. - Masatoshi Tsuchiya:
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment. - Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, S. J. Mielke, Arya McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden:
UniMorph 2.0: Universal Morphology. - Angrosh Mandya, Danushka Bollegala, Frans Coenen, Katie Atkinson:
A Dataset for Inter-Sentence Relation Extraction using Distant Supervision. - Donghui Lin, Yohei Murakami, Toru Ishida:
A Framework for Multi-Language Service Design with the Language Grid. - Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel:
The French-Algerian Code-Switching Triggered audio corpus (FACST). - Mostafa Bayomi, Séamus Lawless:
C-HTS: A Concept-based Hierarchical Text Segmentation approach. - Georg Rehm, Stefanie Hegele:
Language Technology for Multilingual Europe: An Analysis of a Large-Scale Survey regarding Challenges, Demands, Gaps and Needs. - Olga Lovick, Christopher Cox, Miikka Silfverberg, Antti Arppe, Mans Hulden:
A Computational Architecture for the Morphology of Upper Tanana. - Jacqueline Brixey, Eli Pincus, Ron Artstein:
Chahta Anumpa: A multimodal corpus of the Choctaw Language. - Matthias Kraus, Johannes Kraus, Martin Baumann, Wolfgang Minker:
Effects of Gender Stereotypes on Trust and Likability in Spoken Human-Robot Interaction. - Milan Straka, Nikita Mediankin, Tom Kocmi, Zdenek Zabokrtský, Vojtech Hudecek, Jan Hajic:
SumeCzech: Large Czech News-Based Summarization Dataset. - Thomas Proisl, Stefan Evert, Fotis Jannidis, Christof Schöch, Leonard Konle, Steffen Pielström:
Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods. - Aggeliki Vlachostergiou, Mark Dennison, Catherine Neubauer, Stefan Scherer, Peter Khooshabeh, Andre V. Harrison:
Unfolding the External Behavior and Inner Affective State of Teammates through Ensemble Learning: Experimental Evidence from a Dyadic Team Corpus. - Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya:
The IIT Bombay English-Hindi Parallel Corpus. - Aurélie Névéol, Antonio Jimeno-Yepes, Mariana L. Neves, Karin Verspoor:
Parallel Corpora for the Biomedical Domain. - Maximiliana Behnke, Antonio Valerio Miceli Barone, Rico Sennrich, Vilelmini Sosoni, Thanasis Naskos, Eirini Takoulidou, Maria Stasimioti, Menno van Zaanen, Sheila Castilho, Federico Gaspari, Panayota Georgakopoulou, Valia Kordoni, Markus Egg, Katia Lida Kermanidis:
Improving Machine Translation of Educational Content via Crowdsourcing. - Claire Bonial, Bianca Badarau, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Tim O'Gorman, Martha Palmer, Nathan Schneider:
Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation. - Ritesh Kumar, Aishwarya N. Reganti, Akshit Bhatia, Tushar Maheshwari:
Aggression-annotated Corpus of Hindi-English Code-mixed Data. - Carmen Klaussner, Carl Vogel:
A Diachronic Corpus for Literary Style Analysis. - Hiyon Yoo, Inyoung Kim:
CBFC: a parallel L2 speech corpus for Korean and French learners. - Tommaso Caselli, Roser Morante:
Systems' Agreements and Disagreements in Temporal Processing: An Extensive Error Analysis of the TempEval-3 Task. - Solen Quiniou, Béatrice Daille:
Towards a Diagnosis of Textual Difficulties for Children with Dyslexia. - Jiseong Kim, YoungGyun Hahm, Sunggoo Kwon, Key-Sun Choi:
Automatic Wordnet Mapping: from CoreNet to Princeton WordNet. - Christopher Cieri, James Fiumara, Mark Liberman, Chris Callison-Burch, Jonathan Wright:
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data. - Dagmar Gromann, Thierry Declerck:
Comparing Pretrained Multilingual Word Embeddings on an Ontology Alignment Task. - Jean-Philippe Goldman, Simon Clematide, Mathieu Avanzi, Raphaël Tandler:
Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French. - Winston Wu, David Yarowsky:
Creating Large-Scale Multilingual Cognate Tables. - Ayushi Pandey, Brij Mohan Lal Srivastava, Rohit Kumar, Bhanu Teja Nellore, Kasi Sai Teja, Suryakanth V. Gangashetty:
Phonetically Balanced Code-Mixed Speech Corpus for Hindi-English Automatic Speech Recognition. - Fatima Hamlaoui, Emmanuel-Moselly Makasso, Markus Müller, Jonas Engelmann, Gilles Adda, Alex Waibel, Sebastian Stüker:
BULBasaa: A Bilingual Basaa-French Speech Corpus for the Evaluation of Language Documentation Tools. - Pasindu Tennage, Prabath Sandaruwan, Malith Thilakarathne, Achini Herath, Surangika Ranathunga:
Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation. - Alakananda Vempala, Eduardo Blanco:
Annotating Temporally-Anchored Spatial Knowledge by Leveraging Syntactic Dependencies. - K. Bretonnel Cohen, Jingbo Xia, Pierre Zweigenbaum, Tiffany Callahan, Orin Hargraves, Foster R. Goss, Nancy Ide, Aurélie Névéol, Cyril Grouin, Lawrence E. Hunter:
Three Dimensions of Reproducibility in Natural Language Processing. - Kristiina Jokinen:
Researching Less-Resourced Languages - the DigiSami Corpus. - Saif M. Mohammad, Svetlana Kiritchenko:
Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories. - Nasredine Semmar:
A Hybrid Approach for Automatic Extraction of Bilingual Multiword Expressions from Parallel Corpora. - Austin Blodgett, Nathan Schneider:
Semantic Supersenses for English Possessives. - Piotr Zelasko:
Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient? - Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau:
Building Named Entity Recognition Taggers via Parallel Corpora. - Zhiyi Song, Ann Bies, Justin Mott, Xuansong Li, Stephanie M. Strassel, Christopher Caruso:
Cross-Document, Cross-Language Event Coreference Annotation Using Event Hoppers. - Rebecca Sharp, Mithun Paul, Ajay Nagesh, Dane Bell, Mihai Surdeanu:
Grounding Gradable Adjectives through Crowdsourcing. - Dimosthenis Kontogiorgos, Vanya Avramova, Simon Alexandersson, Patrik Jonell, Catharine Oertel, Jonas Beskow, Gabriel Skantze, Joakim Gustafson:
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. - Andreea Godea, Rodney Nielsen:
Annotating Educational Questions for Student Response Analysis. - Vivek Reddy Doudagiri, Alakananda Vempala, Eduardo Blanco:
Annotating If the Authors of a Tweet are Located at the Locations They Tweet About. - Rodrigo Wilkens, Leonardo Zilio, Cédrick Fairon:
SW4ALL: a CEFR Classified and Aligned Corpus for Language Learning. - Kira Griffitt, Jennifer Tracey, Ann Bies, Stephanie M. Strassel:
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI. - Kira Droganova, Daniel Zeman, Jenna Kanerva, Filip Ginter:
Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions. - Jeremy Getman, Joe Ellis, Stephanie M. Strassel, Zhiyi Song, Jennifer Tracey:
Laying the Groundwork for Knowledge Base Population: Nine Years of Linguistic Resources for TAC KBP. - Edward Newell, Drew Margolin, Derek Ruths:
An Attribution Relations Corpus for Political News. - Ruchit Agrawal, Vighnesh Chenthil Kumar, Vigneshwaran Muralidaran, Dipti Misra Sharma:
No more beating about the bush: A Step towards Idiom Handling for Indian Language NLP. - Susan Windisch Brown, James Pustejovsky, Annie Zaenen, Martha Palmer:
Integrating Generative Lexicon Event Structures into VerbNet. - Tugba Kulahcioglu, Gerard de Melo:
FontLex: A Typographical Lexicon based on Affective Associations. - Carolina Scarton, Gustavo Paetzold, Lucia Specia:
Text Simplification from Professionally Produced Corpora. - Dimitris N. Metaxas, Mark Dilsizian, Carol Neidle:
Linguistically-driven Framework for Computationally Efficient and Scalable Sign Recognition. - Tim O'Gorman, Sameer Pradhan, Martha Palmer, Julia Bonn, Kathryn Conger, James Gung:
The New Propbank: Aligning Propbank with AMR through POS Unification. - Lei Chen, Sylvie Gibet, Camille Marteau:
CONDUCT: An Expressive Conducting Gesture Dataset for Sound Control. - Jacky Visser, Rory Duthie, John Lawrence, Chris Reed:
Intertextual Correspondence for Integrating Corpora. - Markus Zlabinger, Linda Andersson, Allan Hanbury, Michael Andersson, Vanessa Quasnik, Jon Brassey:
Medical Entity Corpus with PICO elements and Sentiment Analysis. - Nelson Mukuze, Anna Rohrbach, Vera Demberg, Bernt Schiele:
A vision-grounded dataset for predicting typical locations for verbs. - AbdelRahim A. Elmadany, Sherif M. Abdou, Mervat Gheith:
Improving Dialogue Act Classification for Spontaneous Arabic Speech and Instant Messages at Utterance Level. - Lukás Svoboda, Slobodan Beliga:
Evaluation of Croatian Word Embeddings. - Octavian Popescu, Ngoc Phuoc An Vo, Vadim Sheinin:
A Large Resource of Patterns for Verbal Paraphrases. - Attila Novák, Borbála Novák:
Cross-Lingual Generation and Evaluation of a Wide-Coverage Lexical Semantic Resource. - Henrique D. P. dos Santos, Vinicius Woloszyn, Renata Vieira:
BlogSet-BR: A Brazilian Portuguese Blog Corpus. - Abbas Ghaddar, Philippe Langlais:
Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus. - Halidanmu Abudukelimu, Abudoukelimu Abulizi, Boliang Zhang, Xiaoman Pan, Di Lu, Heng Ji, Yang Liu:
Error Analysis of Uyghur Name Tagging: Language-specific Techniques and Remaining Challenges. - Aikaterini-Lida Kalouli, Katharina Kaiser, Annette Hautli-Janisz, Georg A. Kaiser, Miriam Butt:
A Multilingual Approach to Question Classification. - Laura Fernández Gallardo, Benjamin Weiss:
The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions. - Sarah Ebling, Necati Cihan Camgöz, Penny Boyes Braem, Katja Tissi, Sandra Sidler-Miserez, Stephanie Stoll, Simon Hadfield, Tobias Haug, Richard Bowden, Sandrine Tornay, Marzieh Razavi, Mathew Magimai-Doss:
SMILE Swiss German Sign Language Dataset. - Florian Schiel, Thomas Zitzelsberger:
Evaluation of Automatic Formant Trackers. - Reid Pryzant, Youngjoo Chung, Dan Jurafsky, Denny Britz:
JESC: Japanese-English Subtitle Corpus. - Ricelli Moreira Silva Ramos, Georges Basile Stavracas Neto, Bárbara Barbosa Claudino da Silva, Danielle Sampaio Monteiro, Ivandré Paraboni, Rafael Dias:
Building a Corpus for Personality-dependent Natural Language Understanding and Generation. - Yiming Cui, Ting Liu, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu:
Dataset for the First Evaluation on Chinese Machine Reading Comprehension. - Amir Hazem, Basma El Amel Boussaha, Nicolas Hernandez:
A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks. - Alex Lan, Ivandré Paraboni:
Definite Description Lexical Choice: taking Speaker's Personality into account. - André Mariotti, Ivandré Paraboni:
Referring Expression Generation in time-constrained communication. - Lubos Smídl, Jan Svec, Daniel Tihelka, Jindrich Matousek, Jan Romportl, Pavel Ircing:
Design and Development of Speech Corpora for Air Traffic Control Training. - Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, Axel-Cyrille Ngonga Ngomo:
LIdioms: A Multilingual Linked Idioms Data Set. - Hanieh Poostchi, Ehsan Zare Borzeshi, Massimo Piccardi:
BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset. - Bartosz Ziólko, Piotr Zelasko, Ireneusz Gawlik, Tomasz Pedzimaz, Tomasz Jadczyk:
An Application for Building a Polish Telephone Speech Corpus. - Zeljko Agic, Natalie Schluter:
Baselines and Test Data for Cross-Lingual Inference. - Suguru Matsuyoshi, Hirotaka Kameko, Yugo Murawaki, Shinsuke Mori:
Annotating Modality Expressions and Event Factuality for a Japanese Chess Commentary Corpus. - Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson:
A Fast and Accurate Vietnamese Word Segmenter. - Shuyuan Cao, Harritxu Gete:
Using Discourse Information for Education with a Spanish-Chinese Parallel Corpus. - Michael Wiegand, Sylvette Loda, Josef Ruppenhofer:
Disambiguation of Verbal Shifters. - Kay Berkling:
A 2nd Longitudinal Corpus for Children's Writing with Enhanced Output for Specific Spelling Patterns. - Ian D. Wood, John Philip McCrae, Vladimir Andryushechkin, Paul Buitelaar:
A Comparison Of Emotion Annotation Schemes And A New Annotated Data Set. - Lyan Verwimp, Hugo Van hamme, Patrick Wambacq:
TF-LM: TensorFlow-based Language Modeling Toolkit. - Shinnosuke Takamichi, Hiroshi Saruwatari:
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects. - Kiem-Hieu Nguyen:
BKTreebank: Building a Vietnamese Dependency Treebank. - Francis M. Tyers, Sevilay Bayatli, Güllü Karanfil, Memduh Gokirmak:
Finite-state morphological analysis for Gagauz. - Juyeon Kang, Jungyeul Park:
Data Anonymization for Requirements Quality Analysis: a Reproducible Automatic Error Detection Task. - Roman Schneider, Monica Fürbacher:
GeCoTagger: Annotation of German Verb Complements with Conditional Random Fields. - Matiss Rikters, Marcis Pinnis, Rihards Krislauks:
Training and Adapting Multilingual NMT for Less-resourced and Morphologically Rich Languages. - Peter A. Jansen, Elizabeth Wainwright, Steven Marmorstein, Clayton T. Morrison:
WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference. - Amal Alsaif, Tasniem Alyahya, Madawi Alotaibi, Huda Abdulrahman Almuzaini, Abeer Algahtani:
Annotating Attribution Relations in Arabic. - Ina Rösiger:
BASHI: A Corpus of Wall Street Journal Articles Annotated with Bridging Links. - Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe Thomas, Aleksandra Gabryszak, Leonhard Hennig:
A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events. - Ludovic Tanguy, Cécile Fabre, Laura Rivière:
Extending the gold standard for a lexical substitution task: is it worth it? - Saskia Schön, Veselina Mironova, Aleksandra Gabryszak, Leonhard Hennig:
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products. - Besim Kabashi, Thomas Proisl:
Albanian Part-of-Speech Tagging: Gold Standard and Evaluation. - Ewald van der Westhuizen, Thomas Niesler:
A First South African Corpus of Multilingual Code-switched Soap Opera Speech. - Gideon Mendels, Victor Soto, Aaron Jaech, Julia Hirschberg:
Collecting Code-Switched Data from Social Media. - Luwen Huangfu, Mihai Surdeanu:
Bootstrapping Polar-Opposite Emotion Dimensions from Online Reviews. - Yuya Sakaizawa, Mamoru Komachi:
Construction of a Japanese Word Similarity Dataset. - Natsuda Laokulrat, Naoaki Okazaki, Hideki Nakayama:
Incorporating Semantic Attention in Video Description Generation. - Rui Suzuki, Kanako Komiya, Masayuki Asahara, Minoru Sasaki, Hiroyuki Shinnou:
All-words Word Sense Disambiguation Using Concept Embeddings. - Inigo Jauregi Unanue, Lierni Garmendia Arratibel, Ehsan Zare Borzeshi, Massimo Piccardi:
English-Basque Statistical and Neural Machine Translation. - Vadim Kimmelman, Anna Klezovich, George Moroz:
IPSL: A Database of Iconicity Patterns in Sign Languages. Creation and Use. - Nina Pörner, Florian Schiel:
A Web Service for Pre-segmenting Very Long Transcribed Speech Recordings. - Estelle Delpech, Marion Laignelet, Christophe Pimm, Céline Raynal, Michal Trzos, Alexandre Arnold, Dominique Pronto:
A Real-life, French-accented Corpus of Air Traffic Control Communications. - Stefano Melacci, Achille Globo, Leonardo Rigutini:
Enhancing Modern Supervised Word Sense Disambiguation Models by Semantic Lexical Resources. - Vivi Nastase, Julian Hitschler:
Correction of OCR Word Segmentation Errors in Articles from the ACL Collection through Neural Machine Translation Methods. - Olga Majewska, Diana McCarthy, Ivan Vulic, Anna Korhonen:
Acquiring Verb Classes Through Bottom-Up Semantic Verb Clustering. - Haoyue Shi, Xihao Wang, Yuqi Sun, Junfeng Hu:
Constructing High Quality Sense-specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-sense. - Koki Washio, Tsuneaki Kato:
Undersampling Improves Hypernymy Prototypicality Learning. - Sreelekha S, Pushpak Bhattacharyya:
Morphology Injection for English-Malayalam Statistical Machine Translation. - Pavithra Rajendran, Danushka Bollegala, Simon Parsons:
Sentiment-Stance-Specificity (SSS) Dataset: Identifying Support-based Entailment among Opinions. - Andrej Kibrik, Olga Fedorova:
A «Portrait» Approach to Multichannel Discourse. - Yang Zhao, Jiajun Zhang, Chengqing Zong:
Exploiting Pre-Ordering for Neural Machine Translation. - Suzi Park, Hyopil Shin:
Grapheme-level Awareness in Word Embeddings for Morphologically Rich Languages. - Andargachew Mekonnen Gezmu, Andreas Nürnberger, Binyam Ephrem Seyoum:
Portable Spelling Corrector for a Less-Resourced Language: Amharic. - Michael Gref, Joachim Köhler, Almut Leh:
Improved Transcription and Indexing of Oral History Interviews for Digital Humanities Research. - Gyu-Hyeon Choi, Jong-Hun Shin, Young Kil Kim:
Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages. - Deniz Zeyrek, Amália Mendes, Murathan Kurfali:
Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank. - Yo Sato, Kevin Heffernan:
Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method. - Shuo Wang, Zehui Hao, Xiaofeng Meng, Qiuyue Wang:
ScholarGraph: a Chinese Knowledge Graph of Chinese Scholars. - Maria Moritz, David Steding:
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases. - Rama Rohit Reddy Gangula, Radhika Mamidi:
Resource Creation Towards Automated Sentiment Analysis in Telugu (a low resource language) and Integrating Multiple Domain Sources to Enhance Sentiment Prediction. - Xiaomin Chu, Feng Jiang, Sheng Xu, Qiaoming Zhu:
Building a Macro Chinese Discourse Treebank. - Samar Haider:
Urdu Word Embeddings. - Mohammed Attia, Younes Samih, Ali El-Kahky, Laura Kallmeyer:
Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks. - Mohammed Attia, Vitaly Nikolaev, Ali El-Kahky:
The Morpho-syntactic Annotation of Animacy for a Dependency Parser. - Linrui Zhang, Dan Moldovan:
Chinese Relation Classification using Long Short Term Memory Networks. - Anne-Kathrin Schumann, Héctor Martínez Alonso:
Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus. - Chi-Yen Chen, Wei-Yun Ma:
Word Embedding Evaluation Datasets and Wikipedia Title Embedding for Chinese. - Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli:
A Large Self-Annotated Corpus for Sarcasm. - Zi-Yi Dou, Hao Zhou, Shujian Huang, Xin-Yu Dai, Jiajun Chen:
Dynamic Oracle for Neural Machine Translation in Decoding Phase. - Weicheng Ma, Kai Cao, Zhaoheng Ni, Peter Chin, Xiang Li:
Sound Signal Processing with Seq2Tree Network. - Juliana Miehle, Wolfgang Minker, Stefan Ultes:
What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness. - Juliana Miehle, Nadine Gerstenlauer, Daniel Ostler, Hubertus Feußner, Wolfgang Minker, Stefan Ultes:
Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room. - Kiyoaki Shirai, Tomotaka Fukuoka:
JAIST Annotated Corpus of Free Conversation. - Bartlomiej Niton, Pawel Morawiecki, Maciej Ogrodniczuk:
Deep Neural Networks for Coreference Resolution for Polish. - Karima Abidi, Kamel Smaïli:
An Automatic Learning of an Algerian Dialect Lexicon by using Multilingual Word Embeddings. - Volha Petukhova, Andrei Malchanau, Youssef Oualil, Dietrich Klakow, Saturnino Luz, Fasih Haider, Nick Campbell, Dimitris Koryzis, Dimitris Spiliotopoulos, Pierre Albert, Nicklas Linz, Jan Alexandersson:
The Metalogue Debate Trainee Corpus: Data Collection and Annotations. - Andreas Liesenfeld:
MYCanCor: A Video Corpus of spoken Malaysian Cantonese. - Tatiana Bladier, Esther Seyffarth, Oliver Hellwig, Wiebke Petersen:
AET: Web-based Adjective Exploration Tool for German. - Anna Björk Nikulásdóttir, Inga Rún Helgadóttir, Matthías Pétursson, Jón Guðnason:
Open ASR for Icelandic: Resources and a Baseline System. - Akari Asai, Sara Evensen, Behzad Golshan, Alon Y. Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu:
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments. - Binyang Li, Jun Xiang, Le Chen, Xu Han, Xiaoyan Yu, Ruifeng Xu, Tengjiao Wang, Kam-Fai Wong:
The UIR Uncertainty Corpus for Chinese: Annotating Chinese Microblog Corpus for Uncertainty Identification from Social Media. - Simon Ostermann, Hannah Seitz, Stefan Thater, Manfred Pinkal:
Mapping Texts to Scripts: An Entailment Study. - Tao Ge, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou:
EventWiki: A Knowledge Base of Major Events. - Marijn Schraagen, Feike Dietz, Marjo van Koppen:
Linguistic and Sociolinguistic Annotation of 17th Century Dutch Letters. - Jeremy Barnes, Toni Badia, Patrik Lambert:
MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification. - Claire Broad, Helen Langone, David Guy Brizan:
Candidate Ranking for Maintenance of an Online Dictionary. - Kijong Han, Sangha Nam, Jiseong Kim, YoungGyun Hahm, Key-Sun Choi:
Unsupervised Korean Word Sense Disambiguation using CoreNet. - Serge Sharoff:
Language adaptation experiments via cross-lingual embeddings for related languages. - João Rodrigues, Chakaveh Saedi, António Branco, João Silva:
Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives? - Mahmoud El-Haj, Paul Rayson, Mariam Aboelezz:
Arabic Dialect Identification in the Context of Bivalency and Code-Switching. - Satoru Uchida, Shohei Takada, Yuki Arase:
CEFR-based Lexical Simplification Dataset. - Kordula De Kuthy, Nils Reiter, Arndt Riester:
QUD-Based Annotation of Discourse Structure and Information Structure: Tool and Evaluation. - Yu Yuan, Serge Sharoff:
Investigating the Influence of Bilingual MWU on Trainee Translation Quality. - Yu Yuan, Yuze Gao, Yue Zhang, Serge Sharoff:
Cross-lingual Terminology Extraction for Translation Quality Estimation. - Jack Halpern:
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation. - Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi:
Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation. - Loïc Vial, Benjamin Lecouteux, Didier Schwab:
UFSAC: Unification of Sense Annotated Corpora and Tools. - Giulia Donato, Patrizia Paggio:
Classifying the Informative Behaviour of Emoji in Microblogs. - Stefano Faralli, Els Lefever, Simone Paolo Ponzetto:
MIsA: Multilingual "IsA" Extraction from Corpora. - Askars Salimbajevs:
Creating Lithuanian and Latvian Speech Corpora from Inaccurately Annotated Web Data. - Vanya Dimitrova, Christian Fäth, Christian Chiarcos, Heike Renner-Westermann, Frank Abromeit:
Interoperability of Language-related Information: Mapping the BLL Thesaurus to Lexvo and Glottolog. - Jan Nehring, Felix Sasaki:
A Framework for the Needs of Different Types of Users in Multilingual Semantic Enrichment. - Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo Ponzetto:
Enriching Frame Representations with Distributionally Induced Senses. - Todd Shore, Theofronia Androulakaki, Gabriel Skantze:
KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue. - Menghan Jiang, Natalia Klyueva, Hongzhi Xu, Chu-Ren Huang:
Annotating Chinese Light Verb Constructions according to PARSEME guidelines. - Kevin P. Yancey, Yves Lepage:
Korean L2 Vocabulary Prediction: Can a Large Annotated Corpus be Used to Train Better Models for Predicting Unknown Words? - Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Stéphane Huet, Andréa Carneiro Linhares:
A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task. - Anna Koroleva, Patrick Paroubek:
Annotating Spin in Biomedical Scientific Publications : the case of Random Controlled Trials (RCTs). - Sunayana Sitaram, Varun Manjunath, Varun Bharadwaj, Monojit Choudhury, Kalika Bali, Michael Tjalve:
Discovering Canonical Indian English Accents: A Crowdsourcing-based Approach. - Takumi Maruyama, Kazuhide Yamamoto:
Simplified Corpus with Core Vocabulary. - Michael Färber, Alexander Thiemann, Adam Jatowt:
A High-Quality Gold Standard for Citation-based Tasks. - Armin Hoenen:
Multi Modal Distance - An Approach to Stemma Generation With Weighting. - Adeline Granet, Benjamin Hervy, Geoffrey Roman-Jimenez, Marouane Hachicha, Emmanuel Morin, Harold Mouchère, Solen Quiniou, Guillaume Raschia, Françoise Rubellin, Christian Viard-Gaudin:
Crowdsourcing-based Annotation of the Accounting Registers of the Italian Comedy. - Thierry Declerck, Kseniya Egorova, Eileen Schnur:
An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes. - Delphine Bernhard, Anne-Laure Ligozat, Fanny Martin, Myriam Bras, Pierre Magistry, Marianne Vergez-Couret, Lucie Steiblé, Pascale Erhart, Nabil Hathout, Dominique Huck, Christophe Rey, Philippe Reynes, Sophie Rosset, Jean Sibille, Thomas Lavergne:
Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard. - Steven Zimmerman, Udo Kruschwitz, Chris Fox:
Improving Hate Speech Detection with Deep Learning Ensembles. - Shilei Huang, Jiangqin Wu:
A Pragmatic Approach for Classical Chinese Word Segmentation. - Ting Han, David Schlangen:
A Corpus of Natural Multimodal Spatial Scene Descriptions. - Ryusei Matsumoto, Minoru Yoshida, Kazuyuki Matsumoto, Hironobu Matsuda, Kenji Kita:
Visualization of the occurrence trend of infectious diseases using Twitter. - Riccardo Del Gratta, Sara Goggi, Gabriella Pardelli, Nicoletta Calzolari:
LREMap, a Song of Resources and Evaluation. - Alan Akbik, Roland Vollgraf:
ZAP: An Open-Source Multilingual Annotation Projection Framework. - Amaru Cuba Gyllensten, Magnus Sahlgren:
Distributional Term Set Expansion. - Louisa Pragst, Niklas Rach, Wolfgang Minker, Stefan Ultes:
On the Vector Representation of Utterances in Dialogue Context. - Rob van der Goot, Rik van Noord, Gertjan van Noord:
A Taxonomy for In-depth Evaluation of Normalization for User Generated Content. - Dimitrios Kokkinakis, Kristina Lundholm Fors, Kathleen C. Fraser, Arto Nordlund:
A Swedish Cookie-Theft Corpus. - Javier Álvez, Itziar Gonzalez-Dios, German Rigau:
Cross-checking WordNet and SUMO Using Meronymy. - Juliana P. C. Pirovani, Elias de Oliveira:
Portuguese Named Entity Recognition using Conditional Random Fields and Local Grammars. - Matej Martinc, Senja Pollak:
Reusable workflows for gender prediction. - Zsanett Ferenczi, Iván Mittelholcz, Eszter Simon, Tamás Váradi:
Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages. - Armin Hoenen:
From Manuscripts to Archetypes through Iterative Clustering. - Avinesh P. V. S., Maxime Peyrard, Christian M. Meyer:
Live Blog Corpus for Summarization. - Leonidas Lefakis, Alan Akbik, Roland Vollgraf:
FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German. - Laura García-Sardiña, Manex Serras, Arantza del Pozo:
ES-Port: a Spontaneous Spoken Human-Human Technical Support Corpus for Dialogue Research in Spanish. - Núria Bel, Joel Pocostales:
Can Domain Adaptation be Handled as Analogies? - Veronika Vincze, Klára Hegedüs, Alex Sliz-Nagy, Richárd Farkas:
SzegedKoref: A Hungarian Coreference Corpus. - Akihiro Katsuta, Kazuhide Yamamoto:
Crowdsourced Corpus of Sentence Simplification with Core Vocabulary. - Wasi Uddin Ahmad, Kai-Wei Chang:
A Corpus to Learn Refer-to-as Relations for Nominals. - Fernando Tadao Ito, Helena de Medeiros Caseli, Jander Moreira:
The Effects of Unimodal Representation Choices on Multimodal Learning. - Silvia Pareti, Tatiana Lando:
Dialog Intent Structure: A Hierarchical Schema of Linked Dialog Acts. - Rashel Fam, Yves Lepage:
Tools for The Production of Analogical Grids and a Resource of N-gram Analogical Grids in 11 Languages. - Shun-ya Fukunaga, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono, Tetsuro Takahashi:
Analysis of Implicit Conditions in Database Search Dialogues. - Tetsuaki Nakamura, Daisuke Kawahara:
JDCFC: A Japanese Dialogue Corpus with Feature Changes. - Armin Hoenen, Niko Schenk:
Knowing the Author by the Company His Words Keep. - Fernando Hsieh, Rafael Dias, Ivandré Paraboni:
Author Profiling from Facebook Corpora. - Arun Sharma, Tomek Strzalkowski:
Gaining and Losing Influence in Online Conversation. - Ankush Khandelwal, Sahil Swami, Syed Sarfaraz Akhtar, Manish Shrivastava:
Humor Detection in English-Hindi Code-Mixed Social Media Content : Corpus and Baseline System. - Mika Hämäläinen, Liisa Lotta Tarvainen, Jack Rueter:
Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages. - Rafael T. Anchiêta, Thiago A. S. Pardo:
Towards AMR-BR: A SemBank for Brazilian Portuguese Language. - Andrea Zielinski, Peter Mutschke:
Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications. - Anant Maheshwari, Léo Bouscarrat, Paul Cook:
Towards Language Technology for Mi'kmaq. - Sandeep Mathias, Pushpak Bhattacharyya:
ASAP++: Enriching the ASAP Automated Essay Grading Dataset with Essay Attribute Scores. - Kenji Yamauchi, Hajime Yamamoto, Wakaha Mori:
Building A Handwritten Cuneiform Character Imageset. - Tsung-Han Yang, Hen-Hsen Huang, An-Zi Yen, Hsin-Hsi Chen:
Transfer of Frames from English FrameNet to Construct Chinese FrameNet: A Bilingual Corpus-Based Approach. - Jayeol Chun, Na-Rae Han, Jena D. Hwang, Jinho D. Choi:
Building Universal Dependency Treebanks in Korean. - Yasuhiro Minami, Tessei Kobayashi, Yuko Okumura:
Infant Word Comprehension-to-Production Index Applied to Investigation of Noun Learning Predominance Using Cross-lingual CDI database. - Cvetana Krstev, Branislava Sandrih, Ranka Stankovic, Miljana Mladenovic:
Using English Baits to Catch Serbian Multi-Word Terminology. - Henrico Bertini Brum, Maria das Graças Volpe Nunes:
Building a Sentiment Corpus of Tweets in Brazilian Portuguese. - Dan Tufis, Dan Cristea:
A Bird's-eye View of Language Processing Projects at the Romanian Academy. - Akihiko Kato, Hiroyuki Shindo, Yuji Matsumoto:
Construction of Large-scale English Verbal Multiword Expression Annotated Corpus. - Nizar Habash, Fadhl Eryani, Salam Khalifa, Owen Rambow, Dana Abdulrahim, Alexander Erdmann, Reem Faraj, Wajdi Zaghouani, Houda Bouamor, Nasser Zalmout, Sara Hassan, Faisal Al-Shargi, Sakhar B. Alkhereyf, Basma Abdulkareem, Ramy Eskander, Mohammad Salameh, Hind Saddiki:
Unified Guidelines and Resources for Arabic Dialect Orthography. - Go Inoue, Nizar Habash, Yuji Matsumoto, Hiroyuki Aoyama:
A Parallel Corpus of Arabic-Japanese News Articles. - Lucie Steiblé, Delphine Bernhard:
Pronunciation Dictionaries for the Alsatian Dialects to Analyze Spelling and Phonetic Variation. - Duc Anh Phan, Yuji Matsumoto:
EMTC: Multilabel Corpus in Movie Domain for Emotion Analysis in Conversational Text. - Emer Gilmartin, Christian Saam, Brendan Spillane, Maria O'Reilly, Ketong Su, Arturo Calvo, Loredana Cerrato, Killian Levacher, Nick Campbell, Vincent Wade:
The ADELE Corpus of Dyadic Social Text Conversations: Dialog Act Annotation with ISO 24617-2. - José Lopes, Nils Hemmingsson, Oliver Åstrand:
The Spot the Difference corpus: a multi-modal corpus of spontaneous task oriented spoken interactions. - Zhao Meng, Lili Mou, Zhi Jin:
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models. - Margot Mieskes, Andreas Stiegelmayr:
Preparing Data from Psychotherapy for Natural Language Processing. - Christian Chiarcos, Kathrin Donandt, Maxim Ionov, Monika Rind-Pawlowski, Hasmik Sargsian, Jesse Wichers Schreur, Frank Abromeit, Christian Fäth:
Universal Morphologies for the Caucasus region. - Verginica Barbu Mititelu, Dan Tufis, Elena Irimia:
The Reference Corpus of the Contemporary Romanian Language (CoRoLa). - Maria Mitrofan, Dan Tufis:
BioRo: The Biomedical Corpus for the Romanian Language. - Andrea Horbach, Manfred Pinkal:
Semi-Supervised Clustering for Short Answer Scoring. - Jelte van Waterschoot, Guillaume Dubuisson Duplessis, Lorenzo Gatti, Merijn Bruijnes, Dirk Heylen:
An Information-Providing Closed-Domain Human-Agent Interaction Corpus. - Junqing He, Xian Huang, Xuemin Zhao, Yan Zhang, Yonghong Yan:
Discriminating between Similar Languages on Imbalanced Conversational Texts. - Marzieh Fadaee, Arianna Bisazza, Christof Monz:
Examining the Tip of the Iceberg: A Data Set for Idiom Translation. - Jannik Strötgen, Anne-Lyse Minard, Lukas Lange, Manuela Speranza, Bernardo Magnini:
KRAUTS: A German Temporally Annotated News Corpus. - Uxoa Iñurrieta Urmeneta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola:
Konbitzul: an MWE-specific database for Spanish-Basque. - Agnieszka Falenska, Kerstin Eckart, Jonas Kuhn:
Moving TIGER beyond Sentence-Level. - Michael Filhol, Mohamed Nassime Hadjadj:
Elicitation protocol and material for a corpus of long prepared monologues in Sign Language. - Amir Vaheb, Ali Janalizadeh Choobbasti, Mahdi Mortazavi, Saeid Safavi, Behnam Sabeti:
MirasVoice: A bilingual (English-Persian) speech corpus. - Liat Ein-Dor, Alon Halfon, Yoav Kantor, Ran Levy, Yosi Mass, Ruty Rinott, Eyal Shnarch, Noam Slonim:
Semantic Relatedness of Wikipedia Concepts - Benchmark Data and a Working Solution. - Stefania Pecore, Jeanne Villaneau:
Complex and Precise Movie and Book Annotations in French Language for Aspect Based Sentiment Analysis. - François Lareau, Florie Lambrey, Ieva Dubinskaite, Daniel Galarreta-Piquette, Maryam Nejat:
GenDR: A Generic Deep Realizer with Complex Lexicalization. - Andre Cianflone, Leila Kosseim:
Attention for Implicit Discourse Relation Recognition. - Juliano Efson Sales, Siamak Barzegar, Wellington Franco, Bernhard Bermeitinger, Tiago Cunha, Brian Davis, André Freitas, Siegfried Handschuh:
A Multilingual Test Collection for the Semantic Search of Entity Categories. - Soumia Dermouche, Catherine Pelachaud:
From analysis to modeling of engagement as sequences of multimodal behaviors. - Antonio Moreno-Ortiz, Chantal Pérez Hernández:
Lingmotif-lex: a Wide-coverage, State-of-the-art Lexicon for Sentiment Analysis. - Scott Piao, Paul Rayson, Dawn Knight, Gareth Watkins:
Towards a Welsh Semantic Annotation System. - Koichiro Yoshino, Yoko Ishikawa, Masahiro Mizukami, Yu Suzuki, Sakriani Sakti, Satoshi Nakamura:
Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing. - Koichiro Yoshino, Hiroki Tanaka, Kyoshiro Sugiyama, Makoto Kondo, Satoshi Nakamura:
Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags. - Yiou Wang, Takuji Tahara:
A Japanese Corpus for Analyzing Customer Loyalty Information. - Nikhil Krishnaswamy, James Pustejovsky:
An Evaluation Framework for Multimodal Interaction. - Heike Brock, Kazuhiro Nakadai:
Deep JSLC: A Multimodal Corpus Collection for Data-driven Generation of Japanese Sign Language Expressions. - Henny Sluyter-Gäthje, Pintu Lohar, Haithem Afli, Andy Way:
FooTweets: A Bilingual Parallel Corpus of World Cup Tweets. - Masashi Yokota, Hideki Nakayama:
Augmenting Image Question Answering Dataset by Exploiting Image Captions. - Ramesh R. Manuvinakurike, Jacqueline Brixey, Trung Bui, Walter Chang, Doo Soon Kim, Ron Artstein, Kallirroi Georgila:
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing. - Ron Artstein, Jill Boberg, Alesia Gainer, Jonathan Gratch, Emmanuel Johnson, Anton Leuski, Gale M. Lucas, David R. Traum:
The Niki and Julie Corpus: Collaborative Multimodal Dialogues between Humans, Robots, and Virtual Agents. - Randah Alharbi, Walid Magdy, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak:
Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM. - Nan Wang, Yan Song, Fei Xia:
Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions. - Imed Laaridh, Christine Meunier, Corinne Fredouille:
Dysarthric speech evaluation: automatic and perceptual approaches. - Ryo Ishii, Ryuichiro Higashinaka, Junji Tomita:
Predicting Nods by using Dialogue Acts in Dialogue. - Kaoru Ito, Hiroyuki Nagai, Taro Okahisa, Shoko Wakamiya, Tomohide Iwao, Eiji Aramaki:
J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage. - Corine Astésano, Mathieu Balaguer, Jérôme Farinas, Corinne Fredouille, Pascal Gaillard, Alain Ghio, Imed Laaridh, Muriel Lalain, Benoît Lepage, Julie Mauclair, Olivier Nocaudie, Julien Pinquier, Oriol Pont, Gilles Pouchoulin, Michèle Puech, Danièle Robert, Etienne Sicard, Virginie Woisard:
Carcinologic Speech Severity Index Project: A Database of Speech Disorder Productions to Assess Quality of Life Related to Speech After Cancer. - Ahmed Abdelali, Irina P. Temnikova, Samy Hedaya, Stephan Vogel:
The WAW Corpus: The First Corpus of Interpreted Speeches and their Translations for English and Arabic. - Iris Hendrickx, Eirini Takoulidou, Thanasis Naskos, Katia Lida Kermanidis, Vilelmini Sosoni, Hugo De Vos, Maria Stasimioti, Menno van Zaanen, Panayota Georgakopoulou, Valia Kordoni, Maja Popovic, Markus Egg, Antal van den Bosch:
A Multilingual Wikified Data Set of Educational Material. - Minh-Tien Nguyen, Viet Dac Lai, Huy-Tien Nguyen, Minh-Le Nguyen:
TSix: A Human-involved-creation Dataset for Tweet Summarization. - Deniz Zeyrek, Murathan Kurfali:
An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank. - Wajdi Zaghouani, Anis Charfi:
Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification. - Joseph Mariani, Gil Francopoulo, Patrick Paroubek:
Measuring Innovation in Speech and Language Processing Publications. - Chandrakant Bothe, Cornelius Weber, Sven Magg, Stefan Wermter:
A Context-based Approach for Dialogue Act Recognition using Simple Recurrent Neural Networks. - Gabriel Marzinotto, Jérémy Auguste, Frédéric Béchet, Géraldine Damnati, Alexis Nasr:
Semantic Frame Parsing for Information Extraction : the CALOR corpus. - Anna Feltracco, Elisabetta Jezek, Bernardo Magnini:
Enriching a Lexicon of Discourse Connectives with Corpus-based Data. - Salam Khalifa, Nizar Habash, Fadhl Eryani, Ossama Obeid, Dana Abdulrahim, Meera Al Kaabi:
A Morphologically Annotated Corpus of Emirati Arabic. - Kyoko Kanzaki, Hitoshi Isahara:
Building a List of Synonymous Words and Phrases of Japanese Compound Verbs. - Lung-Hao Lee, Yuen-Hsien Tseng, Li-Ping Chang:
Building a TOCFL Learner Corpus for Chinese Grammatical Error Diagnosis. - Dainis Boumber, Yifan Zhang, Arjun Mukherjee:
Experiments with Convolutional Neural Networks for Multi-Label Authorship Attribution. - Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, Brahim Batouche:
PMKI: an European Commission action for the interoperability, maintainability and sustainability of Language Resources. - Patricia Braunger, Wolfgang Maier, Jan Wessling, Maria Schmidt:
Towards an Automatic Assessment of Crowdsourced Data for NLU. - Mihael Arcan, Elena Montiel-Ponsoda, John Philip McCrae, Paul Buitelaar:
Automatic Enrichment of Terminological Resources: the IATE RDF Example. - Carolina Scarton, Gustavo Paetzold, Lucia Specia:
SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain. - Luis Chiruzzo, Dina Wonsever:
Spanish HPSG Treebank based on the AnCora Corpus. - Jean-Philippe Goldman, Sandra Schwab:
MIAPARLE: Online training for the discrimination of stress contrasts. - Michael Stadtschnitzer, Christoph Schmidt:
Data-Driven Pronunciation Modeling of Swiss German Dialectal Speech for Automatic Speech Recognition. - Jinyoung Yeo, Gyeongbok Lee, Gengyu Wang, Seungtaek Choi, Hyunsouk Cho, Reinald Kim Amplayo, Seung-won Hwang:
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning. - Thomas Gaillat, Manel Zarrouk, André Freitas, Brian Davis:
The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs. - Binyam Ephrem Seyoum, Yusuke Miyao, Baye Yimam Mekonnen:
Universal Dependencies for Amharic. - Rui Sakaida, Ryosaku Makino, Mayumi Bono:
Preliminary Analysis of Embodied Interactions between Science Communicators and Visitors Based on a Multimodal Corpus of Japanese Conversations in a Science Museum. - Nathan Green, Septina Dian Larasati:
The First 100 Days: A Corpus Of Political Agendas on Twitter. - Kathleen Ahrens, Huiheng Zeng, Shun Han Rebekah Wong:
Using a Corpus of English and Chinese Political Speeches for Metaphor Analysis. - William Léchelle, Philippe Langlais:
Revisiting the Task of Scoring Open IE Relations. - Shweta Yadav, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya:
Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System. - Houda Saadane, Hosni Seffih, Christian Fluhr, Khalid Choukri, Nasredine Semmar:
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach. - Amarsanaa Ganbold, Altangerel Chagnaa, Gábor Bella:
Using Crowd Agreement for Wordnet Localization. - Aditya Joshi, Pranav Goel, Pushpak Bhattacharyya, Mark J. Carman:
Sarcasm Target Identification: Dataset and An Introductory Approach. - Fathima Farhath, Pranavan Theivendiram, Surangika Ranathunga, Sanath Jayasena, Gihan Dias:
Improving domain-specific SMT for low-resourced languages using data from different domains. - Bolette S. Pedersen, Sanni Nimb, Anders Søgaard, Mareike Hartmann, Sussi Olsen:
A Danish FrameNet Lexicon and an Annotated Corpus Used for Training and Evaluating a Semantic Frame Classifier. - João Rodrigues, António Branco:
Finely Tuned, 2 Billion Token Based Word Embeddings for Portuguese. - Maria Koutsombogera, Carl Vogel:
Modeling Collaborative Multimodal Behavior in Group Dialogues: The MULTISIMO Corpus. - Jorge A. Wagner Filho, Rodrigo Wilkens, Marco Idiart, Aline Villavicencio:
The brWaC Corpus: A New Open Resource for Brazilian Portuguese. - Kyungtae Lim, Niko Partanen, Thierry Poibeau:
Multilingual Dependency Parsing for Low-Resource Languages: Case Studies on North Saami and Komi-Zyrian. - Rajdeep Sarkar, John Philip McCrae, Paul Buitelaar:
A supervised approach to taxonomy extraction using word embeddings. - Charles Jochim, Francesca Bonin, Roy Bar-Haim, Noam Slonim:
SLIDE - a Sentiment Lexicon of Common Idioms. - Roxane Segers, Tommaso Caselli, Piek Vossen:
The Circumstantial Event Ontology (CEO) and ECB+/CEO: an Ontology and Corpus for Implicit Causal Relations between Events. - Vassilis Papavassiliou, Prokopis Prokopidis, Stelios Piperidis:
Discovering Parallel Language Resources for Training MT Engines. - Laura Van Brussel, Arda Tezcan, Lieve Macken:
A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch. - Yi Zhang, Xu Sun:
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction. - Filip Klubicka, Giancarlo D. Salton, John D. Kelleher:
Is it worth it? Budget-related evaluation metrics for model selection. - João Sequeira, Teresa Gonçalves, Paulo Quaresma, Amália Mendes, Iris Hendrickx:
A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language. - Marie Mikulová, Eduard Bejcek:
ForFun 1.0: Prague Database of Forms and Functions - An Invaluable Resource for Linguistic Research. - Muhamed Al-Khalil, Hind Saddiki, Nizar Habash, Latifa Al-Sulaiti:
A Leveled Reading Corpus of Modern Standard Arabic. - Sanja Stajner, Sergiu Nisioi:
A Detailed Evaluation of Neural Sequence-to-Sequence Models for In-domain and Cross-domain Text Simplification. - Yan Cao, Yasuhiro Minami, Yuko Okumura, Tessei Kobayashi:
Analyzing Vocabulary Commonality Index Using Large-scaled Database of Child Language Development. - Minh Le, Antske Fokkens:
Neural Models of Selectional Preferences for Implicit Semantic Role Labeling. - Amir More, Özlem Çetinoglu, Çagri Çöltekin, Nizar Habash, Benoît Sagot, Djamé Seddah, Dima Taji, Reut Tsarfaty:
CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing. - Makoto Yamazaki, Yumi Miyazaki, Wakako Kashino:
Annotation and Quantitative Analysis of Speaker Information in Novel Conversation Sentences in Japanese. - Gregor Wiedemann, Gerhard Heyer:
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features. - Chiraag Lala, Lucia Specia:
Multimodal Lexical Translation. - Yudai Kishimoto, Shinnosuke Sawada, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi:
Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations. - Lilja Øvrelid, Andre Kåsen, Kristin Hagen, Anders Nøklestad, Per Erik Solberg, Janne Bondi Johannessen:
The LIA Treebank of Spoken Norwegian Dialects. - Alina Wróblewska:
Polish Corpus of Annotated Descriptions of Images. - Haris Bin Zia, Agha Ali Raza, Awais Athar:
PronouncUR: An Urdu Pronunciation Lexicon Generator. - Stelios Piperidis, Penny Labropoulou, Miltos Deligiannis, Maria Giagkou:
Managing Public Sector Data for Multilingual Applications Development. - Emer Gilmartin, Carl Vogel, Nick Campbell:
Chats and Chunks: Annotation and Analysis of Multiparty Long Casual Conversations. - Magalie Ochs, Philippe Blache, Grégoire de Montcheuil, Jean-Marie Pergandi, Jorane Saubesty, Daniel Francon, Daniel Mestre:
A Semi-autonomous System for Creating a Human-Machine Interaction Corpus in Virtual Reality: Application to the ACORFORMed System for Training Doctors to Break Bad News. - Chae-Gyun Lim, Young-Seob Jeong, Ho-Jin Choi:
Korean TimeBank Including Relative Temporal Information. - Pavel Král, Ladislav Lenc:
Czech Text Document Corpus v 2.0. - Witold Kieras, Marcin Wolinski:
Manually Annotated Corpus of Polish Texts Published between 1830 and 1918. - Arif Khan, Ingmar Steiner, Yusuke Sugano, Andreas Bulling, Ross G. MacDonald:
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks. - Vilelmini Sosoni, Katia Lida Kermanidis, Maria Stasimioti, Thanasis Naskos, Eirini Takoulidou, Menno van Zaanen, Sheila Castilho, Panayota Georgakopoulou, Valia Kordoni, Markus Egg:
Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content. - Qi Lu, YaoSheng Yang, Zhenghua Li, Wenliang Chen, Min Zhang:
M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains. - Zhongxi Cai, Koichiro Ryu, Shigeki Matsubara:
Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus. - Claudia Soria, Valeria Quochi, Irene Russo:
The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages. - Agnieszka Mykowiecka, Malgorzata Marciniak, Piotr Rychlik:
SimLex-999 for Polish. - Thanh-Le Ha, Jan Niehues, Matthias Sperber, Ngoc-Quan Pham, Alexander Waibel:
KIT-Multi: A Translation-Oriented Multilingual Embedding Corpus. - Patrick Huber, Jan Niehues, Alex Waibel:
Automated Evaluation of Out-of-Context Errors. - Stephanie Gross, Matthias Hirschmanner, Brigitte Krenn, Friedrich Neubarth, Michael Zillich:
Action Verb Corpus. - Rashmi Sankepally, Douglas W. Oard:
An Initial Test Collection for Ranked Retrieval of SMS Conversations. - Christina Lohr, Sven Buechel, Udo Hahn:
Sharing Copies of Synthetic Clinical Corpora without Physical Distribution - A Case Study to Get Around IPRs and Privacy Constraints Featuring the German JSYNCC Corpus. - Mohamed Nassime Hadjadj, Michael Filhol, Annelies Braffort:
Modeling French Sign Language: a proposal for a semantically compositional system. - Mahmoud El-Haj, Paul Rayson, Scott Piao, Jo Knight:
Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger. - Nathalie Camelin, Géraldine Damnati, Abdessalam Bouchekif, Anaïs Landeau, Delphine Charlet, Yannick Estève:
FrNewsLink : a corpus linking TV Broadcast News Segments and Press Articles. - Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, Marco Stranisci:
An Italian Twitter Corpus of Hate Speech against Immigrants. - Kyungjae Lee, Kyoungho Yoon, Sunghyun Park, Seung-won Hwang:
Semi-supervised Training Data Generation for Multilingual Question Answering. - Patrik Jonell, Mattias Bystedt, Per Fallgren, Dimosthenis Kontogiorgos, José Lopes, Zofia Malisz, Samuel Mascarenhas, Catharine Oertel, Eran Raveh, Todd Shore:
FARMI: A FrAmework for Recording Multi-Modal Interactions. - Cédric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau:
EMO&LY (EMOtion and AnomaLY) : A new corpus for anomaly detection in an audiovisual stream with emotional context. - Shu-Kai Hsieh, Yu-Hsiang Tseng, Chih-yao Lee, Chiung-Yu Chiang:
Fluid Annotation: A Granularity-aware Annotation Tool for Chinese Word Fluidity. - Aghilas Sini, Damien Lolive, Gaëlle Vidal, Marie Tahon, Elisabeth Delais-Roussarie:
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis. - Lydia Müller, Uwe Quasthoff, Maciej Sumalvico:
Corpora of Typical Sentences. - Janaka Chathuranga, Shanika Ediriweera, Ravindu Hasantha, Pranidhith Munasinghe, Surangika Ranathunga:
Annotating Opinions and Opinion Targets in Student Course Feedback. - Adeline Nazarenko, François Lévy, Adam Z. Wyner:
An Annotation Language for Semantic Search of Legal Sources. - Tolga Uslu, Alexander Mehler, Daniel Baumartz, Alexander Henlein, Wahed Hemati:
FastSense: An Efficient Word Sense Disambiguation Classifier. - Marc Kupietz, Harald Lüngen, Pawel Kamocki, Andreas Witt:
The German Reference Corpus DeReKo: New Developments - New Opportunities. - Julien Plu, Roman Prokofyev, Alberto Tonon, Philippe Cudré-Mauroux, Djellel Eddine Difallah, Raphaël Troncy, Giuseppe Rizzo:
Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution. - Gavin Abercrombie, Riza Batista-Navarro:
'Aye' or 'No'? Speech-level Sentiment Analysis of Hansard UK Parliamentary Debate Transcripts. - Noelia Migueles-Abraira, Rodrigo Agerri, Arantza Díaz de Ilarraza:
Annotating Abstract Meaning Representations for Spanish. - Claudia Marzi, Marcello Ferro, Ouafae Nahli, Patrizia Belik, Stavros Bompolas, Vito Pirrelli:
Evaluating Inflectional Complexity Crosslinguistically: a Processing Perspective. - Steinþór Steingrímsson, Sigrún Helgadóttir, Eiríkur Rögnvaldsson, Starkaður Barkarson, Jón Guðnason:
Risamálheild: A Very Large Icelandic Text Corpus. - Robert Jimerson, Emily Prud'hommeaux:
ASR for Documenting Acutely Under-Resourced Indigenous Languages. - Sashi Novitasari, Quoc Truong Do, Sakriani Sakti, Dessi Puji Lestari, Satoshi Nakamura:
Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas. - Shubham Bhardwaj, Neelamadhav Gantayat, Nikhil Chaturvedi, Rahul Garg, Sumeet Agarwal:
SandhiKosh: A Benchmark Corpus for Evaluating Sanskrit Sandhi Tools. - Andrei Dulceanu, Thang Le Dinh, Walter Chang, Trung Bui, Doo Soon Kim, Manh Chien Vu, Seokhwan Kim:
PhotoshopQuiA: A Corpus of Non-Factoid Questions and Answers for Why-Question Answering. - Jon Chamberlain, Udo Kruschwitz, Orland Hoeber:
Scalable Visualisation of Sentiment and Stance. - Marilyn A. Walker, Albry Smither, Shereen Oraby, Vrindavan Harrison, Hadar Shemtov:
Exploring Conversational Language Generation for Rich Content about Hotels. - Winston Wu, David Yarowsky:
A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages. - Christopher R. Norman, Mariska M. G. Leeflang, Pierre Zweigenbaum, Aurélie Névéol:
Automating Document Discovery in the Systematic Review Process: How to Use Chaff to Extract Wheat. - Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria Cláudia Cavalcanti, Geraldo Xexéo, Mariana L. Neves, Axel-Cyrille Ngonga Ngomo:
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data. - Alfred Sliwa, Yuan Man, Ruishen Liu, Niravkumar Borad, Seyedeh Ziyaei, Mina Ghobadi, Firas Sabbah, Ahmet Aker:
Multi-lingual Argumentative Corpora in English, Turkish, Greek, Albanian, Croatian, Serbian, Macedonian, Bulgarian, Romanian and Arabic. - Felix Gervits, Matthias Scheutz:
Towards a Conversation-Analytic Taxonomy of Speech Overlap. - Dimitris Pappas, Ion Androutsopoulos, Haris Papageorgiou:
BioRead: A New Dataset for Biomedical Reading Comprehension. - Patrick Littell, Tom McCoy, Na-Rae Han, Shruti Rijhwani, Zaid Sheikh, David R. Mortensen, Teruko Mitamura, Lori S. Levin:
Parser combinators for Tigrinya and Oromo morphology. - Vincent Kríz, Barbora Hladká:
Czech Legal Text Treebank 2.0. - Keith Curtis, Nick Campbell, Gareth J. F. Jones:
Development of an Annotated Multimodal Dataset for the Investigation of Classification and Summarisation of Presentations using High-Level Paralinguistic Features. - Sonja Bosch, Thomas Eckart, Bettina Klimek, Dirk Goldhahn, Uwe Quasthoff:
Preparation and Usage of Xhosa Lexicographical Data for a Multilingual, Federated Environment. - Adarsh Kumar, Sandipan Dandapat, Sushil Chordia:
Translating Web Search Queries into Natural Language Questions. - Isabel Lacruz, Michael Carl, Masaru Yamada:
Literality and cognitive effort: Japanese and Spanish. - Garland McNew, Curdin Derungs, Steven Moran:
Towards faithfully visualizing global linguistic diversity. - Amália Mendes, Iria del Río Gayo, Manfred Stede, Felix Dombek:
A Lexicon of Discourse Markers for Portuguese - LDM-PT. - Chatrine Qwaider, Motaz Saad, Stergios Chatzikyriakidis, Simon Dobnik:
Shami: A Corpus of Levantine Arabic Dialects. - Mirko Tavosanis, Federica Cominetti:
The ICoN Corpus of Academic Written Italian (L1 and L2). - Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève:
Simulating ASR errors for training SLU systems. - Piotr Andruszkiewicz, Rafal Hazan:
Annotated Corpus of Scientific Conference's Homepages for Information Extraction. - Darja Fiser, Jakob Lenardic, Tomaz Erjavec:
CLARIN's Key Resource Families. - Michele Berlingerio, Francesca Bonin:
Towards a music-language mapping. - Oussama Ahmia, Nicolas Béchet, Pierre-François Marteau:
Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications. - David Lukes, Marie Koprivová, Zuzana Komrsková, Petra Klimesová:
Pronunciation Variants and ASR of Colloquial Speech: A Case Study on Czech. - Marie-Claude L'Homme, Benoît Robichaud, Nathalie Prévil:
Browsing the Terminological Structure of a Specialized Domain: A Method Based on Lexical Functions and their Classification. - Anas Fahad Khan, Andrea Bellandi, Francesca Frontini, Monica Monachini:
One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL. - Bettina Braun, Katharina Zahner:
The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech. - Takashi Yamamura, Kazutaka Shimada:
Annotation and Analysis of Extractive Summaries for the Kyutech Corpus. - Erik Velldal, Lilja Øvrelid, Eivind Alexander Bergem, Cathrine Stadsnes, Samia Touileb, Fredrik Jørgensen:
NoReC: The Norwegian Review Corpus. - Petr Belohlávek, Ondrej Plátek, Zdenek Zabokrtský, Milan Straka:
Using Adversarial Examples in Natural Language Processing. - Marlies van der Wees, Arianna Bisazza, Christof Monz:
Evaluation of Machine Translation Performance Across Multiple Genres and Languages. - Jacobo Rouces, Nina Tahmasebi, Lars Borin, Stian Rødven Eide:
SenSALDO: Creating a Sentiment Lexicon for Swedish. - Giorgia Di Tommaso, Stefano Faralli, Paola Velardi:
A Large Multilingual and Multi-domain Dataset for Recommender Systems. - Christian Chiarcos, Émilie Pagé-Perron, Ilya Khait, Niko Schenk, Lucas Reckling:
Towards a Linked Open Data Edition of Sumerian Corpora. - Siamak Barzegar, Brian Davis, Manel Zarrouk, Siegfried Handschuh, André Freitas:
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages. - Fabian Barteld, Sarah Ihden, Katharina Dreessen, Ingrid Schröder:
HiNTS: A Tagset for Middle Low German. - Yang Yu, Vincent Ng:
Improving Unsupervised Keyphrase Extraction using Background Knowledge. - Sarah Fillwock, David R. Traum:
Identification of Personal Information Shared in Chat-Oriented Dialogue. - Ji Young Lee, Franck Dernoncourt, Peter Szolovits:
Transfer Learning for Named-Entity Recognition with Neural Networks. - Natalia A. Tomashenko, Yannick Estève:
Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models. - Ramy Eskander:
SentiArabic: A Sentiment Analyzer for Standard Arabic. - Melania Cabezas-García, Pilar León Araúz:
Towards the Inference of Semantic Relations in Complex Nominals: a Pilot Study. - Steven Neale, Kevin Donnelly, Gareth Watkins, Dawn Knight:
Leveraging Lexical Resources and Constraint Grammar for Rule-Based Part-of-Speech Tagging in Welsh. - Steven Moran, Danica Pajovic, Sabine Stoll:
Cross-linguistically Small World Networks are Ubiquitous in Child-directed Speech. - Franck Dernoncourt, Mohammad Ghassemi, Walter Chang:
A Repository of Corpora for Summarization. - David R. Mortensen, Siddharth Dalmia, Patrick Littell:
Epitran: Precision G2P for Many Languages. - Loïc Grobol, Isabelle Tellier, Éric Villemonte de la Clergerie, Marco Dinarelli, Frédéric Landragin:
ANCOR-AS: Enriching the ANCOR Corpus with Syntactic Annotations. - Keying Li, John Lee:
L1-L2 Parallel Treebank of Learner Chinese: Overused and Underused Syntactic Structures. - Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, Yulia Tsvetkov:
RtGender: A Corpus for Studying Differential Responses to Gender. - Rüdiger Gleim, Alexander Mehler, Sung Y. Song:
WikiDragon: A Java Framework For Diachronic Content And Network Analysis Of MediaWikis. - Luis Gerardo Mojica de la Vega, Vincent Ng:
Modeling Trolling in Social Media Conversations. - Sara Meftah, Nasredine Semmar:
A Neural Network Model for Part-Of-Speech Tagging of Social Media Texts. - Asad B. Sayeed, Pavel Shkadzko, Vera Demberg:
Rollenwechsel-English: a large-scale semantic role corpus. - Dmitrii Fedotov, Denis Ivanko, Maxim Sidorov, Wolfgang Minker:
Contextual Dependencies in Time-Continuous Multidimensional Affect Recognition. - Jordan Lachler, Lene Antonsen, Trond Trosterud, Sjur N. Moshagen, Antti Arppe:
Modeling Northern Haida Verb Morphology. - Cécile Fougeron, Véronique Delvaux, Lucie Ménard, Marina Laganaro:
The MonPaGe_HA Database for the Documentation of Spoken French Throughout Adulthood. - Muhammad Abdul-Mageed, Hassan Alhuzali, Mohamed Elaraby:
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects. - Roberts Dargis, Ilze Auzina, Kristine Levane-Petrova:
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners. - Diptesh Kanojia, Kevin Patel, Pushpak Bhattacharyya:
Indian Language Wordnets and their Linkages with Princeton WordNet. - Girishkumar Ponkiya, Kevin Patel, Pushpak Bhattacharyya, Girish K. Palshikar:
Towards a Standardized Dataset for Noun Compound Interpretation. - Ekaterina Lapshinova-Koltunski, Christian Hardmeier, Pauline Krielke:
ParCorFull: a Parallel Corpus Annotated with Full Coreference. - Thi-Lan Ngo, Khac Linh Pham, Hideaki Takeda:
A Vietnamese Dialog Act Corpus Based on ISO 24617-2 standard. - Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp:
A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora. - Daniel Peñaloza, Juanjosé Tenorio Peña, Rodrigo López, Héctor Gómez, Arturo Oncevay-Marcos, Marco Antonio Sobrevilla Cabezudo:
Corpus Building and Evaluation of Aspect-based Opinion Summaries from Tweets in Spanish. - Saif M. Mohammad, Svetlana Kiritchenko:
WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art. - Sara Rodríguez-Fernández, Roberto Carlini, Leo Wanner:
Generation of a Spanish Artificial Collocation Error Corpus. - Caitlin Richter, Matthew Wickes, Deniz Beser, Mitchell Marcus:
Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation. - Yo Ehara:
Building an English Vocabulary Knowledge Dataset of Japanese English-as-a-Second-Language Learners Using Crowdsourcing. - Manjuan Duan, William Schuler:
Test Sets for Chinese Nonlocal Dependency Parsing. - Yuchen Zhang, Nianwen Xue:
Structured Interpretation of Temporal Relations. - Nishitha Guntakandla, Rodney Nielsen:
Annotating Reflections for Health Behavior Change Therapy. - Antske Fokkens, Nel Ruigrok, Camiel J. Beukeboom, Sarah Gagestein, Wouter van Atteveldt:
Studying Muslim Stereotyping through Microportrait Extraction. - Sebastien Delecraz, Alexis Nasr, Frédéric Béchet, Benoît Favre:
Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution. - Hanna Hedeland, Timm Lehmberg, Felix Rau, Sophie Salffner, Mandana Seyfeddinipur, Andreas Witt:
Introducing the CLARIN Knowledge Centre for Linguistic Diversity and Language Documentation. - Yuanliang Meng, Anna Rumshisky, Florence R. Sullivan:
Automatic Labeling of Problem-Solving Dialogues for Computational Microgenetic Learning Analytics. - Petra Galuscáková, Lucie Neuzilova:
Low Resource Methods for Medieval Document Sections Analysis. - Jan Kocon, Arkadiusz Janz, Maciej Piasecki:
Classifier-based Polarity Propagation in a WordNet. - Iria del Río Gayo, Amália Mendes:
Error annotation in a Learner Corpus of Portuguese. - Ada Wan:
Visualizing the "Dictionary of Regionalisms of France" (DRF). - Richard Eckart de Castilho, Giulia Dore, Thomas Margoni, Penny Labropoulou, Iryna Gurevych:
A Legal Perspective on Training Models for Natural Language Processing. - Robert Herms, Maria Wirzberger, Maximilian Eibl, Günter Daniel Rey:
CoLoSS: Cognitive Load Corpus with Speech and Performance Data from a Symbol-Digit Dual-Task. - Ralf Grubenmann, Don Tuggener, Pius von Däniken, Jan Deriu, Mark Cieliebak:
SB-CH: A Swiss German Corpus with Sentiment Annotations. - Israa Alsarsour, Esraa Mohamed, Reem Suwaileh, Tamer Elsayed:
DART: A Large Dataset of Dialectal Arabic Tweets. - Jennifer Tracey, Stephanie M. Strassel:
VAST: A Corpus of Video Annotation for Speech Technologies. - Markus Zopf:
Auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus. - Aleksander Wawer, Justyna Sarzynska:
The Linguistic Category Model in Polish (LCM-PL). - Andreas Blätte, André Blessing:
The GermaParl Corpus of Parliamentary Protocols. - Valerij Fredriksen, Brage Ekroll Jahren, Björn Gambäck:
Utilizing Large Twitter Corpora to Create Sentiment Lexica. - Winston Wu, David Yarowsky:
Massively Translingual Compound Analysis and Translation Discovery. - Steven Neale:
A Survey on Automatically-Constructed WordNets and their Evaluation: Lexical and Word Embedding-based Approaches. - Verónica Pérez-Rosas, Xuetong Sun, Christy Li, Yuchen Wang, Kenneth Resnicow, Rada Mihalcea:
Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling. - Hajime Senuma, Akiko Aizawa:
Universal Dependencies for Ainu. - Adam Ek, Mats Wirén, Robert Östling, Kristina Nilsson Björkenstam, Gintare Grigonyte, Sofia Gustafson-Capková:
Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction. - Diego Maguiño Valencia, Arturo Oncevay-Marcos, Marco Antonio Sobrevilla Cabezudo:
WordNet-Shp: Towards the Building of a Lexical Database for a Peruvian Minority Language. - Injy Hamed, Mohamed Elmahdy, Slim Abdennadher:
Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus. - Milagro Teruel, Cristian Cardellino, Fernando Cardellino, Laura Alonso Alemany, Serena Villata:
Increasing Argument Annotation Reproducibility by Using Inter-annotator Agreement to Improve Guidelines. - Leonardo Zilio, Rodrigo Wilkens, Cédrick Fairon:
An SLA Corpus Annotated with Pedagogically Relevant Grammatical Structures. - Akira Hayakawa, Carl Vogel, Saturnino Luz, Nick Campbell:
Speech Rate Calculations with Short Utterances: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task. - Eric Kergosien, Amin Farvardin, Maguelonne Teisseire, Marie-Noëlle Bessagnet, Joachim Schöpfel, Stéphane Chaudiron, Bernard Jacquemin, Annig Lacayrelle, Mathieu Roche, Christian Sallaberry, Jean-Philippe Tonneau:
Automatic Identification of Research Fields in Scientific Papers. - Dusan Varis, Natalia Klyueva:
Improving a Neural-based Tagger for Multiword Expressions Identification. - Erwan Moreau, Carl Vogel:
Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus. - Petra Steiner, Josef Ruppenhofer:
Building a Morphological Treebank for German from a Linguistic Database. - Ada Wan:
Tel(s)-Telle(s)-Signs: Highly Accurate Automatic Crosslingual Hypernym Discovery. - Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg:
Interpersonal Relationship Labels for the CALLHOME Corpus. - Suemi Higuchi, Cláudia Freitas, Bruno Cuconato, Alexandre Rademaker:
Text Mining for History: first steps on building a large dataset. - Charles Welch, Jonathan K. Kummerfeld, Song Feng, Rada Mihalcea:
World Knowledge for Abstract Meaning Representation Parsing. - Andrejs Vasiljevs, Rihards Kalnins, Roberts Rozis, Aivars Berzins:
Collecting Language Resources from Public Administrations in the Nordic and Baltic Countries. - Katsiaryna Aharodnik, Anna Feldman, Jing Peng:
Designing a Russian Idiom-Annotated Corpus. - Aibek Makazhanov, Bagdat Myrzakhmetov, Zhenisbek Assylbekov:
Manual vs Automatic Bitext Extraction. - Djamé Seddah, Éric Villemonte de la Clergerie, Benoît Sagot, Héctor Martínez Alonso, Marie Candito:
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer. - Samantha Wray:
Classification of Closely Related Sub-dialects of Arabic Using Support-Vector Machines. - Chiara Alzetta, Felice Dell'Orletta, Simonetta Montemagni, Giulia Venturi:
Universal Dependencies and Quantitative Typological Trends. A Case Study on Word Order. - Lorraine Goeuriot, Josiane Mothe, Philippe Mulhem, Eric SanJuan:
Building Evaluation Datasets for Cultural Microblog Retrieval. - Andrea Lösch, Valérie Mapelli, Stelios Piperidis, Andrejs Vasiljevs, Lilli Smal, Thierry Declerck, Eileen Schnur, Khalid Choukri, Josef van Genabith:
European Language Resource Coordination: Collecting Language Resources for Public Sector Multilingual Information Management. - Housam Ziad, John Philip McCrae, Paul Buitelaar:
Teanga: A Linked Data based platform for Natural Language Processing. - Nadezda Okinina, Lionel Nicolas, Verena Lyding:
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents. - Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto:
An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages. - Roeland van Hout, Nicoline van der Sijs, Erwin Komen, Henk van den Heuvel:
A Fast and Flexible Webinterface for Dialect Research in the Low Countries. - Jan Odijk, Alexis Dimitriadis, Martijn van der Klis, Marjo van Koppen, Meie Otten, Remco van der Veen:
The AnnCor CHILDES Treebank. - Zdenka Uresová, Eva Fucíková, Eva Hajicová, Jan Hajic:
Tools for Building an Interlinked Synonym Lexicon Network. - Pierre-Edouard Honnet, Andrei Popescu-Belis, Claudiu Musat, Michael Baeriswyl:
Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German. - Adrien Barbaresi:
A corpus of German political speeches from the 21st century. - Alice Millour, Karën Fort:
Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing. - Talha Javed, Nizar Habash, Dima Taji:
Palmyra: A Platform Independent Dependency Annotation Tool for Morphologically Rich Languages. - Henk van den Heuvel, Erwin Komen, Nelleke Oostdijk:
Metadata Collection Records for Language Resources. - Stephen Tratz, Nhien Phan:
A Web-based System for Crowd-in-the-Loop Dependency Treebanking. - Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer:
MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction. - Emily Chen, Lane Schwartz:
A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik. - Sarah Masud Preum, Md. Rizwan Parvez, Kai-Wei Chang, John A. Stankovic:
A Corpus of Drug Usage Guidelines Annotated with Type of Advice. - Luise Dürlich, Thomas François:
EFLLex: A Graded Lexical Resource for Learners of English as a Foreign Language. - Rodolfo Mercado-Gonzales, José Pereira-Noriega, Marco Antonio Sobrevilla Cabezudo, Arturo Oncevay-Marcos:
ChAnot: An Intelligent Annotation Tool for Indigenous and Highly Agglutinative Languages in Peru. - Hanae Koiso, Yasuharu Den, Yuriko Iseki, Wakako Kashino, Yoshiko Kawabata, Ken'ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda:
Construction of the Corpus of Everyday Japanese Conversation: An Interim Report. - Steve Cassidy, Onno Crasborn, Henri Nieminen, Wessel Stoop, Micha Hulsbosch, Susan Even, Erwin Komen, Trevor Johnson:
Signbank: Software to Support Web Based Dictionaries of Sign Language. - Naiara Pérez, Montse Cuadros, German Rigau:
Biomedical term normalization of EHRs with UMLS. - Alicia Burga, Mónica Domínguez, Mireia Farrús, Leo Wanner:
Compilation of Corpora for the Study of the Information Structure-Prosody Interface. - Paul Meurer:
The Abkhaz National Corpus. - David Arps, Simon Petitjean:
A Parser for LTAG and Frame Semantics. - Nisarg Jhaveri, Manish Gupta, Vasudeva Varma:
A Workbench for Rapid Generation of Cross-Lingual Summaries. - Torsten Zesch, Andrea Horbach:
ESCRITO - An NLP-Enhanced Educational Scoring Toolkit. - Sanja Stajner, Marc Franco-Salvador, Paolo Rosso, Simone Paolo Ponzetto:
CATS: A Tool for Customized Alignment of Text Simplification Corpora. - Annie Rialland, Martine Adda-Decker, Guy-Noël Kouarata, Gilles Adda, Laurent Besacier, Lori Lamel, Elodie Gauthier, Pierre Godard, Jamison Cooper-Leavitt:
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville). - Roberto Bartolini, Sara Goggi, Monica Monachini, Gabriella Pardelli:
The LREC Workshops Map. - Guillaume Wisniewski:
Errator: a Tool to Help Detect Annotation Errors in the Universal Dependencies Project. - Nancy Ide, Keith Suderman, Jin-Dong Kim:
Mining Biomedical Publications With The LAPPS Grid. - Piek Vossen, Filip Ilievski, Marten Postma, Roxane Segers:
Don't Annotate, but Validate: a Data-to-Text Method for Capturing Event Data. - Adrien Barbaresi, Lothar Lemnitzer, Alexander Geyken:
A database of German definitory contexts from selected web sources. - Hiroyuki Shindo, Yohei Munesada, Yuji Matsumoto:
PDFAnno: a Web-based Linguistic Annotation Tool for PDF Documents. - Markus Gärtner, Uli Hahn, Sibylle Hermann:
Preserving Workflow Reproducibility: The RePlay-DH Client as a Tool for Process Documentation. - Federica Vezzani, Giorgio Maria Di Nunzio, Geneviève Henrot:
TriMED: A Multilingual Terminological Database. - Nicolas Hernandez, Amir Hazem:
PyRATA, Python Rule-based feAture sTructure Analysis. - René Witte, Bahar Sateli:
The LODeXporter: Flexible Generation of Linked Open Data Triples from NLP Frameworks for Automatic Knowledge Base Construction. - Olivier Galibert, Guillaume Bernard, Agnès Delaborde, Sabrina Lecadre, Juliette Kahn:
Matics Software Suite: New Tools for Evaluation and Data Exploration. - Alessandro Panunzi, Lorenzo Gregori, Andrea Amelio Ravelli:
One event, many representations. Mapping action concepts through visual features. - Anisia Katinskaia, Javad Nouri, Roman Yangarber:
Revita: a Language-learning Platform at the Intersection of ITS and CALL. - Jacobo Rouces, Nina Tahmasebi, Lars Borin, Stian Rødven Eide:
Generating a Gold Standard for a Swedish Sentiment Lexicon. - Rodrigo Agerri, Xavier Gómez Guinovart, German Rigau, Miguel Anxo Solla Portela:
Developing New Linguistic Resources and Tools for the Galician Language. - Chantal van Son, Oana Inel, Roser Morante, Lora Aroyo, Piek Vossen:
Resource Interoperability for Sustainable Benchmarking: The Case of Events. - Christian Chiarcos, Niko Schenk:
The ACoLi CoNLL Libraries: Beyond Tab-Separated Values. - Katherine Schmirler, Antti Arppe, Trond Trosterud, Lene Antonsen:
Building a Constraint Grammar Parser for Plains Cree Verbs and Arguments. - Amy Isard, Jon Oberlander, Claire Grover:
Up-cycling Data for Natural Language Generation. - Michael Wayne Goodman, Ryan Georgi, Fei Xia:
PDF-to-Text Reanalysis for Linguistic Data Mining. - Per Fallgren, Zofia Malisz, Jens Edlund:
Bringing Order to Chaos: A Non-Sequential Approach for Browsing Large Sets of Found Audio Data. - Daniel Ferrés, Horacio Saggion, Francesco Ronzano, Àlex Bravo:
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles. - Christian Chiarcos, Benjamin Kosmehl, Christian Fäth, Maria Sukhareva:
Analyzing Middle High German Syntax with RDF and SPARQL. - Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, Michael D. Ernst:
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. - Mostafa Abdou, Artur Kulmizev, Vinit Ravishankar:
MGAD: Multilingual Generation of Analogy Datasets. - Ingmar Steiner, Sébastien Le Maguer:
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform. - Vadim Sheinin, Elahe Khorasani, Hangu Yeo, Kun Xu, Ngoc Phuoc An Vo, Octavian Popescu:
QUEST: A Natural Language Interface to Relational Databases. - Patrik Jonell, Catharine Oertel, Dimosthenis Kontogiorgos, Jonas Beskow, Joakim Gustafson:
Crowdsourced Multimodal Corpora Collection Tool. - Montserrat Marimon, Lluís Padró, Jordi Turmo:
Coreference Resolution in FreeLing 4.0. - Tobias Horsmann, Torsten Zesch:
DeepTC - An Extension of DKPro Text Classification for Fostering Reproducibility of Deep Learning Experiments. - Thomas Proisl:
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts. - Vivien Macketanz, Renlong Ai, Aljoscha Burchardt, Hans Uszkoreit:
TQ-AutoTest - An Automated Test Suite for (Machine) Translation Quality. - Daniel Khashabi, Mark Sammons, Ben Zhou, Tom Redman, Christos Christodoulopoulos, Vivek Srikumar, Nicholas Rizzolo, Lev-Arie Ratinov, Guanheng Luo, Quang Do, Chen-Tse Tsai, Subhro Roy, Stephen Mayhew, Zhili Feng, John Wieting, Xiaodong Yu, Yangqiu Song, Shashank Gupta, Shyam Upadhyay, Naveen Arivazhagan, Qiang Ning, Shaoshi Ling, Dan Roth:
CogCompNLP: Your Swiss Army Knife for NLP. - Masaya Yamaguchi, Masanori Kitamura, Naomi Yanagida:
Development of a Mobile Observation Support System for Students: FishWatchr Mini. - Tuomo Hiippala, Serafina Orekhova:
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory. - Bruno Oberle:
SACR: A Drag-and-Drop Based Tool for Coreference Annotation. - Andrei Malchanau, Volha Petukhova, Harry Bunt:
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it. - Xiaoqing Li, Jiajun Zhang, Chengqing Zong:
One Sentence One Model for Neural Machine Translation. - Zbynek Zajíc, Lucie Skorkovská, Petr Neduchal, Pavel Ircing, Josef V. Psutka, Marek Hrúz, Ales Prazák, Daniel Soutner, Jan Svec, Lukás Bures, Ludek Müller:
Towards Processing of the Oral History Interviews and Related Printed Documents. - Angus G. Forbes, Kristine Lee, Gus Hahn-Powell, Marco Antonio Valenzuela-Escárcega, Mihai Surdeanu:
Text Annotation Graphs: Annotating Complex Natural Language Phenomena. - Philippe Boula de Mareüil, Albert Rilliard, Frédéric Vernier:
A Speaking Atlas of the Regional Languages of France. - Arianne Reimerink, Pilar León Araúz:
Manzanilla: An Image Annotation Tool for TKB Building. - Stéphan Tulkens, Dominiek Sandra, Walter Daelemans:
WordKit: a Python Package for Orthographic and Phonological Featurization. - Christopher Tauchmann, Thomas Arnold, Andreas Hanselowski, Christian M. Meyer, Margot Mieskes:
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data. - António Branco, Ruben Branco, Chakaveh Saedi, João Silva:
Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest. - Steffen Remus, Chris Biemann:
Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities. - Tomoko Ohsuga, Yuichi Ishimoto, Tomoko Kajiyama, Shunsuke Kozawa, Kiyotaka Uchimoto, Shuichi Itahashi:
Extending Search System based on Interactive Visualization for Speech Corpora. - Xuan-Nga Cao, Cyrille Dakhlia, Patricia Del Carmen, Mohamed-Amine Jaouani, Malik Ould-Arbi, Emmanuel Dupoux:
BabyCloud, a Technological Platform for Parents and Researchers. - Katrin Schweitzer, Kerstin Eckart, Markus Gärtner, Agnieszka Falenska, Arndt Riester, Ina Rösiger, Antje Schweitzer, Sabrina Stehwien, Jonas Kuhn:
German Radio Interviews: The GRAIN Release of the SFB732 Silver Standard Collection. - Andrew U. Frank, Christine Ivanovic:
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH. - Georg Rehm, Julián Moreno Schneider, Peter Bourgonje:
Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena. - Behnam Sabeti, Hossein Abedi Firouzjaee, Ali Janalizadeh Choobbasti, S. H. E. Mortazavi Najafabadi, Amir Vaheb:
MirasText: An Automatically Generated Text Corpus for Persian. - Shi Yu, Carlo Geraci, Natasha Abner:
Sign Languages and the Online World Online Dictionaries & Lexicostatistics. - Fahad AlGhamdi, Mona T. Diab:
WASA: A Web Application for Sequence Annotation. - Pilar León Araúz, Arianne Reimerink:
Evaluating EcoLexiCAT: a Terminology-Enhanced CAT Tool. - Wei-Yun Ma, Yueh-Yin Shih:
Extended HowNet 2.0 - An Entity-Relation Common-Sense Representation Model. - Philipp Helfrich, Elias Rieb, Giuseppe Abrami, Andy Lücking, Alexander Mehler:
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations. - Erhard W. Hinrichs, Nancy Ide, James Pustejovsky, Jan Hajic, Marie Hinrichs, Mohammad Fazleh Elahi, Keith Suderman, Marc Verhagen, Kyeongmin Rim, Pavel Stranák, Jozef Misutka:
Bridging the LAPPS Grid and CLARIN. - Markus Gärtner, Jonas Kuhn:
A Lightweight Modeling Middleware for Corpus Processing. - Kazuki Sakai, Akari Inago, Ryuichiro Higashinaka, Yuichiro Yoshikawa, Hiroshi Ishiguro, Junji Tomita:
Creating Large-Scale Argumentation Structures for Dialogue Systems. - Tamás Váradi, Eszter Simon, Bálint Sass, Iván Mittelholcz, Attila Novák, Balázs Indig, Richárd Farkas, Veronika Vincze:
E-magyar - A Digital Language Processing System. - Andreas Niekler, Arnim Bleier, Christian Kahmann, Lisa Posch, Gregor Wiedemann, Kenan Erdogan, Gerhard Heyer, Markus Strohmaier:
ILCM - A Virtual Research Infrastructure for Large-Scale Qualitative Data. - Bettina Klimek, Robert Schädlich, Dustin Kröger, Edwin Knese, Benedikt Elßmann:
LiDo RDF: From a Relational Database to a Linked Data Graph of Linguistic Terms and Bibliographic Data. - Kevin Bowden, JiaQi Wu, Shereen Oraby, Amita Misra, Marilyn A. Walker:
SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems. - Deepak Gupta, Surabhi Kumari, Asif Ekbal, Pushpak Bhattacharyya:
MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi. - Abdulrahman Alosaimy, Eric Atwell:
Web-based Annotation Tool for Inflectional Language Resources. - Balázs Indig, András Simonyi, Noémi Ligeti-Nagy:
What's Wrong, Python? - A Visual Differ and Graph Library for NLP in Python. - Piotr Pezik:
Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix. - Salar Mohtaj, Behnam Roshanfekr, Atefeh Zafarian, Habibollah Asghari:
Parsivar: A Language Processing Toolkit for Persian. - Juliano Efson Sales, Leonardo Souza, Siamak Barzegar, Brian Davis, André Freitas, Siegfried Handschuh:
Indra: A Word Embedding and Semantic Relatedness Server. - Mokanarangan Thayaparan, Surangika Ranathunga, Uthayasanker Thayasivam:
Graph Based Semi-Supervised Learning Approach for Tamil POS tagging. - Normunds Gruzitis, Lauma Pretkalnina, Baiba Saulite, Laura Rituma, Gunta Nespore-Berzkalne, Arturs Znotins, Peteris Paikens:
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU. - Giuseppe Abrami, Alexander Mehler:
A UIMA Database Interface for Managing NLP-related Text Annotations. - Gerard de Melo:
Metaphor Suggestions based on a Semantic Metaphor Repository. - Alessandra Teresa Cignarella, Cristina Bosco, Viviana Patti, Mirko Lai:
Application and Analysis of a Multi-layered Scheme for Irony on the Italian Twitter Corpus TWITTIRÒ. - Paul Rodrigues, Valerie Novak, C. Anton Rytting, Julie Yelle, Jennifer Boutz:
Arabic Data Science Toolkit: An API for Arabic Language Feature Extraction. - Benjamin Heinzerling, Michael Strube:
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages. - Shabnam Tafreshi, Mona T. Diab:
Sentence and Clause Level Emotion Annotation, Detection, and Classification in a Multi-Genre Corpus. - Hamdy Mubarak:
Build Fast and Accurate Lemmatization for Arabic. - Yanjun Gao, Andrew Warner, Rebecca J. Passonneau:
PyrEval: An Automated Method for Summary Content Analysis. - Alexsandro Fonseca, Fatiha Sadat, François Lareau:
Retrieving Information from the French Lexical Network in RDF/OWL Format. - Danillo da Silva Rocha, Ivandré Paraboni:
Reference production in human-computer interaction: Issues for Corpus-based Referring Expression Generation. - Alexander Panchenko, Dmitry Ustalov, Stefano Faralli, Simone Paolo Ponzetto, Chris Biemann:
Improving Hypernymy Extraction with Distributional Semantic Classes. - Azadeh Mirzaei, Pegah Safari:
Persian Discourse Treebank and coreference corpus. - Alexander Gutkin, Martin Jansche, Tatiana Merkulova:
FonBund: A Library for Combining Cross-lingual Phonological Segment Data. - Jaka Aris Eko Wibawa, Supheakmungkol Sarin, Chenfang Li, Knot Pipatsrisawat, Keshan Sodimana, Oddur Kjartansson, Alexander Gutkin, Martin Jansche, Linne Ha:
Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech. - Pasindu De Silva, Theeraphol Wattanavekin, Tang Hao, Knot Pipatsrisawat:
Voice Builder: A Tool for Building Text-To-Speech Voices. - Kazunari Tanaka, Tomoya Iwakura, Yusuke Koyanagi, Noriko Ikeda, Hiroyuki Shindo, Yuji Matsumoto:
Chemical Compounds Knowledge Visualization with Natural Language Processing and Linked Data. - Dietmar Schabus, Marcin Skowron:
Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website. - Kazuma Takaoka, Sorami Hisamoto, Noriko Kawahara, Miho Sakamoto, Yoshitaka Uchida, Yuji Matsumoto:
Sudachi: a Japanese Tokenizer for Business. - Mason Chua, Daan van Esch, Noah Coccaro, Eunjoon Cho, Sujeet Bhandari, Libin Jia:
Text Normalization Infrastructure that Scales to Hundreds of Language Varieties. - Marcis Pinnis, Andrejs Vasiljevs, Rihards Kalnins, Roberts Rozis, Raivis Skadins, Valters Sics:
Tilde MT Platform for Developing Client Specific MT Solutions. - Christina Funk, Michael Tseng, Ravindran Rajakumar, Linne Ha:
Community-Driven Crowdsourcing: Data Collection with Local Developers. - Kyle Gorman, Gleb Mazovetskiy, Vitaly Nikolaev:
Improving homograph disambiguation with supervised machine learning.