12th LREC 2020: Marseille, France
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis:
Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020. European Language Resources Association 2020, ISBN 979-10-95546-34-4 - Juntao Yu, Bernd Bohnet, Massimo Poesio:
Neural Mention Detection. 1-10 - Juntao Yu, Alexandra Uma, Massimo Poesio:
A Cluster Ranking Model for Full Anaphora Resolution. 11-20 - Timothée Bernard, Ting Han:
Mandarinograd: A Chinese Collection of Winograd Schemas. 21-26 - Alexander Henlein, Alexander Mehler:
On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks. 27-33 - Payal Khullar, Kushal Majmundar, Manish Shrivastava:
NoEl: An Annotated Corpus for Noun Ellipsis in English. 34-43 - David Bamman, Olivia Lewke, Anya Mansoor:
An Annotated Dataset of Coreference in English Literature. 44-54 - Janis Pagel, Nils Reiter:
GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German. 55-64 - Parag Dakle, Takshak Desai, Dan I. Moldovan:
A Study on Entity Resolution for Email Conversations. 65-73 - Rahul Aralikatte, Anders Søgaard:
Model-based Annotation of Coreference. 74-79 - Rodrigo Wilkens, Bruno Oberle, Frédéric Landragin, Amalia Todirascu:
French Coreference for Spoken and Written Language. 80-89 - Abdulrahman Aloraini, Massimo Poesio:
Cross-lingual Zero Pronoun Resolution. 90-98 - Sharid Loáiciga, Christian Hardmeier, Asad B. Sayeed:
Exploiting Cross-Lingual Hints to Discover Event Pronouns. 99-103 - Scott Martin, Shivani Poddar, Kartikeya Upasani:
MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation. 104-111 - Rong Xiang, Yunfei Long, Mingyu Wan, Jinghang Gu, Qin Lu, Chu-Ren Huang:
Affection Driven Neural Networks for Sentiment Analysis. 112-119 - Shohini Bhattasali, Jonathan Brennan, Wen-Ming Luh, Berta Franzluebbers, John T. Hale:
The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension. 120-125 - Elena Mikhalkova, Timofei Protasov, Polina Sokolova, Anastasiia Bashmakova, Anastasiia Drozdova:
Modelling Narrative Elements in a Short Story: A Study on Annotation Schemes and Guidelines. 126-132 - Harald Höge:
Cortical Speech Databases For Deciphering the Articulatory Code. 133-137 - Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer:
ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation. 138-146 - Tim Reinboth, Stephanie Gross, Laura Bishop, Brigitte Krenn:
Linguistic, Kinematic and Gaze Information in Task Descriptions: The LKG-Corpus. 147-155 - Anna Jancso, Steven Moran, Sabine Stoll:
The ACQDIV Corpus Database and Aggregation Pipeline. 156-165 - Didier Schwab, Pauline Trial, Céline Vaschalde, Loïc Vial, Emmanuelle Esperança-Rodier, Benjamin Lecouteux:
Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN. 166-171 - Stéphan Tulkens, Dominiek Sandra, Walter Daelemans:
Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory. 172-181 - Elma Kerz, Fabio Pruneri, Daniel Wiechmann, Yu Qiao, Marcus Ströbel:
Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours. 182-188 - Yohei Oseki, Masayuki Asahara:
Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography. 189-194 - Katerina Smirnova, Nikolay Korotaev, Yana Panikratova, Irina Lebedeva, Ekaterina Pechenkova, Olga Fedorova:
Using the RUPEX Multichannel Corpus in a Pilot fMRI Study on Speech Disfluencies. 195-203 - Aomi Koyama, Tomoshige Kiyuna, Kenji Kobayashi, Mio Arai, Mamoru Komachi:
Construction of an Evaluation Corpus for Grammatical Error Correction for Learners of Japanese as a Second Language. 204-211 - Sangha Nam, Minho Lee, Donghwan Kim, Kijong Han, Kuntae Kim, Sooji Yoon, Eun-Kyung Kim, Key-Sun Choi:
Effective Crowdsourcing of Multiple Tasks for Comprehensive Knowledge Extraction. 212-219 - Antonio Roque, Alexander Tsuetaki, Vasanth Sarathy, Matthias Scheutz:
Developing a Corpus of Indirect Speech Act Schemas. 220-228 - Yoshinao Sato, Kouki Miyazawa:
Quality Estimation for Partially Subjective Classification Tasks via Crowdsourcing. 229-235 - YoungGyun Hahm, Youngbin Noh, Jiyoon Han, Tae Hwan Oh, Hyonsu Choe, Hansaem Kim, Key-Sun Choi:
Crowdsourcing in the Development of a Multilingual FrameNet: A Case Study of Korean FrameNet. 236-244 - Neslihan Iskender, Tim Polzehl, Sebastian Möller:
Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization. 245-253 - Priya Radhakrishnan:
A Seed Corpus of Hindu Temples in India. 254-258 - Yu-Yun Chang, Shu-Kai Hsieh:
Do You Believe It Happened? Assessing Chinese Readers' Veridicality Judgments. 259-267 - Lionel Nicolas, Verena Lyding, Claudia Borg, Corina Forascu, Karën Fort, Katerina Zdravkova, Iztok Kosem, Jaka Cibej, Spela Arhar Holdt, Alice Millour, Alexander König, Christos T. Rodosthenous, Federico Sangati, Umair ul Hassan, Anisia Katinskaia, Anabela Barreiro, Lavinia Aparaschivei, Yaakov HaCohen-Kerner:
Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning. 268-278 - Hessel Haagsma, Johan Bos, Malvina Nissim:
MAGPIE: A Large Corpus of Potentially Idiomatic Expressions. 279-287 - Francisco Javier Chiyah Garcia, José Lopes, Xingkun Liu, Helen F. Hastie:
CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues. 288-297 - Inês Gomes, Rui Correia, Jorge Ribeiro, João Freitas:
Effort Estimation in Named Entity Tagging Tasks. 298-306 - Christos T. Rodosthenous, Verena Lyding, Federico Sangati, Alexander König, Umair ul Hassan, Lionel Nicolas, Jolita Horbacauskiene, Anisia Katinskaia, Lavinia Aparaschivei:
Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet. 307-316 - Gérard Bailly, Erika Godde, Anne-Laure Piat-Marchand, Marie-Line Bosse:
Predicting Multidimensional Subjective Ratings of Children' Readings from the Speech Signals for the Automatic Assessment of Fluency. 317-322 - Elham Akhlaghi, Branislav Bédi, Fatih Bektas, Harald Berthelsen, Matthias Butterweck, Cathy Chua, Catia Cucchiarini, Gülsen Eryigit, Johanna Gerlach, Hanieh Habibi, Neasa Ní Chiaráin, Manny Rayner, Steinþór Steingrímsson, Helmer Strik:
Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages. 323-331 - Ildikó Pilán, John Lee, Chak Yan Yeung, Jonathan J. Webster:
A Dataset for Investigating the Impact of Feedback on Student Revision Outcome. 332-339 - Ryo Nagata, Kentaro Inui, Shin'ichiro Ishikawa:
Creating Corpora for Research in Feedback Comment Generation. 340-345 - Johannes Graën, David Alfter, Gerold Schneider:
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications. 346-355 - Benny Platte, Anett Platte, Christian Roschke, Rico Thomanek, Tony Rolletschke, Frank Zimmer, Marc Ritter:
Immersive Language Exploration with Object Recognition and Augmented Reality. 356-362 - Rianne Conijn, Emily Dux Speltz, Menno van Zaanen, Luuk van Waes, Evgeny Chukharev-Hudilainen:
A Process-oriented Dataset of Revisions during Writing. 363-368 - Luís Morgado da Costa, Roger Vivek Placidus Winder, Shu Yun Li, Benedict Christopher Tzer Liang Lin, Joseph MacKinnon, Francis Bond:
Automated Writing Support Using Deep Linguistic Parsers. 369-377 - Roberto Gretter, Marco Matassoni, Stefano Bannò, Daniele Falavigna:
TLT-school: a Corpus of Non Native Children Speech. 378-385 - Anisia Katinskaia, Sardana Ivanova, Roman Yangarber:
Toward a Paradigm Shift in Collection of Learner Corpora. 386-391 - Roberts Dargis, Ilze Auzina, Kristine Levane-Petrova, Inga Kaija:
Quality Focused Approach to a Learner Corpus Development. 392-396 - Orphée De Clercq, Senne Van Hoecke:
An Exploratory Study into Automated Précis Grading. 397-404 - Tzu-Hsiang Lin, Alexander I. Rudnicky, Trung Bui, Doo Soon Kim, Jean Oh:
Adjusting Image Attributes of Localized Regions with Low-level Dialogue. 405-412 - Wen-wai Yim, Meliha Yetisgen, Jenny Huang, Micah Grossman:
Alignment Annotation for Clinic Visit Dialogue to Clinical Note Sentence Language Generation. 413-421 - Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, Adarsh Kumar, Anuj Kumar Goyal, Peter Ku, Dilek Hakkani-Tür:
MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines. 422-428 - Matthias Kraus, Fabian Fischbach, Pascal Jansen, Wolfgang Minker:
A Comparison of Explicit and Implicit Proactive Dialogue Strategies for Conversational Recommendation. 429-435 - Arantxa Otegi, Aitor Gonzalez-Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre:
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque. 436-442 - Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito:
Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness. 443-448 - Jelte van Waterschoot, Iris Hendrickx, Arif Khan, Esther Klabbers, Marcel de Korte, Helmer Strik, Catia Cucchiarini, Mariët Theune:
BLISS: An Agent for Collecting Spoken Dialogue Data about Health and Well-being. 449-458 - Meng Chen, Ruixue Liu, Lei Shen, Shaozu Yuan, Jingyan Zhou, Youzheng Wu, Xiaodong He, Bowen Zhou:
The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service. 459-466 - Béatrice Priego-Valverde, Brigitte Bigi, Mary Amoyal:
"Cheese!": a Corpus of Face-to-face French Interactions. A Case Study for Analyzing Smiling and Conversational Humor. 467-475 - Alberto Chierici, Nizar Habash, Margarita Bicec:
The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. 476-484 - Maria Schmidt, Wolfgang Minker, Steffen Werner:
How Users React to Proactive Voice Assistant Behavior While Driving. 485-490 - Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti, Satoshi Nakamura:
Emotional Speech Corpus for Persuasive Dialogue System. 491-497 - Reshmashree Bangalore Kantharaju, Caroline Langlet, Mukesh Barange, Chloé Clavel, Catherine Pelachaud:
Multimodal Analysis of Cohesion in Multi-party Interactions. 498-507 - Rostislav Nedelchev, Ricardo Usbeck, Jens Lehmann:
Treating Dialogue Quality Evaluation as an Anomaly Detection Problem. 508-512 - Niklas Rach, Yuki Matsuda, Johannes Daxenberger, Stefan Ultes, Keiichi Yasumoto, Wolfgang Minker:
Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems. 513-522 - Alessandra Zarcone, Touhidul Alam, Zahra Kolagar:
PATE: A Corpus of Temporal Expressions for the In-car Voice Assistant Domain. 523-530 - Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos:
Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions. 531-539 - Juliana Miehle, Isabel Feustel, Julia Hornauer, Wolfgang Minker, Stefan Ultes:
Estimating User Communication Styles for Spoken Dialogue Systems. 540-548 - Harry Bunt, Volha Petukhova, Emer Gilmartin, Catherine Pelachaud, Alex Chengyu Fang, Simon Keizer, Laurent Prévot:
The ISO Standard for Dialogue Act Annotation, Second Edition. 549-558 - Kristiina Jokinen:
The AICO Multimodal Corpus - Data Collection and Preliminary Analyses. 559-564 - Fabian Galetzka, Chukwuemeka Uchenna Eneh, David Schlangen:
A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models. 565-573 - Fréjus A. A. Laleye, Gaël de Chalendar, Antonia Blanié, Antoine Brouquet, Dan Benhamou:
A French Medical Conversations Corpus Annotated for a Virtual Patient Dialogue System. 574-580 - Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu, Pascale Fung:
Getting To Know You: User Attribute Extraction from Dialogues. 581-589 - Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano, Andrew E. Johnson:
Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization. 590-599 - Maike Paetzel, Deepthi Karkada, Ramesh R. Manuvinakurike:
RDG-Map: A Multimodal Corpus of Pedagogical Human-Agent Spoken Interactions. 600-609 - Yi-Ting Chen, Hen-Hsen Huang, Hsin-Hsi Chen:
MPDD: A Multi-Party Dialogue Dataset for Analysis of Emotions and Interpersonal Relationships. 610-614 - Ingo Siegert:
"Alexa in the wild" - Collecting Unconstrained Conversations with a Modern Voice Assistant in a Public Environment. 615-619 - Chandrakant Bothe, Cornelius Weber, Sven Magg, Stefan Wermter:
EDA: Enriching Emotional Dialogue Acts using an Ensemble of Neural Annotators. 620-627 - Mary Amoyal, Béatrice Priego-Valverde, Stéphane Rauzy:
PACO: a Corpus to Analyze the Impact of Common Ground in Spontaneous Face-to-Face Interaction. 628-633 - Mika Enomoto, Yasuharu Den, Yuichi Ishimoto:
A Conversation-Analytic Annotation of Turn-Taking Behavior in Japanese Multi-Party Conversation and its Preliminary Analysis. 644-652 - Yoshihiko Asao, Julien Kloetzer, Junta Mizuno, Dai Saiki, Kazuma Kadowaki, Kentaro Torisawa:
Understanding User Utterances in a Dialog System for Caregiving. 653-661 - Donghui Lin, Masayuki Otani, Ryosuke Okuno, Toru Ishida:
Designing Multilingual Interactive Agents using Small Dialogue Corpora. 662-667 - Birgit Rauchbauer, Youssef Hmamouche, Brigitte Bigi, Laurent Prévot, Magalie Ochs, Thierry Chaminade:
Multimodal Corpus of Bidirectional Conversation of Human-human and Human-robot Interaction during fMRI Scanning. 668-675 - Magalie Ochs, Roxane Bertrand, Aurélie Goujon, Deirdre Bolger, Anne Sophie Dubarry, Philippe Blache:
The Brain-IHM Dataset: a New Resource for Studying the Brain Basis of Human-Human and Human-Machine Conversations. 676-683 - Claire Bonial, Lucia Donatelli, Mitchell Abrams, Stephanie M. Lukin, Stephen Tratz, Matthew Marge, Ron Artstein, David R. Traum, Clare R. Voss:
Dialogue-AMR: Abstract Meaning Representation for Dialogue. 684-695 - Koichiro Ito, Masaki Murata, Tomohiro Ohno, Shigeki Matsubara:
Relation between Degree of Empathy for Narrative Speech and Type of Responsive Utterance in Attentive Listening. 696-701 - Robin Rojowiec, Benjamin Roth, Maximilian Fink:
Intent Recognition in Doctor-Patient Interviews. 702-709 - Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade:
BrainPredict: a Tool for Predicting and Visualising Local Brain Activity. 710-716 - Matteo Antonio Senese, Giuseppe Rizzo, Mauro Dragoni, Maurizio Morisio:
MTSI-BERT: A Session-aware Knowledge-based Conversational Agent. 717-725 - Kallirroi Georgila, Carla Gordon, Volodymyr Yanov, David R. Traum:
Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers. 726-734 - Seyed Hossein Alavi, Anton Leuski, David R. Traum:
Which Model Should We Use for a Real-World Conversational Dialogue System? a Cross-Language Relevance Model or a Deep Neural Net? 735-742 - Dimosthenis Kontogiorgos, Elena Sibirtseva, Joakim Gustafson:
Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding. 743-749 - Gaurav Kumar, Rishabh Joshi, Jaspreet Singh, Promod Yenigalla:
AMUSED: A Multi-Stream Vector Representation Method for Use in Natural Dialogue. 750-758 - Vidya Somashekarappa, Christine Howes, Asad B. Sayeed:
An Annotation Approach for Social and Referential Gaze in Dialogue. 759-765 - Hannah Booth, Anne Breitbarth, Aaron Ecay, Melissa Farasyn:
A Penn-style Treebank of Middle Low German. 766-775 - Amir Hazem, Béatrice Daille, Christopher Kermorvant, Dominique Stutzmann, Marie-Laurence Bonhomme, Martin Maarand, Mélodie Boillet:
Books of Hours. the First Liturgical Data Set for Text Segmentation. 776-784 - Sergey Zinin, Yang Xu:
Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia. 785-793 - Stefan Fischer, Jörg Knappen, Katrin Menzel, Elke Teich:
The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study. 794-802 - Annelen Brunner, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu, Lukas Weimer:
Corpus REDEWIEDERGABE. 803-812 - Mattia Egloff, Davide Picca:
WeDH - a Friendly Tool for Building Literary Corpora Enriched with Encyclopedic Metadata. 813-816 - Valentino Sabbatino, Laura Ana Maria Bostan, Roman Klinger:
Automatic Section Recognition in Obituaries. 817-825 - Sara Stymne, Carin Östman:
SLäNDa: An Annotated Corpus of Narrative and Dialogue in Swedish Literary Fiction. 826-834 - Sean Papay, Sebastian Padó:
RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. 835-841 - Roman Schneider:
A Corpus Linguistic Perspective on Contemporary German Pop Lyrics with the Multi-Layer Annotated "Songkorpus". 842-848 - Sara Grilo, Márcia Bolrinha, João Silva, Rui Vaz, António Branco:
The BDCamões Collection of Portuguese Literary Documents: a Research Resource for Digital Humanities and Language Technology. 849-854 - Esteban Frossard, Mickaël Coustaty, Antoine Doucet, Adam Jatowt, Simon Hengchen:
Dataset for Temporal Analysis of English-French Cognates. 855-859 - Michelle Waldispühl, Dana Dannélls, Lars Borin:
Material Philology Meets Digital Onomastic Lexicography: The NordiCon Database of Medieval Nordic Personal Names in Continental Sources. 860-867 - Saif M. Mohammad:
NLP Scholar: A Dataset for Examining the State of NLP Research. 868-877 - Shafqat Mumtaz Virk, Harald Hammarström, Markus Forsberg, Søren Wichmann:
The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World's Languages. 878-884 - Klaus Müller, Aleksej Tikhonov, Roland Meyer:
LiViTo: Linguistic and Visual Features Tool for Assisted Analysis of Historic Manuscripts. 885-890 - Giuseppe Abrami, Manuel Stoeckel, Alexander Mehler:
TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts. 891-900 - Bikash Gyawali, Lucas Anastasiou, Petr Knoth:
Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings. 901-910 - Federico Boschetti, Irene De Felice, Stefano Dei Rossi, Felice Dell'Orletta, Michele Di Giorgio, Martina Miliani, Lucia C. Passaro, Angelica Puddu, Giulia Venturi, Nicola Labanca, Alessandro Lenci, Simonetta Montemagni:
"Voices of the Great War": A Richly Annotated Corpus of Italian Texts on the First World War. 911-918 - Gabriella Lapesa, André Blessing, Nico Blokker, Erenay Dayanik, Sebastian Haunss, Jonas Kuhn, Sebastian Padó:
DEbateNet-mig15: Tracing the 2015 Immigration Debate in Germany Over Time. 919-927 - Elena Álvarez Mellado:
A Corpus of Spanish Political Speeches from 1937 to 2019. 928-932 - Flavio Massimiliano Cecchini, Timo Korkiakangas, Marco Passarotti:
A New Latin Treebank for Universal Dependencies: Charters between Ancient Latin and Romance Languages. 933-942 - Renato Rocha Souza, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt:
Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings. 943-947 - Jonne Sälevä:
A Multi-Orthography Parallel Corpus of Yiddish Nouns. 948-952 - Katharina Gerhalter, Gerlinde Schneider, Christopher Pollin, Martin Hummel:
An Annotated Corpus of Adjective-Adverb Interfaces in Romance Languages. 953-957 - Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Ströbel, Raphaël Barman:
Language Resources for Historical Newspapers: the Impresso Collection. 958-968 - Bernd Kampe, Tinghui Duan, Udo Hahn:
Allgemeine Musikalische Zeitung as a Searchable Online Corpus. 969-976 - Silvie Cinková, Jan Rybicki:
Stylometry in a Bilingual Setup. 977-984 - Yo Sato, Kevin Heffernan:
Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect. 985-990 - Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller:
DiscSense: Automated Semantic Analysis of Discourse Markers. 991-999 - Mónica Domínguez, Juan Soler Company, Leo Wanner:
ThemePro: A Toolkit for the Analysis of Thematic Progression. 1000-1007 - Yohan Jo, Elijah Mayfield, Chris Reed, Eduard H. Hovy:
Machine-Aided Annotation for Fine-Grained Proposition Types in Argumentation. 1008-1018 - Lin Chuan-An, Shyh-Shiun Hung, Hen-Hsen Huang, Hsin-Hsi Chen:
Chinese Discourse Parsing: Model and Evaluation. 1019-1024 - Wanqiu Long, Xinyi Cai, James E. M. Reid, Bonnie Webber, Deyi Xiong:
Shallow Discourse Annotation for Chinese TED Talks. 1025-1032 - Christopher Olshefski, Luca Lugini, Ravneet Singh, Diane J. Litman, Amanda Godley:
The Discussion Tracker Corpus of Collaborative Argumentation. 1033-1043 - Henny Sluyter-Gäthje, Peter Bourgonje, Manfred Stede:
Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection. 1044-1050 - Nathan Rasmussen, William Schuler:
A Corpus of Encyclopedia Articles with Logical Forms. 1051-1060 - Peter Bourgonje, Manfred Stede:
The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing. 1061-1066 - Elham Mohammadi, Timothe Beiko, Leila Kosseim:
On the Creation of a Corpus for Coherence Evaluation of Discursive Units. 1067-1072 - Takshak Desai, Parag Dakle, Dan I. Moldovan:
Joint Learning of Syntactic Features Helps Discourse Segmentation. 1073-1080 - Verena Ruf, Costanza Navarretta:
Creating a Corpus of Gestures and Predicting the Audience Response based on Gestures in Speeches of Donald Trump. 1081-1088 - Lucie Poláková, Katerina Rysova, Magdaléna Rysová, Jirí Mírovský:
GeCzLex: Lexicon of Czech and German Anaphoric Connectives. 1089-1096 - Debopam Das, Manfred Stede, Soumya Sankar Ghosh, Lahari Chatterjee:
DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives. 1097-1102 - René Knaebel, Manfred Stede:
Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion. 1103-1109 - Dhivya Chinnappa, Alexis Palmer, Eduardo Blanco:
WikiPossessions: Possession Timeline Generation as an Evaluation Benchmark for Machine Reading Comprehension of Long Texts. 1110-1117 - Matthijs Westera, Laia Mayol, Hannah Rohde:
TED-Q: TED Talks and the Questions they Evoke. 1118-1127 - Jirí Mírovský, Lucie Poláková, Pavlína Synková:
CzeDLex 0.6 and its Representation in the PML-TQ. 1128-1134 - Ryo Egawa, Gaku Morio, Katsuhide Fujita:
Corpus for Modeling User Interactions in Online Persuasive Discussions. 1135-1141 - Rodrigo Wilkens, Amalia Todirascu:
Simplifying Coreference Chains for Dyslexic Children. 1142-1151 - Yudai Kishimoto, Yugo Murawaki, Sadao Kurohashi:
Adapting BERT to Implicit Discourse Relation Classification with a Focus on Discourse Connectives. 1152-1158 - Angèle Barbedette, Iris Eshkol-Taravella:
What Speakers really Mean when they Ask Questions: Classification of Intentions with a Supervised Approach. 1159-1166 - Shahla Farzana, Mina Valizadeh, Natalie Parde:
Modeling Dialogue in Conversational Cognitive Health Screening Interviews. 1167-1177 - Nadiya Straton, Hyeju Jang, Raymond T. Ng:
Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media. 1178-1190 - Swapnil Dhanwal, Hritwik Dutta, Hitesh Nankani, Nilay Shrivastava, Yaman Kumar, Junyi Jessy Li, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah, Amanda Stent:
An Annotated Dataset of Discourse Modes in Hindi Stories. 1191-1196 - Hassan S. Shavarani, Satoshi Sekine:
Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set. 1197-1201 - Leila Moudjari, Karima Akli-Astouati, Farah Benamara:
An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis. 1202-1210 - Valeriya Slovikovskaya, Giuseppe Attardi:
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task. 1211-1218 - Deyan Ginev, Bruce R. Miller:
Scientific Statement Classification over arXiv.org. 1219-1226 - Rafael Dias, Ivandré Paraboni:
Cross-domain Author Gender Classification in Brazilian Portuguese. 1227-1234 - Don Tuggener, Pius von Däniken, Thomas Peetz, Mark Cieliebak:
LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts. 1235-1241 - Simon Rodier, Dave Carter:
Online Near-Duplicate Detection of News Articles. 1242-1249 - Reo Hirao, Mio Arai, Hiroki Shimanaka, Satoru Katsumata, Mamoru Komachi:
Automated Essay Scoring System for Nonnative Japanese Learners. 1250-1257 - Jan Neerbek, Morten Eskildsen, Peter Dolog, Ira Assent:
A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial. 1258-1267 - Konstantina Lazaridou, Alexander Löser, Maria Mestre, Felix Naumann:
Discovering Biased News Articles Leveraging Multiple Human Annotations. 1268-1277 - Hugo Gonçalo Oliveira, André Clemêncio, Ana Alves:
Corpora and Baselines for Humour Recognition in Portuguese. 1278-1285 - Marten van der Meulen, W. Gudrun Reijnierse:
FactCorp: A Corpus of Dutch Fact-checks and its Multiple Usages. 1286-1292 - Katrin Ortmann, Stefanie Dipper:
Automatic Orality Identification in Historical Texts. 1293-1302 - Xingyi Song, Johnny Downs, Sumithra Velupillai, Rachel Holden, Maxim Kikoler, Kalina Bontcheva, Rina Dutta, Angus Roberts:
Using Deep Neural Networks with Intra- and Inter-Sentence Context to Classify Suicidal Behaviour. 1303-1310 - Emad Mohamed, Le An Ha:
A First Dataset for Film Age Appropriateness Investigation. 1311-1317 - Mahmoud El-Haj:
Habibi - a multi Dialect multi National Arabic Song Lyrics Corpus. 1318-1326 - Mahsa Shafaei, Niloofar Safi Samghabadi, Sudipta Kar, Thamar Solorio:
Age Suitability Rating: Predicting the MPAA Rating Based on Movie Dialogues. 1327-1335 - Sakhar B. Alkhereyf, Owen Rambow:
Email Classification Incorporating Social Networks and Thread Structure. 1336-1345 - Yuen-Hsien Tseng, Wun-Syuan Wu, Chia-Yueh Chang, Hsueh-Chih Chen, Wei-Lun Hsu:
Development and Validation of a Corpus for Machine Humor Comprehension. 1346-1352 - Núria Gala, Anaïs Tack, Ludivine Javourey-Drevet, Thomas François, Johannes C. Ziegler:
Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers. 1353-1361 - Edward T. Moseley, Joy T. Wu, Jonathan Welt, John Foote Jr., Patrick D. Tyler, David W. Grant, Eric T. Carlson, Sebastian Gehrmann, Franck Dernoncourt, Leo Anthony Celi:
A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients. 1362-1367 - Elena Zotova, Rodrigo Agerri, Manuel Núñez, German Rigau:
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus. 1368-1375 - Abdul Moeed, Gerhard Hagerer, Sumit Dugar, Sarthak Gupta, Mainak Ghosh, Hannah Danner, Oliver Mitevski, Andreas Nawroth, Georg Groh:
An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing. 1376-1381 - Noé Cecillon, Vincent Labatut, Richard Dufour, Georges Linarès:
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection. 1382-1390 - Btool Hamoui, Mourad Mars, Khaled Hatem Almotairi:
FloDusTA: Saudi Tweets Dataset for Flood, Dust Storm, and Traffic Accident Events. 1391-1396 - Patricia Chiril, Véronique Moriceau, Farah Benamara, Alda Mari, Gloria Origgi, Marlène Coulomb-Gully:
An Annotated Corpus for Sexism Detection in French Tweets. 1397-1403 - Roney L. S. Santos, Gabriela Wick-Pedro, Sidney Evaldo Leal, Oto A. Vale, Thiago A. S. Pardo, Kalina Bontcheva, Carolina Scarton:
Measuring the Impact of Readability Features in Fake News Detection. 1404-1413 - Sanja Stajner, Ioana Hulpus:
When Shallow is Good Enough: Automatic Assessment of Conceptual Text Complexity using Shallow Semantic Features. 1414-1422 - Pasquale Capuozzo, Ivano Lauriola, Carlo Strapparava, Fabio Aiolli, Giuseppe Sartori:
DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text. 1423-1430 - Alexis Blandin, Gwénolé Lecorvé, Delphine Battistelli, Aline Étienne:
Age Recommendation for Texts. 1431-1439 - Xiaolei Huang, Linzi Xing, Franck Dernoncourt, Michael J. Paul:
Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition. 1440-1448 - Pedro Henrique Luz de Araujo, Teófilo Emídio de Campos, Fabricio Ataides Braz, Nilton Correia da Silva:
VICTOR: a Dataset for Brazilian Legal Documents Classification. 1449-1458 - Krutarth Patel, Cornelia Caragea, Mark E. Phillips:
Dynamic Classification in Web Archiving Collections. 1459-1468 - Larissa Vasconcelos, Cláudio E. C. Campelo, Caio Libânio Melo Jerônimo:
Aspect Flow Representation and Audio Inspired Analysis for Texts. 1469-1477 - Sora Lim, Adam Jatowt, Michael Färber, Masatoshi Yoshikawa:
Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing. 1478-1484 - P. Jayashree, P. K. Srijith:
Evaluation of Deep Gaussian Processes for Text Classification. 1485-1491 - Flor Miriam Plaza del Arco, Carlo Strapparava, Luis Alfonso Ureña López, María Teresa Martín Valdivia:
EmoEvent: A Multilingual Emotion Corpus based on different Events. 1492-1498 - Mimansa Jaiswal, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost:
MuSE: a Multimodal Dataset of Stressed Emotion. 1499-1510 - Linrui Zhang, Hsin-Lun Huang, Yang Yu, Dan Moldovan:
Affect in Tweets: A Transfer Learning Approach. 1511-1516 - Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi:
Annotation of Emotion Carriers in Personal Narratives. 1517-1525 - Jane Wottawa, Marie Tahon, Apolline Marin, Nicolas Audibert:
Towards Interactive Annotation for Hesitation in Conversational Speech. 1526-1532 - Marta R. Costa-jussà, Esther González, Asunción Moreno, Eudald Cumalat:
Abusive language in Spanish children and young teenager's conversations: data preparation and short text classification with contextual word embeddings. 1533-1537 - Banothu Rambabu, Kishore Kumar Botsa, P. Gangamohan, Suryakanth V. Gangashetty:
IIIT-H TEMD Semi-Natural Emotional Speech Database from Professional Actors and Non-Actors. 1538-1545 - Thomas Janssoone, Kévin Bailly, Gaël Richard, Chloé Clavel:
The POTUS Corpus, a Database of Weekly Addresses for the Study of Stance in Politics and Virtual Agents. 1546-1553 - Laura Ana Maria Bostan, Evgeny Kim, Roman Klinger:
GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception. 1554-1566 - Svetlana Kiritchenko, Will E. Hipson, Robert J. Coplan, Saif M. Mohammad:
SOLO: A Corpus of Tweets for Examining the State of Being Alone. 1567-1577 - Will E. Hipson, Saif M. Mohammad:
PoKi: A Large Dataset of Poems by Children. 1578-1589 - Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau:
AlloSat: A New Call Center French Corpus for Satisfaction and Frustration Analysis. 1590-1597 - Shih-Hung Wu, Sheng-Lun Chien:
Learning the Human Judgment for the Automatic Evaluation of Chatbot. 1598-1602 - Young-Jun Lee, Chae-Gyun Lim, Ho-Jin Choi:
Korean-Specific Emotion Annotation Procedure Using N-Gram-Based Distant Supervision and Korean-Specific-Feature-Based Distant Supervision. 1603-1610 - Jiajun Xu, Kyosuke Masuda, Hiromitsu Nishizaki, Fumiyo Fukumoto, Yoshimi Suzuki:
Semi-Automatic Construction and Refinement of an Annotated Corpus for a Deep Learning Framework for Emotion Classification. 1611-1617 - Soumitra Ghosh, Asif Ekbal, Pushpak Bhattacharyya:
CEASE, a Corpus of Emotion Annotated Suicide notes in English. 1618-1626 - Oliver Guhr, Anne-Kathrin Schumann, Frank Bahrmann, Hans-Joachim Böhme:
Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems. 1627-1632 - Sophia Yat Mei Lee, Helena Yan Ping Lau:
An Event-comment Social Media Corpus for Implicit Emotion Analysis. 1633-1642 - Luna De Bruyne, Orphée De Clercq, Véronique Hoste:
An Emotional Mess! Deciding on a Framework for Building a Dutch Emotion-Annotated Corpus. 1643-1651 - Thomas N. Haider, Steffen Eger, Evgeny Kim, Roman Klinger, Winfried Menninghaus:
PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry. 1652-1663 - João Sedoc, Sven Buechel, Yehonathan Nachmany, Anneke Buffone, Lyle H. Ungar:
Learning Word Ratings for Empathy and Distress from Document-Level User Responses. 1664-1673 - Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata:
Evaluation of Sentence Representations in Polish. 1674-1680 - Rachid Riad, Anne-Catherine Bachoud-Lévi, Frank Rudzicz, Emmanuel Dupoux:
Identification of Primary and Collateral Tracks in Stuttered Speech. 1681-1688 - Alain Ghio, Muriel Lalain, Laurence Giusti, Corinne Fredouille, Virginie Woisard:
How to Compare Automatically Two Phonological Strings: Application to Intelligibility Measurement in the Case of Atypical Speech. 1689-1694 - Sennan Liu, Shuang Zeng, Sujian Li:
Evaluating Text Coherence at Sentence and Paragraph Levels. 1695-1703 - Gabriel Bernier-Colborne, Philippe Langlais:
HardEval: Focusing on Challenging Tokens to Assess Robustness of NER. 1704-1711 - Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa:
An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers. 1712-1720 - Fabio Fassetti, Ilaria Fassetti:
An Automatic Tool For Language Evaluation. 1721-1726 - Jordan L. Boyd-Graber, Fenfei Guo, Leah Findlater, Mohit Iyyer:
Which Evaluations Uncover Sense Representations that Actually Make Sense? 1727-1738 - Yi-An Lai, Xuan Zhu, Yi Zhang, Mona T. Diab:
Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections. 1739-1746 - Bonan Min, Yee Seng Chan, Lingjun Zhao:
Towards Few-Shot Event Mention Retrieval: An Evaluation Framework and A Siamese Network Approach. 1747-1752 - Andrea Horbach, Itziar Aldabe, Marie Bexte, Oier Lopez de Lacalle, Montse Maritxalar:
Linguistic Appropriateness and Pedagogic Usefulness of Reading Comprehension Questions. 1753-1762 - Leo Born, Maximilian Bacher, Katja Markert:
Dataset Reproducibility and IR Methods in Timeline Summarization. 1763-1771 - Stefanie Nadig, Martin Braschler, Kurt Stockinger:
Database Search vs. Information Retrieval: A Novel Method for Studying Natural Language Querying of Semi-Structured Data. 1772-1779 - Christopher Grimsley, Elijah Mayfield, Julia R. S. Bursten:
Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models. 1780-1790 - Anna K. Marczyk, Alain Ghio, Muriel Lalain, Marie Rebourg, Corinne Fredouille, Virginie Woisard:
Have a Cake and Eat it Too: Assessing Discriminating Performance of an Intelligibility Index Obtained from a Reduced Sample Size. 1791-1795 - Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh:
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings. 1796-1802 - Gustavo Aguilar, Sudipta Kar, Thamar Solorio:
LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation. 1803-1813 - Eetu Sjöblom, Mathias Creutz, Yves Scherrer:
Paraphrase Generation and Evaluation on Colloquial-Style Sentences. 1814-1822 - Namgi Han, Katsuhiko Hayashi, Yusuke Miyao:
Analyzing Word Embedding Through Structural Equation Modeling. 1823-1832 - Yevhenii Prokopalo, Sylvain Meignier, Olivier Galibert, Loïc Barrault, Anthony Larcher:
Evaluation of Lifelong Learning Systems. 1833-1841 - Elzbieta Hajnicz:
Interannotator Agreement for Lexico-Semantic Annotation of a Corpus. 1842-1848 - Markus Näther:
An In-Depth Comparison of 14 Spelling Correction Tools on a Common Benchmark. 1849-1857 - Yu Yuan, Serge Sharoff:
Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks. 1858-1865 - Diego Alves, Gaurish Thakkar, Marko Tadic:
Evaluating Language Tools for Fifteen EU-official Under-resourced Languages. 1866-1873 - Dimuthu Lakmal, Surangika Ranathunga, Saman Peramuna, Indu Herath:
Word Embedding Evaluation for Sinhala. 1874-1881 - Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo:
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks. 1882-1894 - Arkadiusz Janz, Lukasz Kopocinski, Maciej Piasecki, Agnieszka Pluwak:
Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations. 1895-1901 - Maja Buljan, Joakim Nivre, Stephan Oepen, Lilja Øvrelid:
A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing. 1902-1909 - Mu Yang, Chi-Yen Chen, Yi-Hui Lee, Qian-hui Zeng, Wei-Yun Ma, Chen-Yang Shih, Wei-Jhih Chen:
Headword-Oriented Entity Linking: A Special Entity Linking Task with Dataset and Baseline. 1910-1917 - Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li:
TableBank: Table Benchmark for Image-based Table Detection and Recognition. 1918-1925 - Jibril Frej, Didier Schwab, Jean-Pierre Chevallet:
WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset. 1926-1933 - Koji Tanaka, Chenhui Chu, Haolin Ren, Benjamin Renoust, Yuta Nakashima, Noriko Takemura, Hajime Nagahara, Takao Fujikawa:
Constructing a Public Meeting Corpus. 1934-1940 - Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, Makoto Miwa:
Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature. 1941-1950 - Michael Strobl, Amine Trabelsi, Osmar R. Zaïane:
WEXEA: Wikipedia EXhaustive Entity Annotation. 1951-1958 - Arnaud Ferré, Robert Bossy, Mouhamadou Ba, Louise Deléger, Thomas Lavergne, Pierre Zweigenbaum, Claire Nédellec:
Handling Entity Normalization with no Annotated Corpus: Weakly Supervised Methods Based on Distributional Representation and Ontological Information. 1959-1966 - Francesca Bonin, Martin Gleize, Ailbhe Finnerty, Candice Moore, Charles Jochim, Emma Norris, Yufang Hou, Alison J. Wright, Debasis Ganguly, Emily Hayes, Silje Zink, Alessandra Pascale, Pol Mac Aonghusa, Susan Michie:
HBCP Corpus: A New Resource for the Analysis of Behavioural Change Intervention Reports. 1967-1975 - Di Lu, Ananya Subburathinam, Heng Ji, Jonathan May, Shih-Fu Chang, Avirup Sil, Clare R. Voss:
Cross-lingual Structure Transfer for Zero-resource Event Extraction. 1976-1981 - Alan Ramponi, Barbara Plank, Rosario Lombardo:
Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction. 1982-1989 - Paul Thompson, Tim Yates, Emrah Inan, Sophia Ananiadou:
Semantic Annotation for Improved Safety in Construction Work. 1990-1999 - Leonidas Tsekouras, Georgios Petasis, George Giannakopoulos, Aris Kosmopoulos:
Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources. 2000-2008 - Jaya Chaturvedi, Natalia Viani, Jyoti Sanyal, Chloe Tytherleigh, Idil Hasan, Kate Baird, Sumithra Velupillai, Robert Stewart, Angus Roberts:
Development of a Corpus Annotated with Medications and their Attributes in Psychiatric Health Records. 2009-2016 - Angrosh Mandya, James O'Neill, Danushka Bollegala, Frans Coenen:
Do not let the history haunt you: Mitigating Compounding Errors in Conversational Question Answering. 2017-2025 - Weixin Zeng, Xiang Zhao, Jiuyang Tang, Zhen Tan, Xuqian Huang:
CLEEK: A Chinese Long-text Corpus for Entity Linking. 2026-2035 - Izhak Shafran, Nan Du, Linh Tran, Amanda Perry, Lauren Keyes, Mark Knichel, Ashley Domin, Lei Huang, Yuhui Chen, Gang Li, Mingqiu Wang, Laurent El Shafey, Hagen Soltau, Justin S. Paul:
The Medical Scribe: Corpus Development and Model Performance Analyses. 2036-2044 - Ruka Funaki, Yusuke Nagata, Kohei Suenaga, Shinsuke Mori:
A Contract Corpus for Recognizing Rights and Obligations. 2045-2053 - Scott Pezanowski, Prasenjit Mitra:
Recognition of Implicit Geographic Movement in Text. 2054-2063 - Keiichi Takamaru, Yasutomo Kimura, Hideyuki Shibuki, Hokuto Ototake, Yuzu Uchida, Kotaro Sakamoto, Madoka Ishioroshi, Teruko Mitamura, Noriko Kando:
Extraction of the Argument Structure of Tokyo Metropolitan Assembly Minutes: Segmentation of Question-and-Answer Sets. 2064-2068 - Cécile Robin, Mona Isazad Mashinchi, Fatemeh Ahmadi Zeleti, Adegboyega Ojo, Paul Buitelaar:
A Term Extraction Approach to Survey Analysis in Health Care. 2069-2077 - Ruben Kruiper, Julian F. V. Vincent, Jessica Chen-Burger, Marc P. Y. Desmulliez, Ioannis Konstas:
A Scientific Information Extraction Dataset for Nature Inspired Engineering. 2078-2085 - Natalia Vanetik, Marina Litvak, Sergey Shevchuk, Lior Reznik:
Automated Discovery of Mathematical Definitions in Text. 2086-2094 - Chuan Wu, Evangelos Kanoulas, Maarten de Rijke, Wei Lu:
WN-Salience: A Corpus of News Articles with Entity Salience Annotations. 2095-2102 - Ephrem Tadesse, Rosa Tsegaye, Kuulaa Qaqqabaa:
Event Extraction from Unstructured Amharic Text. 2103-2109 - Bernardo Magnini, Alberto Lavelli, Simone Magnolini:
Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language. 2110-2119 - Nima Nabizadeh, Dorothea Kolossa, Martin Heckmann:
MyFixit: An Annotated Dataset, Annotation Tool, and Baseline Methods for Information Extraction from Repair Manuals. 2120-2128 - Marieke van Erp, Paul Groth:
Towards Entity Spaces. 2129-2137 - Michael Fell, Elena Cabrio, Elmahdi Korfed, Michel Buffa, Fabien Gandon:
Love Me, Love Me, Say (and Write!) that You Love Me: Enriching the WASABI Song Corpus with Lyrics Annotations. 2138-2147 - Mustafa Ocal, Mark A. Finlayson:
Evaluating Information Loss in Temporal Dependency Trees. 2148-2156 - Llio Humphreys, Guido Boella, Luigi Di Caro, Livio Robaldo, Leon van der Torre, Sepideh Ghanavati, Robert Muthuri:
Populating Legal Ontologies using Semantic Role Labeling. 2157-2166 - Michal Marcinczuk, Marcin Oleksy, Jan Wieczorek:
PST 2.0 - Corpus of Polish Spatial Texts. 2167-2174 - Deborah Ferreira, André Freitas:
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text. 2175-2182 - Marco Antonio Valenzuela-Escárcega, Gus Hahn-Powell, Dane Bell:
Odinson: A Fast Rule-based Information Extraction Framework. 2183-2191 - Jennifer D'Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, Ralph Ewerth:
The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources. 2192-2203 - Maria Alexeeva, Rebecca Sharp, Marco Antonio Valenzuela-Escárcega, Jennifer Kadowaki, Adarsh Pyarelal, Clayton T. Morrison:
MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions. 2204-2212 - Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar:
Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction. 2213-2222 - Jingcheng Niu, Victoria Ng, Gerald Penn, Erin E. Rees:
Temporal Histories of Epidemic Events (THEE): A Case Study in Temporal Annotation for Public Health. 2223-2230 - Anita Khadka, Iván Cantador, Miriam Fernández:
Exploiting Citation Knowledge in Personalised Recommendation of Recent Scientific Publications. 2231-2240 - Sovan Kumar Sahoo, Saumajit Saha, Asif Ekbal, Pushpak Bhattacharyya:
A Platform for Event Extraction in Hindi. 2241-2250 - Surabhi Datta, Morgan Ulinski, Jordan Godfrey-Stovall, Shekhar Khanpara, Roy Riascos-Castaneda, Kirk Roberts:
Rad-SpatialNet: A Frame-based Resource for Fine-Grained Spatial Relations in Radiology Reports. 2251-2260 - Corentin Masson, Patrick Paroubek:
NLP Analytics in Finance with DoRe: A French 250M Tokens Corpus of Corporate Annual Reports. 2261-2267 - Ramón Maldonado, Sanda M. Harabagiu:
The Language of Brain Signals: Natural Language Processing of Electroencephalography Reports. 2268-2275 - Tatiana Shavrina, Anton A. Emelyanov, Alena Fenogenova, Vadim Fomin, Vladislav Mikhailov, Andrey Evlampiev, Valentin Malykh, Vladimir Larin, Alex Natekin, Aleksandr Vatulin, Peter Romov, Daniil Anastasiev, Nikolai Zinov, Andrey Chertok:
Humans Keep It One Hundred: an Overview of AI Journey. 2276-2284 - Maaike de Boer, Jack P. C. Verhoosel:
Towards Data-driven Ontologies: a Filtering Approach using Keywords and Natural Language Constructs. 2285-2292 - Ali Jabbari, Olivier Sauvage, Hamada Zeine, Hamza Chergui:
A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News. 2293-2299 - Nadia Bebeshina, Mathieu Lafourcade:
Inferences for Lexical Semantic Resource Building with Less Supervision. 2300-2305 - Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi:
Acquiring Social Knowledge about Personality and Driving-related Behavior. 2306-2315 - Maria Becker, Katharina Korfhage, Anette Frank:
Implicit Knowledge in Argumentative Texts: An Annotated Corpus. 2316-2324 - Stefano Faralli, Paola Velardi, Farid Yusifli:
Multiple Knowledge GraphDB (MKGDB). 2325-2331 - Julián Moreno Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodríguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza:
Orchestrating NLP Services for the Legal Domain. 2332-2340 - Georgeta Bordea, Stefano Faralli, Fleur Mougin, Paul Buitelaar, Gayo Diallo:
Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph. 2341-2347 - Estelle I. S. Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané, Julien Pinquier:
Subjective Evaluation of Comprehensibility in Movie Interactions. 2348-2357 - Pilar León Araúz, Arianne Reimerink, Melania Cabezas-García:
Representing Multiword Term Variation in a Terminological Knowledge Base: a Corpus-Based Study. 2358-2367 - Soham Dan, Hangfeng He, Dan Roth:
Understanding Spatial Relations through Multiple Modalities. 2368-2372 - Dwaipayan Roy, Sumit Bhatia, Prateek Jain:
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. 2373-2380 - Zoltán Kmetty, Veronika Vincze, Dorottya Demszky, Orsolya Ring, Balázs Nagy, Martina Katalin Szabó:
Pártélet: A Hungarian Corpus of Propaganda Texts from the Hungarian Socialist Era. 2381-2388 - Kristian Noullet, Rico Mix, Michael Färber:
KORE 50DYWC: An Evaluation Data Set for Entity Linking Based on DBpedia, YAGO, Wikidata, and Crunchbase. 2389-2395 - Özge Alaçam, Eugen Ruppert, Amr Rekaby Salama, Tobias Staron, Wolfgang Menzel:
Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations. 2396-2404 - Jiahao Chen, Chenjie Cao, Xiuyan Jiang:
SiBert: Enhanced Chinese Pre-trained Language Model with Sentence Insertion. 2405-2412 - Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith B. Hall:
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset. 2413-2423 - Gabor Melli, Abdelrhman Eldallal, Bassim Lazem, Olga Moreira:
GM-RKB WikiText Error Correction Task and Baselines. 2424-2430 - Anne Beyer, Göran Kauermann, Hinrich Schütze:
Embedding Space Correlation as a Measure of Domain Similarity. 2431-2439 - Mandy Guo, Zihang Dai, Denny Vrandecic, Rami Al-Rfou:
Wiki-40B: Multilingual Language Model Dataset. 2440-2452 - Serge Sharoff:
Know thy Corpus! Robust Methods for Digital Curation of Web corpora. 2453-2460 - Milton King, Paul Cook:
Evaluating Approaches to Personalizing Language Models. 2461-2469 - Irina S. Kipyatkova, Alexey Karpov:
Class-based LSTM Russian Language Model with Linguistic Information. 2470-2474 - Sello Ralethe:
Adaptation of Deep Bidirectional Transformers for Afrikaans Language. 2475-2478 - Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab:
FlauBERT: Unsupervised Language Model Pre-training for French. 2479-2490 - Manuel R. Ciosici, Ira Assent, Leon Derczynski:
Accelerated High-Quality Mutual-Information Based Word Clustering. 2491-2496 - Sylvain Coulange, Solange Rossato:
Rhythmic Proximity Between Natives And Learners Of French - Evaluation of a metric based on the CEFC corpus. 2497-2502 - Giulia Speranza, Maria Pia di Buono, Johanna Monti, Federico Sangati:
From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap. 2503-2510 - Fatma Arslan, Josue Caraballo, Damian Jimenez, Chengkai Li:
Modeling Factual Claims with Semantic Frames. 2511-2520 - Vishwa Gupta, Gilles Boulianne:
Automatic Transcription Challenges for Inuktitut, a Low-Resource Polysynthetic Language. 2521-2527 - Jonathan Dunn, Benjamin Adams:
Geographically-Balanced Gigaword Corpora for 50 Language Varieties. 2528-2536 - Maaz Amjad, Grigori Sidorov, Alisa Zhila:
Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language. 2537-2542 - Stamatis Outsios, Christos Karatsalos, Konstantinos Skianis, Michalis Vazirgiannis:
Evaluation of Greek Word Embeddings. 2543-2551 - Katerina Papavassiliou, Gareth Owens, Dimitrios I. Kosmopoulos:
A Dataset of Mycenaean Linear B Sequences. 2552-2561 - Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene A. Stewart, Jeffrey Micher:
The Nunavut Hansard Inuktitut-English Parallel Corpus 3.0 with Preliminary Machine Translation Results. 2562-2572 - Leah Michel, Viktor Hangya, Alexander M. Fraser:
Exploring Bilingual Word Embeddings for Hiligaynon, a Low-Resource Language. 2573-2580 - Anna Zueva, Anastasia Kuznetsova, Francis M. Tyers:
A Finite-State Morphological Analyser for Evenki. 2581-2589 - Amanuel Mersha, Stephen Wu:
Morphology-rich Alphasyllabary Embeddings. 2590-2595 - Jan Christian Blaise Cruz, Julianne Agatha Tan, Charibeth Cheng:
Localization of Fake News Detection via Multitask Transfer Learning. 2596-2604 - Edresson Casanova, Marcos V. Treviso, Lilian Hübner, Sandra M. Aluísio:
Evaluating Sentence Segmentation in Different Datasets of Neuropsychological Language Tests in Brazilian Portuguese. 2605-2614 - Kyubyong Park, Yo Joong Choe, Jiyeon Ham:
Jejueo Datasets for Machine Translation and Speech Synthesis. 2615-2621 - Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language. 2622-2628 - Luis Chiruzzo, Pedro J. Amarilla, Adolfo A. Rios, Gustavo Giménez Lugo:
Development of a Guarani - Spanish Parallel Corpus. 2629-2633 - Leila Ouahrani, Djamal Bennouar:
AR-ASAG An ARabic Dataset for Automatic Short Answer Grading Evaluation. 2634-2643 - Anne Ferger:
Processing Language Resources of Under-Resourced and Endangered Languages for the Generation of Augmentative Alternative Communication Boards. 2644-2648 - Jocelyn Aznar, Núria Gala:
The Nisvai Corpus of Oral Narrative Practices from Malekula (Vanuatu) and its Associated Language Resources. 2649-2656 - Ludger Paschen, François Delafontaine, Christoph Draxler, Susanne Fuchs, Matthew Stave, Frank Seifart:
Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo). 2657-2666 - Kevin Duh, Paul McNamee, Matt Post, Brian Thompson:
Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages. 2667-2675 - Emily Chen, Hyunji Hayley Park, Lane Schwartz:
Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology. 2676-2684 - Marcelo Yuji Himoro, Antonio Pareja-Lora:
Towards a Spell Checker for Zamboanga Chavacano Orthography. 2685-2697 - Wafia Adouane, Samia Touileb, Jean-Philippe Bernardy:
Identifying Sentiments in Algerian Code-switched User-generated Comments. 2698-2705 - Lucy Linder, Michael Jungo, Jean Hennebert, Claudiu Cristian Musat, Andreas Fischer:
Automatic Creation of Text Corpora for Low-Resource Languages from the Internet: The Case of Swiss German. 2706-2711 - Ali Hakimi Parizi, Paul Cook:
Evaluating Sub-word Embeddings in Cross-lingual Models. 2712-2719 - Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardzic, Claudiu Musat:
A Swiss German Dictionary: Variation in Speech and Writing. 2720-2725 - Laurent Kevers, Stella Retali-Medori:
Towards a Corsican Basic Language Resource Kit. 2726-2735 - Jeremie Boudreau, Akankshya Patra, Ashima Suvarna, Paul Cook:
Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language Modelling. 2736-2745 - Jacqueline Brixey, David J. Sides, Timothy Vizthum, David R. Traum, Khalil Iskarous:
Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length. 2746-2753 - Jesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David Ifeoluwa Adelani, Cristina España-Bonet:
Massive vs. Curated Embeddings for Low-Resourced Languages: the Case of Yorùbá and Twi. 2754-2762 - Neslihan Kara, Deniz Baran Aslan, Büsra Marsan, Özge Bakay, Koray Ak, Olcay Taner Yildiz:
TRopBank: Turkish PropBank V2.0. 2763-2772 - Dan Tufis, Maria Mitrofan, Vasile Florian Pais, Radu Ion, Andrei Coman:
Collection and Annotation of the Romanian Legal Corpus. 2773-2777 - Kilu von Prince, Sebastian Nordhoff:
An Empirical Evaluation of Annotation Practices in Corpora from Language Documentation. 2778-2787 - Gaurav Mohanty, Pruthwik Mishra, Radhika Mamidi:
Annotated Corpus for Sentiment Analysis in Odia Language. 2788-2795 - Maddalen Lopez de Lacalle, Xabier Saralegi, Iñaki San Vicente:
Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque. 2796-2802 - Elhadji Mamadou Nguer, Alla Lo, Cheikh M. Bamba Dione, Sileye O. Ba, Moussa Lo:
SENCORPUS: A French-Wolof Parallel Corpus. 2803-2811 - Gábor Bella, Fiona McNeill, Rody Gorman, Caoimhin O. Donnaile, Kirsty MacDonald, Yamini Chandrashekar, Abed Alhakim Freihat, Fausto Giunchiglia:
A Major Wordnet for a Minority Language: Scottish Gaelic. 2812-2818 - Basil Abraham, Danish Goel, Divya Siddarth, Kalika Bali, Manu Chopra, Monojit Choudhury, Pratik Joshi, Preethi Jyothi, Sunayana Sitaram, Vivek Seshadri:
Crowdsourcing Speech Data for Low-Resource Languages from Low-Income Workers. 2819-2826 - Hilaria Cruz, Antonios Anastasopoulos, Gregory Stump:
A Resource for Studying Chatino Verbal Morphology. 2827-2831 - Devansh Mehta, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Anurag Shukla, Vishnu Prasad, Venkanna U, Amit Sharma, Kalika Bali:
Learnings from Technological Interventions in a Low Resource Language: A Case-Study on Gondi. 2832-2838 - Preni Golazizian, Behnam Sabeti, Seyed Arad Ashrafi Asli, Zahra Majdabadi, Omid Momenzadeh, Reza Fahmi:
Irony Detection in Persian Language: A Transfer Learning Approach Using Emoji Prediction. 2839-2845 - David Bamutura, Peter Ljunglöf, Peter Nebende:
Towards Computational Resource Grammars for Runyankore and Rukiga. 2846-2854 - Seyed Arad Ashrafi Asli, Behnam Sabeti, Zahra Majdabadi, Preni Golazizian, Reza Fahmi, Omid Momenzadeh:
Optimizing Annotation Effort Using Active Learning Strategies: A Sentiment Analysis Case Study in Persian. 2855-2861 - Md Zobaer Hossain, Md Ashraful Rahman, Md. Saiful Islam, Sudipta Kar:
BanFakeNews: A Dataset for Detecting Fake News in Bangla. 2862-2871 - Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo Vega, Antonios Anastasopoulos, Lori S. Levin, Alan W. Black:
A Resource for Computational Experiments on Mapudungun. 2872-2877 - Erich R. Round, Mark Ellison, Jayden L. Macklin-Cordes, Sacha Beniamine:
Automated Parsing of Interlinear Glossed Text from Page Images of Grammatical Descriptions. 2878-2883 - Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky:
The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration. 2884-2892 - Alexander Zahrer, Andrej Zgank, Barbara Schuppler:
Towards Building an Automatic Transcription System for Language Documentation: Experiences from Muyu. 2893-2900 - Daniel Jettka, Timm Lehmberg:
Towards Flexible Cross-Resource Exploitation of Heterogeneous Language Documentation Data. 2901-2905 - Grégoire Winterstein, Carmen Tang, Regine Lai:
CantoMap: a Hong Kong Cantonese MapTask Corpus. 2906-2913 - Gina Bustamante, Arturo Oncevay, Roberto Zariquiey:
No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly low-Resource Languages in Peru. 2914-2923 - Hildur Jónsdóttir, Anton Karl Ingason:
Creating a Parallel Icelandic Dependency Treebank from Raw Text to Universal Dependencies. 2924-2931 - Aleksandra Miletic, Myriam Bras, Marianne Vergez-Couret, Louise Esher, Clamença Poujade, Jean Sibille:
Building a Universal Dependencies Treebank for Occitan. 2932-2939 - David Moeljadi, Zakariya Pamuji Aminullah:
Building the Old Javanese Wordnet. 2940-2946 - Gerardo Eugenio Sierra Martínez, Cynthia Montaño, Gemma Bel-Enguix, Diego Córdova, Margarita Mota Montoya:
CPLM, a Parallel Corpus for Mexican Languages: Development and Interface. 2947-2952 - Wazir Ali, Junyu Lu, Zenglin Xu:
SiNER: A Large Dataset for Sindhi Named Entity Recognition. 2953-2961 - Li Song, Yuling Dai, Yihuan Liu, Bin Li, Weiguang Qu:
Construct a Sense-Frame Aligned Predicate Lexicon for Chinese AMR Corpus. 2962-2969 - Lifeng Han, Gareth J. F. Jones, Alan F. Smeaton:
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora. 2970-2979 - Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita:
A Myanmar (Burmese)-English Named Entity Transliteration Dictionary. 2980-2983 - Peng-Hsuan Li, Tsan-Yu Yang, Wei-Yun Ma:
CA-EHN: Commonsense Analogy from E-HowNet. 2984-2990 - Valentina Leone, Giovanni Siragusa, Luigi Di Caro, Roberto Navigli:
Building Semantic Grams of Human Knowledge. 2991-3000 - Ana Sabina Uban, Liviu P. Dinu:
Automatically Building a Multilingual Lexicon of False Friends With No Supervision. 3001-3007 - Krasimir Angelov:
A Parallel WordNet for English, Swedish and Bulgarian. 3008-3015 - Franck Sajous, Basilio Calderone, Nabil Hathout:
ENGLAWI: From Human- to Machine-Readable Wiktionary. 3016-3026 - Sacha Beniamine, Martin Maiden, Erich R. Round:
Opening the Romance Verbal Inflection Dataset 2.0: A CLDF lexicon. 3027-3035 - Yo Joong Choe, Kyubyong Park, Dongwoo Kim:
word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs. 3036-3045 - Bruno Cartoni, Daniel Calvelo Aros, Denny Vrandecic, Saran Lertpradit:
Introducing Lexical Masks: a New Representation of Lexical Entries for Better Evaluation and Exchange of Lexicons. 3046-3052 - Muhamed Al-Khalil, Nizar Habash, Zhengyang Jiang:
A Large-Scale Leveled Readability Lexicon for Standard Arabic. 3053-3062 - Achim Stein:
Preserving Semantic Information from Old Dictionaries: Linking Senses of the 'Altfranzösisches Wörterbuch' to WordNet. 3063-3068 - Regine Lai, Grégoire Winterstein:
Cifu: a Frequency Lexicon of Hong Kong Cantonese. 3069-3077 - Rachele Sprugnoli, Marco Passarotti, Daniela M. Corbetta, Andrea Peverelli:
Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for Latin. 3078-3086 - Saif M. Mohammad:
WordWars: A Dataset to Examine the Natural Selection of Words. 3087-3095 - Diptesh Kanojia, Malhar Kulkarni, Pushpak Bhattacharyya, Gholamreza Haffari:
Challenge Dataset of Cognates and False Friend Pairs from Indian Languages. 3096-3102 - Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi:
Development of a Japanese Personality Dictionary based on Psychological Methods. 3103-3108 - Jumayel Islam, Lu Xiao, Robert E. Mercer:
A Lexicon-Based Approach for Detecting Hedges in Informal Text. 3109-3113 - Daiki Nishihara, Tomoyuki Kajiwara:
Word Complexity Estimation for Japanese Lexical Simplification. 3114-3120 - Da Huo, Gerard de Melo:
Inducing Universal Semantic Tag Vectors. 3121-3127 - Matthew Coole, Paul Rayson, John Mariani:
LexiDB: Patterns & Methods for Corpus Linguistic Database Management. 3128-3135 - Václava Kettnerová, Markéta Lopatková, Anna Vernerová, Petra Barancíková:
Towards a Semi-Automatic Detection of Reflexive and Reciprocal Constructions and Their Representation in a Valency Lexicon. 3136-3144 - Chahan Vidal-Gorène, Aliénor Decours-Perez:
Languages Resources for Poorly Endowed Languages: The Case Study of Classical Armenian. 3145-3152 - Koichi Takeuchi, Alastair Butler, Iku Nagasaki, Takuya Okamura, Prashant Pardeshi:
Constructing Web-Accessible Semantic Role Labels and Frames for Japanese as Additions to the NPCMJ Parsed Corpus. 3153-3161 - Piek Vossen, Filip Ilievski, Marten Postma, Antske Fokkens, Gosse Minnema, Levi Remijnse:
Large-scale Cross-lingual Language Resources for Referencing and Framing. 3162-3171 - Anas Fahad Khan, Laurent Romary, Ana Salgado, Jack Bowers, Mohamed Khemakhem, Toma Tasovac:
Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case. 3172-3180 - Francis Bond, Hiroki Nomoto, Luís Morgado da Costa, Arthur Bond:
Linking the TUFS Basic Vocabulary to the Open Multilingual Wordnet. 3181-3188 - Francis Bond, Luís Morgado da Costa, Michael Wayne Goodman, John Philip McCrae, Ahti Lohk:
Some Issues with Building a Multilingual Wordnet. 3189-3197 - Maria Khokhlova:
Collocations in Russian Lexicography and Russian Collocations Database. 3198-3206 - Clémentine Fourrier, Benoît Sagot:
Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0. 3207-3216 - Gaël Guibon, Benoît Sagot:
OFrLex: A Computational Morphological and Syntactic Lexicon for Old French. 3217-3225 - Alina Maria Ciobanu, Liviu P. Dinu, Laurentiu Zoicas:
Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words. 3226-3231 - Sina Ahmadi, John Philip McCrae, Sanni Nimb, Anas Fahad Khan, Monica Monachini, Bolette S. Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, Thomas Troelsgård, Sussi Olsen, Simon Krek, Veronika Lipp, Tamás Váradi, László Simon, András Gyorffy, Carole Tiberius, Tanneke Schoonheim, Yifat Ben Moshe, Maya Rudich, Raya Abu Ahmad, Dorielle Lonke, Kira Kovalenko, Margit Langemets, Jelena Kallas, Oksana Dereza, Theodorus Fransen, David Cillessen, David Lindemann, Mikel Alonso, Ana Salgado, José-Luis Sancho, Rafael-J. Ureña-Ruiz, Jordi Porta Zamorano, Kiril Simov, Petya Osenova, Zara Kancheva, Ivaylo Radev, Ranka Stankovic, Andrej Perdih, Dejan Gabrovsek:
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment. 3232-3242 - James F. Allen, Hannah An, Ritwik Bose, William de Beaumont, Choh Man Teng:
A Broad-Coverage Deep Semantic Lexicon for Verbs. 3243-3251 - Winston Wu, David Yarowsky:
Computational Etymology and Word Emergence. 3252-3259 - Ewa Rudnicka, Tomasz Naskret:
A Dataset of Translational Equivalents Built on the Basis of plWordNet-Princeton WordNet Synset Mapping. 3260-3264 - Fernando Benites, Gilbert François Duivesteijn, Pius von Däniken, Mark Cieliebak:
TRANSLIT: A Large-scale Name Transliteration Resource. 3265-3271 - Caio Libânio Melo Jerônimo, Cláudio Elízio Calazans Campelo, Leandro Balby Marinho, Allan Sales da Costa Melo, Adriano Veloso, Roberta Viola:
Computing with Subjectivity Lexicons. 3272-3280 - Christian Chiarcos, Christian Fäth, Maxim Ionov:
The ACoLi Dictionary Graph. 3281-3290 - Ludmila Midrigan-Ciochina, Victoria Boyd, Lucila Sanchez-Ortega, Diana Malancea-Malac, Doina Midrigan, David P. Corina:
Resources in Underrepresented Languages: Building a Representative Romanian Corpus. 3291-3296 - Sabine Kirchmeier, Bolette S. Pedersen, Sanni Nimb, Philip Diderichsen, Peter Juel Henrichsen:
World Class Language Technology - Developing a Language Technology Strategy for Danish. 3297-3301 - Alessia Battisti, Dominik Pfütze, Andreas Säuberli, Marek Kostrzewa, Sarah Ebling:
A Corpus for Automatic Readability Assessment and Text Simplification of German. 3302-3311 - Henk van den Heuvel, Nelleke Oostdijk, Caroline F. Rowland, Paul Trilsbeek:
The CLARIN Knowledge Centre for Atypical Communication Expertise. 3312-3316 - Henk van den Heuvel, Aleksei Kelli, Katarzyna Klessa, Satu Salaasti:
Corpora of Disordered Speech in the Light of the GDPR: Two Use Cases from the DELAD Initiative. 3317-3321 - Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajic, Khalid Choukri, Andrejs Vasiljevs, Gerhard Backfried, Christoph Prinz, José Manuél Gómez-Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriute, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette S. Pedersen, Inguna Skadina, Marko Tadic, Dan Tufis, Tamás Váradi, Kadri Vider, Andy Way, François Yvon:
The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe. 3322-3332 - Frances Gillis-Webber, Sabine Tittel:
A Framework for Shared Agreement of Language Tags beyond ISO 639. 3333-3339 - Simon Krek, Spela Arhar Holdt, Tomaz Erjavec, Jaka Cibej, Andraz Repar, Polona Gantar, Nikola Ljubesic, Iztok Kosem, Kaja Dobrovoljc:
Gigafida 2.0: The Reference Corpus of Written Standard Slovene. 3340-3345 - Stefan Evert, Oleg Harlamov, Philipp Heinrich, Piotr Banski:
Corpus Query Lingua Franca part II: Ontology. 3346-3352 - Christoph Draxler, Henk van den Heuvel, Arjan van Hessen, Silvia Calamai, Louise Corti:
A CLARIN Transcription Portal for Interview Data. 3353-3359 - Georgios Petasis, Leonidas Tsekouras:
Ellogon Casual Annotation Infrastructure. 3360-3365 - Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukás Kacena, Khalid Choukri, Victoria Arranz, Andrejs Vasiljevs, Orians Anvari, Andis Lagzdins, Julija Melnika, Gerhard Backfried, Erinç Dikici, Miroslav Jánosík, Katja Prinz, Christoph Prinz, Severin Stampler, Dorothea Thomas-Aniola, José Manuél Gómez-Pérez, Andrés García-Silva, Cristian Berrio, Ulrich Germann, Steve Renals, Ondrej Klejch:
European Language Grid: An Overview. 3366-3380 - Andrejs Vasiljevs, Inguna Skadina, Indra Samite, Kaspars Kaulins, Eriks Ajausks, Julija Melnika, Aivars Berzins:
The Competitiveness Analysis of the European Language Technology Market. 3381-3389 - Shatha Altammami, Eric Atwell, Ammar Alsalka:
Constructing a Bilingual Hadith Corpus Using a Segmentation Tool. 3390-3398 - Steinþór Steingrímsson, Starkaður Barkarson, Gunnar Thor Örnólfsson:
Facilitating Corpus Usage: Making Icelandic Corpora More Accessible for Researchers and Language Users. 3399-3405 - Franciska de Jong, Bente Maegaard, Darja Fiser, Dieter Van Uytvanck, Andreas Witt:
Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN. 3406-3413 - Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson:
Language Technology Programme for Icelandic 2019-2023. 3414-3422 - Pawel Kamocki, Andreas Witt:
Privacy by Design and Language Resources. 3423-3427 - Penny Labropoulou, Katerina Gkirtzou, Maria Gavriilidou, Miltos Deligiannis, Dimitris Galanis, Stelios Piperidis, Georg Rehm, Maria Berger, Valérie Mapelli, Mickaël Rigault, Victoria Arranz, Khalid Choukri, Gerhard Backfried, José Manuél Gómez-Pérez, Andrés García-Silva:
Making Metadata Fit for Next Generation Language Technology Platforms: The Metadata Schema of the European Language Grid. 3428-3437 - Daniel Jaquette, Christopher Cieri, Denise DiPersio:
Related Works in the Linguistic Data Consortium Catalog. 3438-3442 - Lilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck, Stephan Busemann:
Language Data Sharing in European Public Services - Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures. 3443-3448 - Christopher Cieri, James Fiumara, Stephanie M. Strassel, Jonathan Wright, Denise DiPersio, Mark Liberman:
A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC Community. 3449-3456 - Verena Lyding, Alexander König, Monica Pretti:
Digital Language Infrastructures - Documenting Language Actors. 3457-3462 - David Erik Mollberg, Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir, Steinþór Steingrímsson, Eydís Huld Magnúsdóttir, Jón Guðnason:
Samrómur: Crowd-sourcing Data Collection for Icelandic Speech Recognition. 3463-3467 - Astik Biswas, Emre Yilmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler:
Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages. 3468-3474 - Michail Mersinias, Stergos D. Afantenos, Georgios Chalkiadakis:
CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection. 3475-3483 - Namoos Hayat Qasmi, Haris Bin Zia, Awais Athar, Agha Ali Raza:
SimplifyUR: Unsupervised Lexical Text Simplification for Urdu. 3484-3489 - Sangwhan Moon, Naoaki Okazaki:
Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword Tokenization. 3490-3497 - Gudbjartur Ingi Sigurbergsson, Leon Derczynski:
Offensive Language and Hate Speech Detection for Danish. 3498-3508 - Zheng Xin Yong, Tiago Timponi Torrent:
Semi-supervised Deep Embedded Clustering with Anomaly Detection for Semantic Frame Induction. 3509-3519 - Ritiz Tambi, Ajinkya Kale, Tracy Holloway King:
Search Query Language Identification Using Weak Labeling. 3520-3527 - Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén:
Automated Phonological Transcription of Akkadian Cuneiform Text. 3528-3534 - Petra Barancíková, Ondrej Bojar:
COSTRA 1.0: A Dataset of Complex Sentence Transformations. 3535-3541 - Maria Joana Correia, Isabel Trancoso, Bhiksha Raj:
Automatic In-the-wild Dataset Annotation with Deep Generalized Multiple Instance Learning. 3542-3550 - Phillip Ströbel, Simon Clematide, Martin Volk:
How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR. 3551-3559 - Jakob Jungmaier, Nora Kassner, Benjamin Roth:
Dirichlet-Smoothed Word Embeddings for Low-Resource Settings. 3560-3565 - João Monteiro, Jahangir Alam, Tiago H. Falk:
On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification. 3566-3572 - José María Hoya Quecedo, Maximilian Koppatz, Roman Yangarber:
Neural Disambiguation of Lemma and Part of Speech in Morphologically Rich Languages. 3573-3582 - Jiawei Zhao, Andrew Gilman:
Non-Linearity in Mapping Based Cross-Lingual Word Embeddings. 3583-3589 - Benjamin Beilharz, Xin Sun, Sariya Karimova, Stefan Riezler:
LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition. 3590-3594 - Abbas Ghaddar, Philippe Langlais:
SEDAR: a Large Scale French-English Financial Domain Parallel Corpus. 3595-3602 - Makoto Morishita, Jun Suzuki, Masaaki Nagata:
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus. 3603-3609 - Himanshu Choudhary, Shivansh Rao, Rajesh Rohilla:
Neural Machine Translation for Low-Resourced Indian Languages. 3610-3615 - Hideya Mino, Hideki Tanaka, Hitoshi Ito, Isao Goto, Ichiro Yamada, Takenobu Tokunaga:
Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT. 3616-3622 - Helena de Medeiros Caseli, Marcio Lima Inácio:
NMT and PBSMT Error Analyses in English to Brazilian Portuguese Automatic Translations. 3623-3629 - Sho Shimazu, Sho Takase, Toshiaki Nakazawa, Naoaki Okazaki:
Evaluation Dataset for Zero Pronoun in Japanese to English Translation. 3630-3634 - Ádám Kovács, Judit Ács, András Kornai, Gábor Recski:
Better Together: Modern Methods Plus Traditional Thinking in NP Alignment. 3635-3639 - Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi:
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation. 3640-3649 - Arne Defauw, Tom Vanallemeersch, Koen Van Winckel, Sara Szoc, Joachim Van den Bogaert:
Being Generous with Sub-Words towards Small NMT Children. 3650-3656 - Radina Dobreva, Jie Zhou, Rachel Bawden:
Document Sub-structure in Neural Machine Translation. 3657-3667 - Alessandro Raganato, Yves Scherrer, Jörg Tiedemann:
An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems. 3668-3675 - Aurélie Névéol, Antonio Jimeno-Yepes, Mariana L. Neves:
MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors' Abstract Writing Practice. 3676-3682 - Zhuoyuan Mao, Fabien Cromierès, Raj Dabre, Haiyue Song, Sadao Kurohashi:
JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. 3683-3691 - Julia Ive, Lucia Specia, Sara Szoc, Tom Vanallemeersch, Joachim Van den Bogaert, Eduardo Farah, Christine Maroti, Artur Ventura, Maxim Khalilov:
A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality? 3692-3697 - Vikrant Goyal, Pruthwik Mishra, Dipti Misra Sharma:
Linguistically Informed Hindi-English Neural Machine Translation. 3698-3703 - Masaaki Nagata, Makoto Morishita:
A Test Set for Discourse Translation from Japanese to English. 3704-3709 - Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky:
An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages. 3710-3718 - Nobushige Doi, Yusuke Oda, Toshiaki Nakazawa:
TDDC: Timely Disclosure Documents Corpus. 3719-3726 - Alina Karakanta, Matteo Negri, Marco Turchi:
MuST-Cinema: a Speech-to-Subtitles corpus. 3727-3734 - Sheila Castilho, Maja Popovic, Andy Way:
On Context Span Needed for Machine Translation Evaluation. 3735-3742 - Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar:
A Multilingual Parallel Corpora Collection Effort for Indian Languages. 3743-3751 - Thierry Etchegoyhen, Harritxu Gete:
To Case or not to case: Evaluating Casing Methods for Neural Machine Translation. 3752-3760 - Tamás Váradi, Svetla Koeva, Martin Yamalov, Marko Tadic, Bálint Sass, Bartlomiej Niton, Maciej Ogrodniczuk, Piotr Pezik, Verginica Barbu Mititelu, Radu Ion, Elena Irimia, Maria Mitrofan, Vasile Florian Pais, Dan Tufis, Radovan Garabík, Simon Krek, Andraz Repar, Matjaz Rihtar, Janez Brank:
The MARCELL Legislative Corpus. 3761-3768 - Felipe Soares, Mark Stevenson, Diego Bartolomé, Anna Zaretskaya:
ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts. 3769-3774 - Siyou Liu, Xiaojun Zhang:
Corpora for Document-Level Neural Machine Translation. 3775-3781 - Mikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann:
OpusTools and Parallel Corpus Diagnostics. 3782-3789 - Margot Fonteyne, Arda Tezcan, Lieve Macken:
Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level. 3790-3798 - Thierry Etchegoyhen, Harritxu Gete:
Handle with Care: A Case Study in Comparable Corpora Exploitation for Neural Machine Translation. 3799-3807 - Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula:
The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research. 3808-3815 - Andrea Zaninello, Alexandra Birch:
Multiword Expression aware Neural Machine Translation. 3816-3825 - Myung Hee Kim, Nathalie Colineau:
An Enhanced Mapping Scheme of the Universal Part-Of-Speech for Korean. 3826-3833 - Maha Alkhairy, Afshan Jafri, David Smith:
Finite State Machine Pattern-Root Arabic Morphological Generator, Analyzer and Diacritizer. 3834-3841 - Amr Keleg, Francis M. Tyers, Nick Howell, Flammie A. Pirinen:
An Unsupervised Method for Weighting Finite-state Morphological Analyzers. 3842-3850 - Danushka Bollegala, Ryuichi Kiryo, Kosuke Tsujino, Haruki Yukawa:
Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction. 3851-3860 - Maria Nefeli Nikiforos, Katia Lida Kermanidis:
A Supervised Part-Of-Speech Tagger for the Greek Language of the Social Web. 3861-3867 - Anne Jonker, Corné de Ruijt, Jornt de Gruijl:
Bag & Tag'em - A New Dutch Stemmer. 3868-3876 - Nabil Hathout, Franck Sajous, Basilio Calderone, Fiammetta Namer:
Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI. 3877-3885 - Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén:
BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian. 3886-3894 - Salam Khalifa, Nasser Zalmout, Nizar Habash:
Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods. 3895-3904 - Eleni Metheniti, Guenter Neumann:
Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus. 3905-3912 - Oanh Tran, Tu Pham, Vu Dang, Bang Nguyen:
Introducing a Large-Scale Dataset for Vietnamese POS Tagging on Conversational Texts. 3913-3921 - Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernstreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky:
UniMorph 3.0: Universal Morphology. 3922-3931 - Bojana Mikelenic, Marko Tadic:
Building the Spanish-Croatian Parallel Corpus. 3932-3936 - Daniil Vodolazsky:
DerivBase.Ru: a Derivational Morphology Resource for Russian. 3937-3943 - Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo:
Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning. 3944-3953 - Ranka Stankovic, Branislava Sandrih, Cvetana Krstev, Milos Utvic, Mihailo Skoric:
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. 3954-3962 - Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky:
Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages. 3963-3972 - Mohamed Balabel, Injy Hamed, Slim Abdennadher, Ngoc Thang Vu, Özlem Çetinoglu:
Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus. 3973-3977 - Alexey Sorokin:
Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation. 3978-3983 - Berke Özenç, Ercan Solak:
Visual Modeling of Turkish Morphology. 3984-3990 - Jón Daðason, David Erik Mollberg, Hrafn Loftsson, Kristín Bjarnadóttir:
Kvistur 2.0: a BiLSTM Compound Splitter for Icelandic. 3991-3995 - Justin Mott, Ann Bies, Stephanie M. Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitchell Marcus:
Morphological Segmentation for Low Resource Languages. 3996-4002 - Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave:
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data. 4003-4012 - Yerai Doval, José Camacho-Collados, Luis Espinosa Anke, Steven Schockaert:
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning. 4013-4023 - Yuming Zhai, Lufei Liu, Xinyi Zhong, Gabriel Illouz, Anne Vilnat:
Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques. 4024-4033 - Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajic, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis M. Tyers, Daniel Zeman:
Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection. 4034-4043 - Iris Serrat Roozen, José Manuel Martínez Martínez:
EMPAC: an English-Spanish Corpus of Institutional Subtitles. 4044-4053 - Elmurod Kuriyozov, Yerai Doval, Carlos Gómez-Rodríguez:
Cross-Lingual Word Embeddings for Turkic Languages. 4054-4062 - Hiroshi Kanayama, Ran Iwamoto:
How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection. 4063-4073 - Matej Ulcar, Kristiina Vaik, Jessica Lindström, Milda Dailidenaite, Marko Robnik-Sikonja:
Multilingual Culture-Independent Word Analogy Datasets. 4074-4080 - Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet:
GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies. 4081-4088 - Khia A. Johnson, Molly Babel, Ivan Fong, Nancy Yiu:
SpiCE: A New Open-Access Corpus of Conversational Bilingual Speech in Cantonese and English. 4089-4095 - Els Lefever, Sofie Labat, Pranaydeep Singh:
Identifying Cognates in English-Dutch and French-Dutch by means of Orthographic Information and Cross-lingual Word Embeddings. 4096-4101 - Maria Kunilovskaya, Ekaterina Lapshinova-Koltunski:
Lexicogrammatic translationese across two targets and competence levels. 4102-4112 - Ehsaneddin Asgari, Fabienne Braune, Benjamin Roth, Christoph Ringlstetter, Mohammad R. K. Mofrad:
UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages. 4113-4120 - Li Nguyen, Christopher Bryant:
CanVEC - the Canberra Vietnamese-English Code-switching Natural Speech Corpus. 4121-4129 - Fadhl Eryani, Nizar Habash, Houda Bouamor, Salam Khalifa:
A Spelling Correction Corpus for Multiple Arabic Dialects. 4130-4138 - Stephen Mutuvi, Antoine Doucet, Gaël Lejeune, Moses Odeo:
A Dataset for Multi-lingual Epidemiological Event Extraction. 4139-4144 - Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, Maren Runte:
Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics. 4145-4151 - Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz:
Analysis of GlobalPhone and Ethiopian Languages Speech Corpora for Multilingual ASR. 4152-4156 - Long-Huei Chen, Kyo Kageura:
Multilingualization of Medical Terminology: Semantic and Structural Embedding Approaches. 4157-4166 - Solomon Teferra Abate, Martha Yifiru Tachbelie, Michael Melese, Hafte Abera, Tewodros Abebe, Wondwossen Mulugeta, Yaregal Assabie, Million Meshesha, Solomon Afnafu, Binyam Ephrem Seyoum:
Large Vocabulary Read Speech Corpora for Four Ethiopian Languages: Amharic, Tigrigna, Oromo and Wolaytta. 4167-4171 - Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya:
Incorporating Politeness across Languages in Customer Care Responses: Towards building a Multi-lingual Empathetic Dialogue Agent. 4172-4182 - Cezar Sas, Meriem Beloucif, Anders Søgaard:
WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing. 4183-4189 - Mahtab Ahmed, Chahna Dixit, Robert E. Mercer, Atif Khan, Muhammad Rifayat Samee, Felipe Urra:
Multilingual Corpus Creation for Multilingual Semantic Similarity Task. 4190-4196 - Changhan Wang, Juan Miguel Pino, Anne Wu, Jiatao Gu:
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus. 4197-4203 - Hideki Nakayama, Akihiro Tamura, Takashi Ninomiya:
A Visually-Grounded Parallel Corpus with Phrase-to-Region Linking. 4204-4210 - Winston Wu, Garrett Nicolai, David Yarowsky:
Multilingual Dictionary Based Construction of Core Vocabulary. 4211-4217 - Rosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis M. Tyers, Gregor Weber:
Common Voice: A Massively-Multilingual Speech Corpus. 4218-4222 - Jackson L. Lee, Lucas F. E. Ashby, M. Elizabeth Garza, Yeonju Lee-Sikka, Sean Miller, Alan Wong, Arya D. McCarthy, Kyle Gorman:
Massively Multilingual Pronunciation Modeling with WikiPron. 4223-4228 - Anssi Yli-Jyrä, Josi Purhonen, Matti Liljeqvist, Arto Antturi, Pekka Nieminen, Kari M. Räntilä, Valtter Luoto:
HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment. 4229-4236 - Injy Hamed, Ngoc Thang Vu, Slim Abdennadher:
ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English. 4237-4246 - Aleksandr Khakhmovich, Svetlana Pavlova, Kira Kirillova, Nikolay Arefyev, Ekaterina Savilova:
Cross-lingual Named Entity List Search via Transliteration. 4247-4255 - Xavier Bost, Vincent Labatut, Georges Linarès:
Serial Speakers: a Dataset of TV Series. 4256-4264 - Masayasu Muraoka, Ryosuke Kohita, Etsuko Ishii:
Image Position Prediction in Multimodal Documents. 4265-4274 - Taichi Nishimura, Suzushi Tomori, Hayato Hashimoto, Atsushi Hashimoto, Yoko Yamakata, Jun Harashima, Yoshitaka Ushiku, Shinsuke Mori:
Visual Grounding Annotation of Recipe Flow Graph. 4275-4284 - Omar Adjali, Romaric Besançon, Olivier Ferret, Hervé Le Borgne, Brigitte Grau:
Building a Multimodal Entity Linking Dataset From Tweets. 4285-4292 - Salima Mdhaffar, Yannick Estève, Antoine Laurent, Nicolas Hernandez, Richard Dufour, Delphine Charlet, Géraldine Damnati, Solen Quiniou, Nathalie Camelin:
A Multimodal Educational Corpus of Oral Courses: Annotation, Analysis and Case Study. 4293-4301 - Hirotaka Kameko, Shinsuke Mori:
Annotating Event Appearance for Japanese Chess Commentary Corpus. 4302-4308 - Cleber Alcântara, Viviane Pereira Moreira, Diego de Vargas Feijó:
Offensive Video Detection: Dataset and Baseline Results. 4309-4319 - Daniela Trotta, Alessio Palmero Aprosio, Sara Tonelli, Annibale Elia:
Adding Gesture, Posture and Facial Displays to the PoliModal Corpus of Political Interviews. 4320-4326 - Lydia-Mai Ho-Dac, Serge Fleury, Claude Ponton:
E: Calm Resource: a Resource for Studying Texts Produced by French Pupils and Students. 4327-4332 - Michel-Pierre Jansen, Khiet P. Truong, Dirk K. J. Heylen, Deniece S. Nazareth:
Introducing MULAI: A Multimodal Database of Laughter during Dyadic Interactions. 4333-4342 - Nelleke Oostdijk, Hans van Halteren, Erkan Basar, Martha A. Larson:
The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis. 4343-4351 - Santiago Castro, Mahmoud Azab, Jonathan C. Stroud, Cristina Noujaim, Ruoyao Wang, Jia Deng, Rada Mihalcea:
LifeQA: A Real-life Dataset for Video Question Answering. 4352-4358 - Julia Bettinger, Anna Hätty, Michael Dorna, Sabine Schulte im Walde:
A Domain-Specific Dataset of Difficulty Ratings for German Noun Compounds in the Domains DIY, Cooking and Automotive. 4359-4367 - Yana Strakatova, Neele Falk, Isabel Fuhrmann, Erhard W. Hinrichs, Daniela Rossmann:
All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German. 4368-4378 - Pegah Alipoor, Sabine Schulte im Walde:
Variants of Vector Space Reductions for Predicting the Compositionality of English Noun Compounds. 4379-4387 - Anurag Nigam, Anna Hätty, Sabine Schulte im Walde:
Varying Vector Representations and Integrating Meaning Shifts into a PageRank Model for Automatic Term Extraction. 4388-4394 - Karën Fort, Bruno Guillaume, Yann-Alan Pilatte, Mathieu Constant, Nicolas Lefebvre:
Rigor Mortis: Annotating MWEs with a Gamified Platform. 4395-4401 - Murathan Kurfali, Robert Östling, Johan Sjons, Mats Wirén:
A Multi-word Expression Dataset for Swedish. 4402-4409 - Irina Krotova, Sergey Aksenov, Ekaterina Artemova:
A Joint Approach to Compound Splitting and Idiomatic Compound Detection. 4410-4417 - Ferdy Hubers, Catia Cucchiarini, Helmer Strik:
Dedicated Language Resources for Interdisciplinary Research on Multiword Expressions: Best Thing since Sliced Bread. 4418-4425 - Ekaterina Kochmar, Sian Gooding, Matthew Shardlow:
Detecting Multiword Expression Type Helps Lexical Complexity Assessment. 4426-4435 - Stefan Daniel Dumitrescu, Andrei-Marius Avram:
Introducing RONEC - the Romanian Named Entity Corpus. 4436-4443 - Hanna Berg, Hercules Dalianis:
A Semi-supervised Approach for De-identification of Swedish Clinical Text. 4444-4450 - Chin Lee, Hongliang Dai, Yangqiu Song, Xin Li:
A Chinese Corpus for Fine-grained Entity Typing. 4451-4457 - Helena Hubková, Pavel Král, Eva Pettersson:
Czech Historical Named Entity Corpus v 1.0. 4458-4465 - Elisabeth Eder, Ulrike Krieg-Holz, Udo Hahn:
CodE Alltag 2.0 - A Pseudonymized German-Language Email Corpus. 4466-4477 - Elena Leitner, Georg Rehm, Julián Moreno Schneider:
A Dataset of German Legal Documents for Named Entity Recognition. 4478-4485 - Aitor García Pablos, Naiara Pérez, Montse Cuadros:
Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT. 4486-4494 - Sarah Schulz, Jurica Seva, Samuel Rodríguez, Malte Ostendorff, Georg Rehm:
Named Entities in Medical Case Reports: Corpus and Experiments. 4495-4500 - Marcus Klang, Pierre Nugues:
Hedwig: A Named Entity Linker. 4501-4508 - Sabine Barreaux, Dominique Besagni:
An Experiment in Annotating Animal Species Names from ISTEX Resources. 4509-4513 - Antoine Caubrière, Sophie Rosset, Yannick Estève, Antoine Laurent, Emmanuel Morin:
Where are we in Named Entity Recognition from Speech? 4514-4520 - Paul McNamee, James Mayfield, Cash Costello, Caitlyn Bishop, Shelby Anderson:
Tagging Location Phrases in Text. 4521-4528 - Hannah Smith, Zeyu Zhang, John Culnan, Peter Jansen:
ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition. 4529-4546 - Fredrik Jørgensen, Tobias Aasmoe, Anne-Stine Ruud Husevåg, Lilja Øvrelid, Erik Velldal:
NorNE: Annotating Named Entities for Norwegian. 4547-4556 - Felicitas Löffler, Nora Abdelmageed, Samira Babalou, Pawandeep Kaur, Birgitta König-Ries:
Tag Me If You Can! Semantic Annotation of Biodiversity Metadata with the QEMP Corpus and the BiodivTagger. 4557-4564 - Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, Sadao Kurohashi:
Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases. 4565-4572 - Alex Brandsen, Suzan Verberne, Milco Wansleeben, Karsten Lambers:
Creating a Dataset for Named Entity Recognition in the Archaeology Domain. 4573-4577 - Hongkuan Zhang, Ryohei Sasano, Koichi Takeda, Zoie Shui-Yee Wong:
Development of a Medical Incident Report Corpus with Intention and Factuality Annotation. 4578-4584 - Erik Faessler, Luise Modersohn, Christina Lohr, Udo Hahn:
ProGene - A Large-scale, High-Quality Protein-Gene Annotated Benchmark Corpus. 4585-4596 - Rasmus Hvingelby, Amalie Brogaard Pauli, Maria Barrett, Christina Rosted, Lasse Malm Lidegaard, Anders Søgaard:
DaNE: A Named Entity Resource for Danish. 4597-4604 - Josef Ruppenhofer, Ines Rehbein, Carolina Flinz:
Fine-grained Named Entity Annotations for German Biographic Interviews. 4605-4614 - Jouni Luoma, Miika Oinonen, Maria Pyykönen, Veronika Laippala, Sampo Pyysalo:
A Broad-coverage Corpus for Finnish Named Entity Recognition. 4615-4624 - Bernardo Scapini Consoli, Joaquim Santos, Diogo Gomes, Fábio Corrêa Cordeiro, Renata Vieira, Viviane Pereira Moreira:
Embeddings for Named Entity Recognition in Geoscience Portuguese Literature. 4625-4630 - Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot:
Establishing a New State-of-the-Art for French Named Entity Recognition. 4631-4638 - Dawn J. Lawrie, James Mayfield, David Etter:
Building OCR/NER Test Collections. 4639-4646 - Iva Marinova, Laska Laskova, Petya Osenova, Kiril Simov, Alexander Popov:
Reconstructing NER Corpora: a Case Study on Bulgarian. 4647-4652 - Kira Klimt, Daniel Braun, Daniela Schneider, Florian Matthes:
MucLex: A German Lexicon for Surface Realisation. 4653-4657 - Jinyi Hu, Maosong Sun:
Generating Major Types of Chinese Classical Poetry in a Uniformed Framework. 4658-4663 - Yutaro Shigeto, Yuya Yoshikawa, Jiaqing Lin, Akikazu Takeuchi:
Video Caption Dataset for Describing Human Actions in Japanese. 4664-4670 - Zhiyuan Wen, Jiannong Cao, Ruosong Yang, Senzhang Wang:
Decode with Template: Content Preserving Sentiment Transfer. 4671-4679 - Jonathan Sauder, Ting Hu, Xiaoyin Che, Gonçalo Mordido, Haojin Yang, Christoph Meinel:
Best Student Forcing: A Simple Training Mechanism in Adversarial Language Generation. 4680-4688 - Louis Martin, Éric de la Clergerie, Benoît Sagot, Antoine Bordes:
Controllable Sentence Simplification. 4689-4698 - Ali Amin-Nejad, Julia Ive, Sumithra Velupillai:
Exploring Transformer Text Generation for Medical Dataset Augmentation. 4699-4708 - Vijini Liyanage, Surangika Ranathunga:
Multi-lingual Mathematical Word Problem Generation using Long Short Term Memory Networks with Enhanced Input Features. 4709-4716 - Jad Doughman, Fatima Abu Salem, Shady Elbassuoni:
Time-Aware Word Embeddings for Three Lebanese News Archives. 4717-4725 - Ruosong Yang, Jiannong Cao, Zhiyuan Wen:
GGP: Glossary Guided Post-processing for Word Embedding Learning. 4726-4730 - Matej Ulcar, Marko Robnik-Sikonja:
High Quality ELMo Embeddings for Seven Less-Resourced Languages. 4731-4738 - Rudolf Schneider, Tom Oberhauser, Paul Grundmann, Felix A. Gers, Alexander Löser, Steffen Staab:
Is Language Modeling Enough? Evaluating Effective Embedding Combinations. 4739-4748 - Diego Maupomé, Marie-Jean Meurs:
Language Modeling with a General Second-Order RNN. 4749-4753 - Nina Schneidermann, Rasmus Hvingelby, Bolette S. Pedersen:
Towards a Gold Standard for Evaluating Danish Word Embeddings. 4754-4763 - Steven R. Wilson, Walid Magdy, Barbara McGillivray, Kiran Garimella, Gareth Tyson:
Urban Dictionary Embeddings for Slang NLP Applications. 4764-4773 - Yeachan Kim, Kang-Min Kim, SangKeun Lee:
Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks. 4774-4780 - Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre:
Give your Text Representation Models some Love: the Case for Basque. 4781-4788 - François Torregrossa, Vincent Claveau, Nihel Kooli, Guillaume Gravier, Robin Allesiardo:
On the Correlation of Word Embedding Evaluation Metrics. 4789-4797 - Attila Novák, László János Laki, Borbála Novák:
CBOW-tag: a Modified CBOW Algorithm for Generating Embedding Models from Annotated Corpora. 4798-4801 - Andrea Dömötör, Zijian Gyozo Yang, Attila Novák:
Much Ado About Nothing - Identification of Zero Copulas in Hungarian Using an NMT Model. 4802-4810 - Matej Martinc, Petra Kralj Novak, Senja Pollak:
Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift. 4811-4819 - Duane K. Dougal, Deryle Lonsdale:
Improving NMT Quality Using Terminology Injection. 4820-4827 - Joaquim Santos, Bernardo Scapini Consoli, Renata Vieira:
Word Embedding Evaluation in Downstream Tasks and Semantic Analogies. 4828-4834 - Piroska Lendvai, Sándor Darányi, Christian Geng, Moniek M. Kuijpers, Oier Lopez de Lacalle, Jean-Christophe Mensonides, Simone Rebora, Uwe D. Reichel:
Detection of Reading Absorption in User-Generated Book Reviews: Resources Creation and Evaluation. 4835-4841 - Lama Alsudias, Paul Rayson:
Developing an Arabic Infectious Disease Ontology to Include Non-Standard Terminology. 4842-4850 - Antoni Oliver:
Aligning Wikipedia with WordNet: a Review and Evaluation of Different Techniques. 4851-4858 - António Branco, Sara Grilo, Márcia Bolrinha, Chakaveh Saedi, Ruben Branco, João Silva, Andreia Querido, Rita de Carvalho, Rosa Del Gaudio, Mariana Avelãs, Clara Pinto:
The MWN.PT WordNet for Portuguese: Projection, Validation, Cross-lingual Alignment and Distribution. 4859-4866 - Savong Bou, Naoki Suzuki, Makoto Miwa, Yutaka Sasaki:
Ontology-Style Relation Annotation: A Case Study. 4867-4876 - Rositsa Dekova:
The Ontology of Bulgarian Dialects - Architecture and Information Retrieval. 4877-4882 - Julia Bonn, Martha Palmer, Zheng Cai, Kristin Wright-Bettner:
Spatial AMR: Expanded Spatial Annotation in the Context of a Grounded Minecraft Corpus. 4883-4892 - Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher:
English WordNet Random Walk Pseudo-Corpora. 4893-4902 - Federica Vezzani, Giorgio Maria Di Nunzio:
On the Formal Standardization of Terminology Resources: The Case Study of TriMED. 4903-4910 - Israa Alsiyat, Scott Piao:
Metaphorical Expressions in Automatic Arabic Sentiment Analysis. 4911-4916 - Diego Antognini, Boi Faltings:
HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset. 4917-4923 - Esther van den Berg, Katharina Korfhage, Josef Ruppenhofer, Michael Wiegand, Katja Markert:
Doctor Who? Framing Through Names and Titles in German. 4924-4932 - Alexander Rietzler, Sebastian Stabinger, Paul Opitz, Stefan Engl:
Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification. 4933-4941 - Hyun Jung Kang, Iris Eshkol-Taravella:
An Empirical Examination of Online Restaurant Reviews. 4942-4947 - Lalitha Kameswari, Radhika Mamidi:
Manovaad: A Novel Approach to Event Oriented Corpus Creation Capturing Subjectivity and Focus. 4948-4954 - Amira Barhoumi, Nathalie Camelin, Chafik Aloulou, Yannick Estève, Lamia Hadrich Belguith:
Toward Qualitative Evaluation of Embeddings for Arabic Sentiment Analysis. 4955-4963 - Roser Morante, Chantal van Son, Isa Maks, Piek Vossen:
Annotating Perspectives on Vaccination. 4964-4973 - Mara Chinea-Rios, Marc Franco-Salvador, Yassine Benajiba:
Aspect On: an Interactive Solution for Post-Editing the Aspect Extraction based on Online Learning. 4974-4981 - Akash Sheoran, Diptesh Kanojia, Aditya Joshi, Pushpak Bhattacharyya:
Recommendation Chart of Domains for Cross-Domain Sentiment Analysis: Findings of A 20 Domain Study. 4982-4990 - Liyun Yan, Danni E, Mei Gan, Cyril Grouin, Mathieu Valette:
Inference Annotation of a Chinese Corpus for Opinion Mining. 4991-4999 - Elham Mohammadi, Nada Naji, Louis Marceau, Marc Queudot, Eric Charton, Leila Kosseim, Marie-Jean Meurs:
Cooking Up a Neural-based Model for Recipe Classification. 5000-5009 - Marc Schulder, Michael Wiegand, Josef Ruppenhofer:
Enhancing a Lexicon of Polarity Shifters through the Supervised Classification of Shifting Directions. 5010-5016 - Yashwanth Reddy Regatte, Rama Rohit Reddy Gangula, Radhika Mamidi:
Dataset Creation and Evaluation of Aspect Based Sentiment Analysis in Telugu, a Low Resource Language. 5017-5024 - Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes, Erik Velldal:
A Fine-grained Sentiment Dataset for Norwegian. 5025-5033 - Xiaochang Gong, Qin Zhao, Jun Zhang, Ruibin Mao, Ruifeng Xu:
The Design and Construction of a Chinese Sarcasm Dataset. 5034-5039 - Chaofa Yuan, Yuhan Liu, Rongdi Yin, Jun Zhang, Qinling Zhu, Ruibin Mao, Ruifeng Xu:
Target-based Sentiment Annotation in Chinese Financial News. 5040-5045 - Mamta, Asif Ekbal, Pushpak Bhattacharyya, Shikha Srivastava, Alka Kumar, Tista Saha:
Multi-domain Tweet Corpora for Sentiment Analysis: Resource Creation and Evaluation. 5046-5054 - João António Rodrigues, Ruben Branco, João Silva, António Branco:
Reproduction and Revival of the Argument Reasoning Comprehension Task. 5055-5064 - Antonio Moreno-Ortiz, Javier Fernandez-Cruz, Chantal Pérez Hernández:
Design and Evaluation of SentiEcon: a fine-grained Economic/Financial Sentiment Lexicon from a Corpus of Business News. 5065-5072 - Gavin Abercrombie, Riza Batista-Navarro:
ParlVote: A Corpus for Sentiment Analysis of Political Debates. 5073-5078 - Zuoyu Tian, Sandra Kübler:
Offensive Language Detection Using Brown Clustering. 5079-5087 - Stavros Assimakopoulos, Rebecca Vella Muskat, Lonneke van der Plas, Albert Gatt:
Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis. 5088-5097 - Alessandra Teresa Cignarella, Manuela Sanguinetti, Cristina Bosco, Paolo Rosso:
Marking Irony Activators in a Universal Dependencies Treebank: The Case of an Italian Twitter Corpus. 5098-5105 - Luis Chiruzzo, Santiago Castro, Aiala Rosá:
HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish. 5106-5112 - Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe:
Offensive Language Identification in Greek. 5113-5119 - Eckhard Bick:
Syntax and Semantics in a Treebank for Esperanto. 5120-5127 - Cheikh M. Bamba Dione:
Implementation and Evaluation of an LFG-based Parser for Wolof. 5128-5136 - Oliver Hellwig, Salvatore Scarlata, Elia Ackermann, Paul Widmer:
The Treebank of Vedic Sanskrit. 5137-5146 - Mark Anderson, Carlos Gómez-Rodríguez:
Inherent Dependency Displacement Bias of Transition-Based Algorithms. 5147-5155 - Tolga Kayadelen, Adnan Ozturel, Bernd Bohnet:
A Gold Standard Dependency Treebank for Turkish. 5156-5163 - Iris Eshkol-Taravella, Mariame Maarouf, Flora Badin, Marie Skrovec, Isabelle Tellier:
Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning. 5164-5168 - Agnieszka Falenska, Zoltán Czesznak, Kerstin Jung, Moritz Völkel, Wolfgang Seeker, Jonas Kuhn:
GRAIN-S: Manually Annotated Syntax for German Interviews. 5169-5177 - Olájídé Ishola, Daniel Zeman:
Yorùbá Dependency Treebank (YTB). 5178-5186 - Yoko Yamakata, Shinsuke Mori, John Carroll:
English Recipe Flow Graph Corpus. 5187-5194 - Yusuke Kubota, Koji Mineshima, Noritsugu Hayashi, Shinya Okano:
Development of a General-Purpose Categorial Grammar Treebank. 5195-5201 - Toqeer Ehsan, Miriam Butt:
Dependency Parsing for Urdu: Resources, Conversions and Learning. 5202-5207 - Jan Hajic, Eduard Bejcek, Jaroslava Hlavácová, Marie Mikulová, Milan Straka, Jan Stepánek, Barbora Stepánková:
Prague Dependency Treebank - Consolidated 1.0. 5208-5218 - Richard Johansson, Yvonne Adesam:
Training a Swedish Constituency Parser on Six Incompatible Treebanks. 5219-5224 - Robert Vacareanu, George Caique Gouveia Barbosa, Marco Antonio Valenzuela-Escárcega, Mihai Surdeanu:
Parsing as Tagging. 5225-5231 - Gerlof Bouma, Evie Coussé, Trude Dijkstra, Nicoline van der Sijs:
The EDGeS Diachronic Bible Corpus. 5232-5239 - Manuela Sanguinetti, Cristina Bosco, Lauren Cassidy, Özlem Çetinoglu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes:
Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies. 5240-5250 - Aleksandrs Berdicevskis, Hanne M. Eckhoff:
A Diachronic Treebank of Russian Spanning More Than a Thousand Years. 5251-5256 - Konstantinos Kogkalidis, Michael Moortgat, Richard Moot:
ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch. 5257-5266 - Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes:
GUMBY - A Free, Balanced, and Rich English Web Corpus. 5267-5275 - Uwe Quasthoff, Lars Hellan, Erik Körner, Thomas Eckart, Dirk Goldhahn, Dorothee Beermann:
Typical Sentences as a Resource for Valence. 5276-5281 - Jonathan Hildebrand, Wahed Hemati, Alexander Mehler:
Recognizing Sentence-level Logical Document Structures with the Help of Context-free Grammars. 5282-5290 - Gaël Guibon, Marine Courtin, Kim Gerdes, Bruno Guillaume:
When Collaborative Treebank Curation Meets Graph Grammars. 5291-5300 - Ilaine Wang, Aurore Pelletier, Jean-Yves Antoine, Anaïs Lefeuvre-Halftermeyer:
ODIL_Syntax: a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees. 5301-5307 - Alina Wróblewska:
Towards the Conversion of National Corpus of Polish to Universal Dependencies. 5308-5315 - Eitan Grossman, Elad Eisen, Dmitry Nikolaev, Steven Moran:
SegBo: A Database of Borrowed Sounds in the World's Languages. 5316-5322 - Mélanie Lancien, Marie-Hélène Côté, Brigitte Bigi:
Developing Resources for Automated Speech Processing of Quebec French. 5323-5328 - David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig:
AlloVera: A Multilingual Allophone Database. 5329-5336 - Omnia Ibrahim, Homa Asadi, Eman Kassem, Volker Dellwo:
Arabic Speech Rhythm Corpus: Read and Spontaneous Speaking Styles. 5337-5342 - Janne Bondi Johannessen, Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestley:
Comparing Methods for Measuring Dialect Similarity in Norwegian. 5343-5350 - Afroz Ahamad, Ankit Anand, Pranesh Bhargava:
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition. 5351-5358 - Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro:
A Framework for Evaluation of Machine Reading Comprehension Gold Standards. 5359-5369 - Dongfang Xu, Peter A. Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark:
Multi-class Hierarchical Question Classification for Multiple Choice Science Exams. 5370-5382 - Yonas Woldemariam:
Assessing Users' Reputation from Syntactic and Semantic Information in Community Question Answering. 5383-5391 - Kosuke Nishida, Kyosuke Nishida, Itsumi Saito, Hisako Asano, Junji Tomita:
Unsupervised Domain Adaptation of Language Models for Reading Comprehension. 5392-5399 - Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Kyomin Jung:
Propagate-Selector: Detecting Supporting Sentences for Question Answering via Graph Neural Networks. 5400-5407 - Eduardo G. Cortes, Vinicius Woloszyn, Arne Binder, Tilo Himmelsbach, Dante Augusto Couto Barone, Sebastian Möller:
An Empirical Comparison of Question Classification Methods for Question Answering Systems. 5408-5416 - Jinmeng Wu, Yanbin Hao:
Cross-sentence Pre-trained Model for Interactive QA matching. 5417-5424 - Gyeongbok Lee, Seung-won Hwang, Hyunsouk Cho:
SQuAD2-CR: Semi-supervised Annotation for Cause and Rationales for Unanswerability in SQuAD 2.0. 5425-5432 - Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yushi Aono, Ryuta Nakamura, Noritake Adachi, Hidetoshi Kawabata:
Generating Responses that Reflect Meta Information in User-Generated Question Answer Pairs. 5433-5441 - Hugo Gonçalo Oliveira, João Ferreira, José Santos, Pedro Fialho, Ricardo Rodrigues, Luísa Coheur, Ana Alves:
AIA-BDE: A Corpus of FAQs in Portuguese and their Variations. 5442-5449 - Anthony M. Colas, Seokhwan Kim, Franck Dernoncourt, Siddhesh Gupte, Daisy Zhe Wang, Doo Soon Kim:
TutorialVQA: Question Answering Dataset for Tutorial Videos. 5450-5455 - Zhengnan Xie, Sebastian Thiem, Jaycie Martin, Elizabeth Wainwright, Steven Marmorstein, Peter A. Jansen:
WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns supporting Multi-Hop Inference. 5456-5473 - Gabriel Luthier, Andrei Popescu-Belis:
Chat or Learn: a Data-Driven Robust Question-Answering System. 5474-5480 - Rachel Keraron, Guillaume Lancrenon, Mathilde Bras, Frédéric Allary, Gilles Moyse, Thomas Scialom, Edmundo-Pavel Soriano-Morales, Jacopo Staiano:
Project PIAF: Building a Native French Question-Answering Dataset. 5481-5490 - Delphine Charlet, Géraldine Damnati, Frédéric Béchet, Gabriel Marzinotto, Johannes Heinecke:
Cross-lingual and Cross-domain Evaluation of Machine Reading Comprehension with Squad and CALOR-Quest Corpora. 5491-5497 - Tanik Saikh, Asif Ekbal, Pushpak Bhattacharyya:
ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension. 5498-5504 - Md. Tahmid Rahman Laskar, Jimmy Xiangji Huang, Enamul Hoque:
Contextualized Embeddings based Transformer Encoder for Sentence Similarity Modeling in Answer Selection Task. 5505-5514 - Casimiro Pio Carrino, Marta R. Costa-jussà, José A. R. Fonollosa:
Automatic Spanish Translation of SQuAD Dataset for Multi-lingual Question Answering. 5515-5523 - Mehrdad Alizadeh, Barbara Di Eugenio:
A Corpus for Visual Question Answering Annotated with Frame Semantic Information. 5524-5531 - Sarvesh Soni, Kirk Roberts:
Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering. 5532-5538 - António Branco, Nicoletta Calzolari, Piek Vossen, Gertjan van Noord, Dieter Van Uytvanck, João Silva, Luís Gomes, André Moreira, Willem Elbers:
A Shared Task of a New, Collaborative Type to Foster Reproducibility: A First Exercise in the Area of Language Science and Technology with REPROLANG2020. 5539-5545 - Nicolas Garneau, Mathieu Godbout, David Beauchemin, Audrey Durand, Luc Lamontagne:
A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings: Making the Method Robustly Reproducible as Well. 5546-5554 - Kamil Plucinski, Mateusz Lango, Michal Zimniewicz:
A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping. 5555-5562 - Yung Han Khoe:
Reproducing a Morphosyntactic Tagger with a Meta-BiLSTM Model over Context Sensitive Token Encodings. 5563-5568 - Kyeongmin Rim, Jingxuan Tu, Kelley Lynch, James Pustejovsky:
Reproducing Neural Ensemble Classifier for Semantic Relation Extraction in Scientific Papers. 5569-5578 - Mohamed Abdellatif, Ahmed Elgammal:
ULMFiT replication. 5579-5587 - Michael Cooper, Matthew Shardlow:
CombiNMT: An Exploration into Neural Text Simplification Models. 5588-5594 - Yves Bestgen:
Reproducing Monolingual, Multilingual and Cross-Lingual CEFR Predictions. 5595-5602 - Eva Huber, Çagri Çöltekin:
Reproduction and Replication: A Case Study with Automatic Essay Scoring. 5603-5613 - Andrew Caines, Paula Buttery:
REPROLANG 2020: Automatic Proficiency Scoring of Czech, English, German, Italian, and Spanish Learner Essays. 5614-5623 - Cristina Arhiliuc, Jelena Mitrovic, Michael Granitzer:
Language Proficiency Scoring. 5624-5630 - Nicolas Ballier, Nabil Amari, Laure Merat, Jean-Baptiste Yunès:
The Learnability of the Annotated Input in NMT Replicating (Vanmassenhove and Way, 2018) with OpenNMT. 5631-5640 - Jan Portisch, Michael Hladik, Heiko Paulheim:
KGvec2go - Knowledge Graph Embeddings as a Service. 5641-5647 - Alexandre Bento, Amal Zouaq, Michel Gagnon:
Ontology Matching Using Convolutional Neural Networks. 5648-5653 - Patricia Martín-Chozas, Sina Ahmadi, Elena Montiel-Ponsoda:
Defying Wikidata: Validation of Terminological Relations in the Web of Data. 5654-5659 - Thierry Declerck, John Philip McCrae, Matthias Hartung, Jorge Gracia, Christian Chiarcos, Elena Montiel-Ponsoda, Philipp Cimiano, Artem Revenko, Roser Saurí, Deirdre Lee, Stefania Racioppa, Jamal Abdul Nasir, Matthias Orlikowski, Marta Lanau-Coronas, Christian Fäth, Mariano Rico, Mohammad Fazleh Elahi, Maria Khvalchik, Meritxell González, Katharine Cooney:
Recent Developments for the Linguistic Linked Open Data Infrastructure. 5660-5667 - Christian Chiarcos, Christian Fäth, Frank Abromeit:
Annotation Interoperability for the Post-ISOCat Era. 5668-5677 - Kevin Alex Mathews, Michael Strube:
A Large Harvested Corpus of Location Metonymy. 5678-5687 - Livio Robaldo, Cesare Bartolini, Gabriele Lenzini:
The DAPRECO Knowledge Base: Representing the GDPR in LegalRuleML. 5688-5697 - Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Subrahmanyan Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme:
The Universal Decompositional Semantics Dataset and Decomp Toolkit. 5698-5707 - Emmanuele Chersoni, Ludovica Pannitto, Enrico Santus, Alessandro Lenci, Chu-Ren Huang:
Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit? 5708-5713 - Rong Xiang, Xuefeng Gao, Yunfei Long, Anran Li, Emmanuele Chersoni, Qin Lu, Chu-Ren Huang:
Ciron: a New Benchmark Dataset for Chinese Irony Detection. 5714-5720 - Talita Anthonio, Irshad Bhat, Michael Roth:
wikiHowToImprove: A Resource and Analyses on Edits in Instructional Texts. 5721-5729 - Liza King, Roser Morante:
Must Children be Vaccinated or not? Annotating Modal Verbs in the Vaccination Debate. 5730-5738 - Aditya Khandelwal, Suraj Sawant:
NegBERT: A Transfer Learning Approach for Negation Detection and Scope Resolution. 5739-5748 - Olga Majewska, Diana McCarthy, Jasper J. F. van den Bosch, Nikolaus Kriegeskorte, Ivan Vulic, Anna Korhonen:
Spatial Multi-Arrangement for Clustering and Multi-way Similarity Dataset Construction. 5749-5758 - Tommaso Pasini, José Camacho-Collados:
A Short Survey on Sense-Annotated Corpora. 5759-5765 - Abhik Jana, Nikhil Reddy Varimalla, Pawan Goyal:
Using Distributional Thesaurus Embedding for Co-hyponymy Detection. 5766-5771 - Salvador Lima, Naiara Pérez, Montse Cuadros, German Rigau:
NUBes: A Corpus of Negation and Uncertainty in Spanish Clinical Texts. 5772-5781 - Venelin Kovatchev, Darina Gold, Maria Antònia Martí, Maria Salamó, Torsten Zesch:
Decomposing and Comparing Meaning Relations: Paraphrasing, Textual Entailment, Contradiction, and Specificity. 5782-5791 - Carina Silberer, Sina Zarrieß, Gemma Boleda:
Object Naming in Language and Vision: A Survey and a New Dataset. 5792-5801 - Ting-Yu Yen, Yang-Yin Lee, Yow-Ting Shiue, Hen-Hsen Huang, Hsin-Hsi Chen:
MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models. 5802-5809 - Omnia Zayed, John Philip McCrae, Paul Buitelaar:
Figure Me Out: A Gold Standard Dataset for Metaphor Interpretation. 5810-5819 - Ludovic Tanguy, Pauline Brunet, Olivier Ferret:
Extrinsic Evaluation of French Dependency Parsers on a Specialized Corpus: Comparison of Distributional Thesauri. 5820-5828 - Xiaojing Yu, Tianlong Chen, Zhengjie Yu, Huiyu Li, Yang Yang, Xiaoqian Jiang, Anxiao Jiang:
Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing. 5829-5837 - Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina:
Recognizing Semantic Relations by Combining Transformers and Fully Connected Models. 5838-5845 - Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi:
Word Attribute Prediction Enhanced by Lexical Entailment Tasks. 5846-5854 - Soham Dan, Parisa Kordjamshidi, Julia Bonn, Archna Bhatia, Zheng Cai, Martha Palmer, Dan Roth:
From Spatial Relations to Spatial Configurations. 5855-5864 - Irene Sucameli, Alessandro Lenci:
Representing Verbs with Visual Argument Vectors. 5865-5870 - Agnieszka Mykowiecka, Malgorzata Marciniak:
Are White Ravens Ever White? - Non-Literal Adjective-Noun Phrases in Polish. 5871-5877 - Carlos Santos Armendariz, Matthew Purver, Matej Ulcar, Senja Pollak, Nikola Ljubesic, Mark Granroth-Wilding:
CoSimLex: A Resource for Evaluating Graded Word Similarity in Context. 5878-5886 - Maxime Amblard, Clément Beysson, Philippe de Groote, Bruno Guillaume, Sylvain Pogodalla:
A French Version of the FraCaS Test Suite. 5887-5895 - Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Lee, Chris Biemann:
Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System. 5896-5904 - Bianca Scarlini, Tommaso Pasini, Roberto Navigli:
Sense-Annotated Corpora for Word Sense Disambiguation in Multiple Languages and Domains. 5905-5911 - Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoît Crabbé, Vincent Segonne:
FrSemCor: Annotating a French Corpus with Supersenses. 5912-5918 - Nikhil Krishnaswamy, James Pustejovsky:
A Formal Analysis of Multimodal Referring Strategies Under Common Ground. 5919-5927 - Gitit Kehat, James Pustejovsky:
Improving Neural Metaphor Detection with Visual Datasets. 5928-5933 - Ben Eyal, Michael Elhadad:
Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles. 5934-5942 - Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko:
Word Sense Disambiguation for 158 Languages using Word Embeddings Only. 5943-5952 - Antonio San Martín, Catherine Trekker, Pilar León Araúz:
Extraction of Hyponymic Relations in French with Knowledge-Pattern-Based Word Sketches. 5953-5961 - David Strohmaier, Sian Gooding, Shiva Taslimipoor, Ekaterina Kochmar:
SeCoDa: Sense Complexity Dataset. 5962-5967 - Ines Rehbein, Josef Ruppenhofer:
A New Resource for German Causal Language. 5968-5977 - Prafulla Kumar Choubey, Ruihong Huang:
One Classifier for All Ambiguous Words: Overcoming Data Sparsity by Utilizing Sense Correlations Across Words. 5978-5985 - Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider:
A Corpus of Adpositional Supersenses for Mandarin Chinese. 5986-5994 - Sarah R. Moeller, Irina Wagner, Martha Palmer, Kathryn Conger, Skatje Myers:
The Russian PropBank. 5995-6002 - Tommi Jantunen, Anna Puupponen, Birgitta Burger:
What Comes First: Combining Motion Capture and Eye Tracking Data to Study the Order of Articulators in Constructed Action in Sign Language Narratives. 6003-6007 - Lucie Naert, Caroline Larboulette, Sylvie Gibet:
LSF-ANIMAL: A Motion Capture Corpus in French Sign Language Designed for the Animation of Signing Avatars. 6008-6017 - Mathieu De Coster, Mieke Van Herreweghe, Joni Dambre:
Sign Language Recognition with Transformer Networks. 6018-6024 - Serena Trolvi, Rodolfo Delmonte:
Annotating a Fable in Italian Sign Language (LIS). 6025-6034 - Carolina C. Neves, Luísa Coheur, Hugo Nicolau:
HamNoSyS2SiGML: Translating HamNoSys Into SiGML. 6035-6039 - Valentin Belissen, Annelies Braffort, Michèle Gouiffès:
Dicta-Sign-LSF-v2: Remake of a Continuous French Sign Language Dialogue Corpus and a First Baseline for Automatic Sign Language Processing. 6040-6048 - Sandrine Tornay, Oya Aran, Mathew Magimai-Doss:
An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition. 6049-6056 - Simone Scicluna, Carlo Strapparava:
VROAV: Using Iconicity to Visually Represent Abstract Verbs. 6057-6062 - Hannah Bull, Annelies Braffort, Michèle Gouiffès:
MEDIAPI-SKEL - A 2D-Skeleton Video Database of French Sign Language With Aligned French Subtitles. 6063-6068 - Marion Kaczmarek, Michael Filhol:
Alignment Data base for a Sign Language Concordancer. 6069-6072 - Medet Mukushev, Arman Sabyrov, Alfarabi Imashev, Kenessary Koishybay, Vadim Kimmelman, Anara Sandygulova:
Evaluation of Manual and Non-manual Components for Sign Language Recognition. 6073-6078 - Ildar Kagirov, Denis Ivanko, Dmitry Ryumin, Alexander A. Petrovsky, Alexey Karpov:
TheRuSLan: Database of Russian Sign Language. 6079-6085 - Ray Oshikawa, Jing Qian, William Yang Wang:
A Survey on Natural Language Processing for Fake News Detection. 6086-6093 - Jie Gao, Sooji Han, Xingyi Song, Fabio Ciravegna:
RP-DNN: A Tweet Level Propagation Context Based Deep Neural Networks for Early Rumor Detection in Social Media. 6094-6105 - Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen:
Issues and Perspectives from 10,000 Annotated Financial Social Media Data. 6106-6110 - Wesley Ramos dos Santos, Amanda M. M. Funabashi, Ivandré Paraboni:
Searching Brazilian Twitter for Signs of Mental Health Issues. 6111-6117 - Anna Tigunova, Paramita Mirza, Andrew Yates, Gerhard Weikum:
RedDust: a Large Reusable Dataset of Reddit User Traits. 6118-6126 - Eckhard Bick:
An Annotated Social Media Corpus for German. 6127-6135 - Orion Weller, Kevin D. Seppi:
The rJokes Dataset: a Large Scale Humor Collection. 6136-6141 - Thomas Proisl, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, Stefan Evert:
EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus. 6142-6148 - Kai Nakamura, Sharon Levy, William Yang Wang:
Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. 6149-6157 - Eric Sanders, Antal van den Bosch:
Optimising Twitter-based Political Election Prediction with Relevance and Sentiment Filters. 6158-6165 - Adrian Iftene, Daniela Gîfu, Andrei-Remus Miron, Mihai-Stefan Dudu:
A Real-Time System for Credibility on Twitter. 6166-6173 - Çagri Çöltekin:
A Corpus of Turkish Offensive Language on Social Media. 6174-6184 - Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian Möller, Roland Roller:
From Witch's Shot to Music Making Bones - Resources for Medical Laymen to Technical Language and Vice Versa. 6185-6192 - Tommaso Caselli, Valerio Basile, Jelena Mitrovic, Inga Kartoziya, Michael Granitzer:
I Feel Offended, Don't Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language. 6193-6202 - Shammur A. Chowdhury, Hamdy Mubarak, Ahmed Abdelali, Soon-Gyo Jung, Bernard J. Jansen, Joni Salminen:
A Multi-Platform Arabic News Comment Dataset for Offensive Language Detection. 6203-6212 - Zahra Majdabadi, Behnam Sabeti, Preni Golazizian, Seyed Arad Ashrafi Asli, Omid Momenzadeh, Reza Fahmi:
Twitter Trend Extraction: A Graph-based Approach for Tweet and Hashtag Ranking, Utilizing No-Hashtag Tweets. 6213-6219 - Béatrice Mazoyer, Julia Cagé, Nicolas Hervé, Céline Hudelot:
A French Corpus for Event Detection on Twitter. 6220-6227 - Arindam Chatterjere, Vineeth Guptha, Parul Chopra, Amitava Das:
Minority Positive Sampling for Switching Points - an Anecdote for the Code-Mixing Language Modeling. 6228-6236 - Endang Wahyu Pamungkas, Valerio Basile, Viviana Patti:
Do You Really Want to Hurt Me? Predicting Abusive Swearing in Social Media. 6237-6246 - Lin Miao, Mark Last, Marina Litvak:
Detecting Troll Tweets in a Bilingual Corpus. 6247-6254 - Filip Miletic, Anne Przewozny-Desriaux, Ludovic Tanguy:
Collecting Tweets to Investigate Regional Variation in Canadian English. 6255-6264 - Ines Abbes, Wajdi Zaghouani, Omaima El-Hardlo, Faten Ashour:
DAICT: A Dialectal Arabic Irony Corpus Extracted from Twitter. 6265-6271 - Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna, Lorenzo De Mattei:
Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing. 6272-6278 - Elisa Gugliotta, Marco Dinarelli:
TArC: Incrementally and Semi-Automatically Collecting a Tunisian Arabish Corpus. 6279-6286 - Amy Rechkemmer, Steven R. Wilson, Rada Mihalcea:
Small Town or Metropolis? Analyzing the Relationship between Population Size and Language. 6287-6291 - Zhentao Xu, Verónica Pérez-Rosas, Rada Mihalcea:
Inferring Social Media Users' Mental Health Status from Multimodal Information. 6292-6299 - Kelly Dekker, Rob van der Goot:
Synthetic Data for English Lexical Normalization: How Close Can We Get to Manually Annotated Data? 6300-6309 - Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Thomas Proisl:
A Corpus of German Reddit Exchanges (GeRedE). 6310-6316 - Marc Evrard, Rémi Uro, Nicolas Hervé, Béatrice Mazoyer:
French Tweet Corpus for Automatic Stance Detection. 6317-6322 - Hadi Abdi Khojasteh, Ebrahim Ansari, Mahdi Bohlouli:
LSCP: Enhanced Large Scale Colloquial Persian Language Understanding. 6323-6327 - Yin May Oo, Theeraphol Wattanavekin, Chenfang Li, Pasindu De Silva, Supheakmungkol Sarin, Knot Pipatsrisawat, Martin Jansche, Oddur Kjartansson, Alexander Gutkin:
Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech. 6328-6339 - Eric G. Booth, Jake Carns, Casey Kennington, Nader Rafla:
Evaluating and Improving Child-Directed Automatic Speech Recognition. 6340-6345 - Mana Ihori, Akihiko Takashima, Ryo Masumura:
Parallel Corpus for Japanese Spoken-to-Written Style Conversion. 6346-6353 - Michael Gref, Oliver Walter, Christoph Schmidt, Sven Behnke, Joachim Köhler:
Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications - A Case Study on German Oral History Interviews. 6354-6362 - Jonás Kratochvíl, Peter Polak, Ondrej Bojar:
Large Corpus of Czech Parliament Plenary Hearings. 6363-6367 - Éva Székely, Jens Edlund, Joakim Gustafson:
Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis. 6368-6374 - Marc Schulder, Johannah O'Mahony, Yury Bakanouski, Dietrich Klakow:
ATC-ANNO: Semantic Annotation for Air Traffic Control with Assistive Auto-Annotation. 6375-6380 - Carlos Daniel Hernandez Mena, Albert Gatt, Andrea DeMarco, Claudia Borg, Lonneke van der Plas, Amanda Muscat, Ian Padovani:
MASRI-HEADSET: A Maltese Corpus for Speech Recognition. 6381-6388 - Natalia Kalashnikova, Loïc Grobol, Iris Eshkol-Taravella, François Delafontaine:
Automatic Period Segmentation of Oral French. 6389-6394 - Thierry Desot, François Portet, Michel Vacher:
Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU. 6395-6404 - Najla Ben Abdallah, Saméh Kchaou, Fethi Bougares:
Text and Speech-based Tunisian Arabic Sub-Dialects Identification. 6405-6411 - Luca Rognoni, Judith Bishop, Miriam Corris, Jessica Fernando, Rosanna Smith:
Urdu Pitch Accents and Intonation Patterns in Spontaneous Conversational Speech. 6412-6416 - Nimisha Srivastava, Rudrabha Mukhopadhyay, K. R. Prajwal, C. V. Jawahar:
IndicSpeech: Text-to-Speech Corpus for Indian Languages. 6417-6422 - Jan Gorisch, Michael Gref, Thomas Schmidt:
Using Automatic Speech Recognition in Spoken Corpus Curation. 6423-6428 - Huaijin Deng, Youchao Lin, Takehito Utsuro, Akio Kobayashi, Hiromitsu Nishizaki, Junichi Hoshino:
Integrating Disfluency-based and Prosodic Features with Acoustics in Automatic Fluency Evaluation of Spontaneous Speech. 6429-6437 - Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus. 6438-6443 - Ayimunishagu Abulimiti, Tanja Schultz:
Automatic Speech Recognition for Uyghur through Multilingual Acoustic Modeling. 6444-6449 - Dana Delgado, Kevin Walker, Stephanie M. Strassel, Karen Jones, Christopher Caruso, David Graff:
The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications. 6450-6457 - Parismita Gogoi, Abhishek Dey, Wendy Lalhminghlui, Priyankoo Sarmah, S. R. Mahadeva Prasanna:
Lexical Tone Recognition in Mizo using Acoustic-Prosodic Features. 6458-6461 - Josh Meyer, Lindy Rauchenstein, Joshua D. Eisenberg, Nicholas Howell:
Artie Bias Corpus: An Open Dataset for Detecting Demographic Bias in Speech Applications. 6462-6468 - Kallirroi Georgila, Anton Leuski, Volodymyr Yanov, David R. Traum:
Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains. 6469-6476 - Malgorzata Anna Ulasik, Manuela Hürlimann, Fabian Germann, Esin Gedik, Fernando Benites, Mark Cieliebak:
CEASR: A Corpus for Evaluating Automatic Speech Recognition. 6477-6485 - Marcely Zanon Boito, William Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier:
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible. 6486-6493 - Fei He, Shan-Hui Cathy Chu, Oddur Kjartansson, Clara Rivera, Anna Katanova, Alexander Gutkin, Isin Demirsahin, Cibu Johny, Martin Jansche, Supheakmungkol Sarin, Knot Pipatsrisawat:
Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems. 6494-6503 - Adriana Guevara-Rukoz, Isin Demirsahin, Fei He, Shan-Hui Cathy Chu, Supheakmungkol Sarin, Knot Pipatsrisawat, Alexander Gutkin, Alena Butryna, Oddur Kjartansson:
Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech. 6504-6513 - Aurélie Chlébowski, Nicolas Ballier:
A Manually Annotated Resource for the Investigation of Nasal Grunts. 6514-6522 - Vincent P. Martin, Jean-Luc Rouas, Jean-Arthur Micoulaud-Franchi, Pierre Philip:
The Objective and Subjective Sleepiness Voice Corpora. 6523-6531 - Isin Demirsahin, Oddur Kjartansson, Alexander Gutkin, Clara Rivera:
Open-source Multi-speaker Corpora of the English Accents in the British Isles. 6532-6541 - Yimin Xiao, Zong-Ying Slaton, Lu Xiao:
TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia's Articles for Deletion Discussions. 6542-6548 - Eric Chen, Zhiyun Lu, Hao Xu, Liangliang Cao, Yu Zhang, James Fan:
A Large Scale Speech Sentiment Corpus. 6549-6555 - Tatiana Kachkovskaia, Tatiana Chukaeva, Vera Evdokimova, Pavel Kholiavin, Natalia Kriakina, Daniil Kocharov, Anna Mamushina, Alla Menshikova, Svetlana Zimina:
SibLing Corpus of Russian Dialogue Speech Designed for Research on Speech Entrainment. 6556-6561 - Ana Margarida Ramalho, Maria João Freitas, Yvan Rose:
PhonBank and Data Sharing: Recent Developments in European Portuguese. 6562-6570 - Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay. 6571-6577 - Meiko Fukuda, Hiromitsu Nishizaki, Yurie Iribe, Ryota Nishimura, Norihide Kitaoka:
Improving Speech Recognition for the Elderly: A New Corpus of Elderly Japanese Speech and Investigation of Acoustic Modeling for Speech Recognition. 6578-6585 - Shafayat Ahmed, Nafis Sadeq, Sudipta Saha Shubha, Md. Nahidul Islam, Muhammad Abdullah Adnan, Mohammad Zuberul Islam:
Preparation of Bangla Speech Corpus from Publicly Available Audio & Text. 6586-6592 - Xian Huang, Xin Jin, Qike Li, Keliang Zhang:
On Construction of the ASR-oriented Indian English Pronunciation Dictionary. 6593-6598 - Mahault Garnerin, Solange Rossato, Laurent Besacier:
Gender Representation in Open Source Speech Resources. 6599-6605 - Alexandru-Lucian Georgescu, Horia Cucu, Andi Buzo, Corneliu Burileanu:
RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition. 6606-6612 - Sean Robertson, Cosmin Munteanu, Gerald Penn:
FAB: The French Absolute Beginner Corpus for Pronunciation Training. 6613-6620 - Karen Jones, Stephanie M. Strassel, Kevin Walker, Jonathan Wright:
Call My Net 2: A New Resource for Speaker Recognition. 6621-6626 - Juan Hussain, Oussama Zenkri, Sebastian Stüker, Alex Waibel:
DaCToR: A Data Collection Tool for the RELATER Project. 6627-6632 - Roberts Dargis, Peteris Paikens, Normunds Gruzitis, Ilze Auzina, Agate Akmane:
Development and Evaluation of Speech Synthesis Corpora for Latvian. 6633-6637 - Nikola I. Nikolov, Richard H. R. Hahnloser:
Abstractive Document Summarization without Parallel Data. 6638-6644 - Diego Antognini, Boi Faltings:
GameWikiSum: a Novel Large Multi-Document Summarization Dataset. 6645-6650 - Dominik Frefel:
Summarization Corpora of Wikipedia Articles. 6651-6655 - Christopher Tauchmann, Margot Mieskes:
Language Agnostic Automatic Summarization Evaluation. 6656-6662 - Erion Çano, Ondrej Bojar:
Two Huge Title and Keyword Generation Corpora of Research Articles. 6663-6671 - Ahmed AbuRa'ed, Horacio Saggion, Luis Chiruzzo:
A Multi-level Annotated Corpus of Scientific Papers for Scientific Document Summarization and Cross-document Relation Discovery. 6672-6679 - Dmitrii Aksenov, Julián Moreno Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm:
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling. 6680-6689 - Margot Mieskes, Eneldo Loza Mencía, Tim Kronsbein:
A Data Set for the Analysis of Text Quality Dimensions in Summarization Evaluation. 6690-6699 - Benjamin Hättasch, Nadja Geisler, Christian M. Meyer, Carsten Binnig:
Summarization Beyond News: The Automatically Acquired Fandom Corpora. 6700-6708 - Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim:
Invisible to People but not to Machines: Evaluation of Style-aware Headline Generation in Absence of Reliable Human Judgment. 6709-6717 - Paul Tardy, David Janiszek, Yannick Estève, Vincent Nguyen:
Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation. 6718-6724 - Marek Suppa, Jergus Adamec:
A Summarization Dataset of Slovak News Articles. 6725-6730 - Daniel Varab, Natalie Schluter:
DaNewsroom: A Large-scale Danish Summarisation Dataset. 6731-6739 - Jinghui Lu, Maeve Henchion, Brian Mac Namee:
Diverging Divergences: Examining Variants of Jensen Shannon Divergence for Corpus Comparison Tasks. 6740-6744 - Victor Bulatov, Vasiliy Alekseev, Konstantin V. Vorontsov, Darya Polyudova, Eugenia Veselova, Alexey Goncharov, Evgeniy S. Egorov:
TopicNet: Making Additive Regularisation for Topic Modelling Accessible. 6745-6752 - Kyosuke Yamaguchi, Ryoji Asahi, Yutaka Sasaki:
SC-CoMIcs: A Superconductivity Corpus for Materials Informatics. 6753-6760 - Masato Hagiwara, Masato Mita:
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors. 6761-6768 - Yuki Arase, Tomoyuki Kajiwara, Chenhui Chu:
Annotation of Adverse Drug Reactions in Patients' Weblogs. 6769-6776 - Rezvaneh Rezapour, Jutta Bopp, Norman Fiedler, Diana Steffen, Andreas Witt, Jana Diesner:
Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society. 6777-6785 - Paula Fortuna, Juan Soler Company, Leo Wanner:
Toxic, Hateful, Offensive or Abusive? What Are We Really Classifying? An Empirical Analysis of Hate Speech Datasets. 6786-6794 - Isaac Persing, Vincent Ng:
Unsupervised Argumentation Mining in Student Essays. 6795-6803 - Gerardo Ocampo Diaz, Xuanming Zhang, Vincent Ng:
Aspect-Based Sentiment Analysis as Fine-Grained Opinion Mining. 6804-6811 - Victoria Yaneva, Le An Ha, Peter Baldwin, Janet Mee:
Predicting Item Survival for Multiple Choice Questions in a High-Stakes Medical Exam. 6812-6818 - Won-Ik Cho, Jong In Kim, Young Ki Moon, Nam Soo Kim:
Discourse Component to Sentence (DC2S): An Efficient Human-Aided Construction of Paraphrase and Sentence Similarity Dataset. 6819-6826 - Yuta Hayashibe:
Japanese Realistic Textual Entailment Corpus. 6827-6834 - Jean-Philippe Bernardy, Stergios Chatzikyriakidis:
Improving the Precision of Natural Textual Entailment Problem Datasets. 6835-6840 - Louisa Pragst, Wolfgang Minker, Stefan Ultes:
Comparative Study of Sentence Embeddings for Contextual Paraphrasing. 6841-6851 - Tianyu Liu, Xin Zheng, Baobao Chang, Zhifang Sui:
HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference. 6852-6860 - Masato Yoshinaka, Tomoyuki Kajiwara, Yuki Arase:
SAPPHIRE: Simple Aligner for Phrasal Paraphrase with Hierarchical Representation. 6861-6867 - Yves Scherrer:
TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages. 6868-6873 - Aalok Sathe, Salar Ather, Tuan Manh Le, Nathan Perry, Joonsuk Park:
Automated Fact-Checking of Claims from Wikipedia. 6874-6882 - Mithun Paul Panenghat, Sandeep Suntwal, Faiz Rafique, Rebecca Sharp, Mihai Surdeanu:
Towards the Necessity for Debiasing Natural Language Inference Datasets. 6883-6888 - Rémi Cardon, Natalia Grabar:
A French Corpus for Semantic Similarity. 6889-6894 - Takuto Watarai, Masatoshi Tsuchiya:
Developing Dataset of Japanese Slot Filling Quizzes Designed for Evaluation of Machine Reading Comprehension. 6895-6901 - Salud María Jiménez-Zafra, Roser Morante, Eduardo Blanco, María Teresa Martín Valdivia, Luis Alfonso Ureña López:
Detecting Negation Cues and Scopes in Spanish. 6902-6911 - Jan Wira Gotama Putra, Simone Teufel, Kana Matsumura, Takenobu Tokunaga:
TIARA: A Tool for Annotating Discourse Relations and Sentence Reordering. 6912-6920 - Mahmoud El-Haj, Nathan Rutherford, Matthew Coole, Ignatius Ezeani, Sheryl Prentice, Nancy Ide, Jo Knight, Scott Piao, John Mariani, Paul Rayson, Keith Suderman:
Infrastructure for Semantic Annotation in the Genomics Domain. 6921-6929 - Kshitij Shah, Gerard de Melo:
Correcting the Autocorrect: Context-Aware Typographical Error Correction via Training Data Augmentation. 6930-6936 - Brody Downs, Oghenemaro Anuyah, Aprajita Shukla, Jerry Alan Fails, Maria Soledad Pera, Katherine Landau Wright, Casey Kennington:
KidSpell: A Child-Oriented, Rule-Based, Phonetic Spellchecker. 6937-6946 - Suteera Seeha, Ivan Bilan, Liliana Mamani Sánchez, Johannes Huber, Michael Matuschek, Hinrich Schütze:
ThaiLMCut: Unsupervised Pretraining for Thai Word Segmentation. 6947-6957 - Reem Alatrash, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde:
CCOHA: Clean Corpus of Historical American English. 6958-6966 - Vilém Zouhar, Ondrej Bojar:
Outbound Translation User Interface Ptakopet: A Pilot Study. 6967-6975 - Hadrien Titeux, Rachid Riad, Xuan-Nga Cao, Nicolas Hamilakis, Kris Madden, Alejandrina Cristià, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux:
Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data. 6976-6982 - Cash Costello, Shelby Anderson, Caitlyn Bishop, James Mayfield, Paul McNamee:
Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages. 6983-6987 - Svetla Koeva, Nikola Obreshkov, Martin Yalamov:
Natural Language Processing Pipeline to Annotate Bulgarian Legislative Documents. 6988-6994 - Robert Forkel, Johann-Mattis List:
CLDFBench: Give Your Cross-Linguistic Data a Lift. 6995-7002 - Tomás Machálek:
KonText: Advanced and Flexible Corpus Query Interface. 7003-7008 - Tomás Machálek:
Word at a Glance: Modular Word Profile Aggregator. 7009-7014 - Marc Kupietz, Nils Diewald, Eliza Margaretha:
RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP. 7015-7021 - Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, Nizar Habash:
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. 7022-7032 - Antoni Oliver, Bojana Mikelenic:
ReSiPC: a Tool for Complex Searches in Parallel Corpora. 7033-7037 - Salvador Lima, Naiara Pérez, Laura García-Sardiña, Montse Cuadros:
HitzalMed: Anonymisation of Clinical Text in Spanish. 7038-7043 - Balázs Indig, Bálint Sass, Iván Mittelholcz:
The xtsv Framework and the Twelve Virtues of Pipelines. 7044-7052 - Tobias Daudert:
A Web-based Collaborative Annotation and Consolidation Tool. 7053-7059 - Stefan Larson, Eric Guldan, Kevin Leach:
Data Query Language and Corpus Tools for Slot-Filling and Intent Classification Data. 7060-7068 - Amrith Krishna, Shiv Vidhyut, Dilpreet Chawla, Sruti Sambhavi, Pawan Goyal:
SHR++: An Interface for Morpho-syntactic Annotation of Sanskrit Corpora. 7069-7076 - Teruaki Oka, Yuichi Ishimoto, Yutaka Yagi, Takenori Nakamura, Masayuki Asahara, Kikuo Maekawa, Toshinobu Ogiso, Hanae Koiso, Kumiko Sakoda, Nobuko Kibe:
KOTONOHA: A Corpus Concordance System for Skewer-Searching NINJAL Corpora. 7077-7083 - Haruna Ogawa, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono:
Gamification Platform for Collecting Task-oriented Dialogue Data. 7084-7093 - Ralph Rose:
Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions. 7094-7101 - Ines Rehbein, Josef Ruppenhofer, Thomas Schmidt:
Improving Sentence Boundary Detection for Spoken Language Transcripts. 7102-7111 - Ramy Eskander, Francesca Callejas, Elizabeth Nichols, Judith Klavans, Smaranda Muresan:
MorphAGram, Evaluation and Framework for Unsupervised Morphological Segmentation. 7112-7122 - Nadezda Okinina, Jennifer-Carmen Frey, Zarah Weiss:
CTAP for Italian: Integrating Components for the Analysis of Italian into a Multilingual Linguistic Complexity Analysis Tool. 7123-7131 - Regina Stodden, Behrang QasemiZadeh, Laura Kallmeyer:
Do you Feel Certain about your Annotation? A Web-based Semantic Frame Annotation Tool Considering Annotators' Concerns and Behaviors. 7132-7139 - Raheel Qader, François Portet, Cyril Labbé:
Seq2SeqPy: A Lightweight and Customizable Toolkit for Neural Sequence-to-Sequence Modeling. 7140-7144 - Dominique Brunato, Andrea Cimino, Felice Dell'Orletta, Giulia Venturi, Simonetta Montemagni:
Profiling-UD: a Tool for Linguistic Profiling of Texts. 7145-7151 - Sven Laur, Siim Orasmaa, Dage Särg, Paul Tammo:
EstNLTK 1.6: Remastered Estonian NLP Pipeline. 7152-7160 - Christian Chiarcos, Luis Glaser:
A Tree Extension for CoNLL-RDF. 7161-7169 - Carola Trips, Michael Percillier:
Lemmatising Verbs in Middle English Corpora: The Benefit of Enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME). 7170-7178 - Sanja Stajner, Sergiu Nisioi, Ioana Hulpus:
CoCo: A Tool for Automatically Assessing Conceptual Complexity of Texts. 7179-7186 - Jonathan Verner, Anna Vernerová:
PyVallex: A Processing System for Valency Lexicon Data. 7187-7193 - Manuel Fiorelli, Armando Stellato, Tiziano Lorenzetti, Andrea Turbati, Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, Brahim Batouche:
Editing OntoLex-Lemon in VocBench 3. 7194-7203 - Luciana Forti, Giuliana Grego Bolli, Filippo Santarelli, Valentino Santucci, Stefania Spina:
MALT-IT2: A New Resource to Measure Text Difficulty in Light of CEFR Levels for Italian L2 Learning. 7204-7211 - Christian Fäth, Christian Chiarcos, Björn Ebbrecht, Maxim Ionov:
Fintan - Flexible, Integrated Transformation and Annotation eNgineering. 7212-7221 - Jakub Waszczuk, Ilaine Wang, Jean-Yves Antoine, Anaïs Lefeuvre-Halftermeyer:
Contemplata, a Free Platform for Constituency Treebank Annotation. 7222-7229 - Kyeongmin Rim, Kelley Lynch, Marc Verhagen, Nancy Ide, James Pustejovsky:
Interchange Formats for Visualization: LIF and MMIF. 7230-7237 - Sam Davidson, Aaron Yamada, Paloma Fernandez Mira, Agustina Carando, Claudia H. Sanchez Gutierrez, Kenji Sagae:
Developing NLP Tools with a New Corpus of Learner Spanish. 7238-7243 - Francisco Rodrigues, Rinaldo Lima, William Domingues, Robson do Nascimento Fidalgo, Adrian-Gabriel Chifu, Bernard Espinasse, Sébastien Fournier:
DeepNLPF: A Framework for Integrating Third Party NLP Tools. 7244-7251