13th LREC 2022: Marseille, France
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis:
Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022. European Language Resources Association 2022
- Alexandre Diniz da Costa, Mateus Coutinho Marim, Ely Edison Matos, Tiago Timponi Torrent:
Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet. 1-12
- Serge Gladkoff, Lifeng Han:
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation. 13-21
- Chanjun Park, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim:
Priming Ancient Korean Neural Machine Translation. 22-28
- Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix, Lieve Macken:
GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation. 29-38
- Levi Remijnse, Piek Vossen, Antske Fokkens, Sam Titarsolej:
Introducing Frege to Fillmore: A FrameNet Dataset that Captures both Sense and Reference. 39-50
- Bolette S. Pedersen, Nathalie Carmen Hau Sørensen, Sanni Nimb, Ida Flørke, Sussi Olsen, Thomas Troelsgård:
Compiling a Suitable Level of Sense Granularity in a Lexicon for AI Purposes: The Open Source COR Lexicon. 51-60
- Francis Bond, Merrick Yeu Herng Choo:
Sense and Sentiment. 61-69
- Joanna Ut-Seong Sio, Luís Morgado da Costa:
Enriching Linguistic Representation in the Cantonese Wordnet and Building the New Cantonese Wordnet Corpus. 70-78
- Nizar Habash, David Palfreyman:
ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus. 79-88
- Necva Bölücü, Burcu Can:
Turkish Universal Conceptual Cognitive Annotation. 89-99
- Tamás Váradi, Bence Nyéki, Svetla Koeva, Marko Tadic, Vanja Stefanec, Maciej Ogrodniczuk, Bartlomiej Niton, Piotr Pezik, Verginica Barbu Mititelu, Elena Irimia, Maria Mitrofan, Dan Tufis, Radovan Garabík, Simon Krek, Andraz Repar:
Introducing the CURLICAT Corpora: Seven-language Domain Specific Annotated Corpora from Curated Sources. 100-108
- C. Anton Rytting, Valerie Novak, James R. Hull, Victor M. Frank, Paul Rodrigues, Jarrett G. W. Lee, Laurel Miller-Sims:
RU-ADEPT: Russian Anonymized Dataset with Eight Personality Traits. 109-118
- Quentin Brabant, Gwénolé Lecorvé, Lina Maria Rojas-Barahona:
CoQAR: Question Rewriting on CoQA. 119-126
- Annalena Aicher, Nadine Gerstenlauer, Wolfgang Minker, Stefan Ultes:
User Interest Modelling in Argumentative Dialogue Systems. 127-136
- Giancarlo A. Xompero, Michele Mastromattei, Samir Salman, Cristina Giannone, Andrea Favalli, Raniero Romagnoli, Fabio Massimo Zanzotto:
Every time I fire a conversational designer, the performance of the dialogue system goes down. 137-145
- Yuqiao Wen, Guoqing Luo, Lili Mou:
An Empirical Study on the Overlapping Problem of Open-Domain Dialogue Datasets. 146-153
- Federica Gamba, Francesca Frontini, Daan Broeder, Monica Monachini:
Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project. 154-163
- Marc Schulder, Thomas Hanke:
How to be FAIR when you CARE: The DGS Corpus as a Case Study of Open Science Resources for Minority Languages. 164-173
- Valerio Basile, Cristina Bosco, Michael Fell, Viviana Patti, Rossella Varvara:
Italian NLP for Everyone: Resources and Models from EVALITA to the European Language Grid. 174-180
- Michael Rosner, Sina Ahmadi, Elena Simona Apostol, Julia Bosque-Gil, Christian Chiarcos, Milan Dojchinovski, Katerina Gkirtzou, Jorge Gracia, Dagmar Gromann, Chaya Liebeskind, Giedre Valunaite Oleskeviciene, Gilles Sérasset, Ciprian-Octavian Truica:
Cross-Lingual Link Discovery for Under-Resourced Languages. 181-192
- Valentina Dragos, Delphine Battistelli, Aline Étienne, Yolène Constable:
Angry or Sad ? Emotion Annotation for Extremist Content Characterisation. 193-201
- Nicolas Zampieri, Carlos Ramisch, Irina Illina, Dominique Fohr:
Identification of Multiword Expressions in Tweets for Hate Speech Detection. 202-210
- Michael Jantscher, Roman Kern:
Causal Investigation of Public Opinion during the COVID-19 Pandemic via Social Media Text. 211-226
- Pakawat Nakwijit, Matthew Purver:
Misspelling Semantics in Thai. 227-236
- Véronique Moriceau, Farah Benamara, Abdelmoumene Boumadane:
Automatic Detection of Stigmatizing Uses of Psychiatric Terms on Twitter. 237-243
- Isabelle Mohr, Amelie Wührl, Roman Klinger:
CoVERT: A Corpus of Fact-checked Biomedical COVID-19 Tweets. 244-257
- Francesco Barbieri, Luis Espinosa Anke, José Camacho-Collados:
XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond. 258-266
- Areej Alhassan, Jinkai Zhang, Viktor Schlegel:
'Am I the Bad One'? Predicting the Moral Judgement of the Crowd Using Pre-trained Language Models. 267-276
- Kelvin Han, Thiago Castro Ferreira, Claire Gardent:
Generating Questions from Wikidata Triples. 277-290
- Matteo Muffo, Aldo Cocco, Enrico Bertino:
Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition. 291-297
- Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi:
Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization. 298-304
- Julius Monsen, Evelina Rennes:
Perceived Text Quality and Readability in Extractive and Abstractive Summaries. 305-312
- Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun, William Yang Wang:
Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization. 313-318
- Tatsuya Ishigaki, Suzuko Nishino, Sohei Washino, Hiroki Igarashi, Yukari Nagai, Yuichi Washida, Akihiko Murai:
Automating Horizon Scanning in Future Studies. 319-327
- Nguyen Phuc Minh, Tran Hoang Vu, Vu Hoang, Ta Duc Huy, Trung Huu Bui, Steven Quoc Hung Truong:
ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining. 328-337
- Timour Igamberdiev, Ivan Habernal:
Privacy-Preserving Graph Convolutional Networks for Text Classification. 338-350
- Reem Alghamdi, Zhenwen Liang, Xiangliang Zhang:
ArMATH: a Dataset for Solving Arabic Math Word Problems. 351-362
- Benjamin Winter, Alexei Figueroa Rosero, Alexander Löser, Felix Alexander Gers, Amy Siu:
KIMERA: Injecting Domain Knowledge into Vacant Transformer Heads. 363-373
- Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascalu, Traian Rebedea, Vasile Florian Pais, Dan Tufis:
Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. 374-384
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Personalized Filled-pause Generation with Group-wise Prediction Models. 385-392
- Imran A. Sheikh, Emmanuel Vincent, Irina Illina:
Transformer versus LSTM Language Models trained on Uncertain ASR Hypotheses in Limited Data Scenarios. 393-399
- Boshko Koloski, Senja Pollak, Blaz Skrlj, Matej Martinc:
Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? 400-409
- Anastasios Lamproudis, Aron Henriksson, Hercules Dalianis:
Evaluating Pretraining Strategies for Clinical BERT Models. 410-416
- Rustem Yeshpanov, Yerbolat Khassanov, Huseyin Atakan Varol:
KazNERD: Kazakh Named Entity Recognition Dataset. 417-426
- Michail Mersinias, Panagiotis Valvis:
Mitigating Dataset Artifacts in Natural Language Inference Through Automatic Contextual Data Augmentation and Learning Optimization. 427-435
- Mike Zhang, Kristian Nørgaard Jensen, Barbara Plank:
Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning. 436-447
- Roos M. Bakker, Romy A. N. van Drie, Maaike de Boer, Robert van Doesburg, Tom M. van Engers:
Semantic Role Labelling for Dutch Law Texts. 448-457
- Kyle Goslin, Markus Hofmann:
English Language Spelling Correction as an Information Retrieval Task Using Wikipedia Search Statistics. 458-464
- Meisin Lee, Lay-Ki Soon, Eu-Gene Siew, Ly Fie Sugianto:
CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction. 465-479
- Niklas Dehio, Malte Ostendorff, Georg Rehm:
Claim Extraction and Law Matching for COVID-19-related Legislation. 480-490
- Basit Ali, Sachin Pawar, Girish K. Palshikar, Rituraj Singh:
Constructing A Dataset of Support and Attack Relations in Legal Arguments in Court Judgements using Linguistic Rules. 491-500
- Teresa Paccosi, Alessio Palmero Aprosio:
KIND: an Italian Multi-Domain Dataset for Named Entity Recognition. 501-507
- Elena Mikhalkova, Alexander A. Khlyupin:
Russian Jeopardy! Data Set for Question-Answering Systems. 508-514
- Benjamin Hättasch, Carsten Binnig:
Know Better - A Clickbait Resolving Challenge. 515-523
- Dayne Freitag, John Cadigan, Robert Sasseen, Paul Kalmar:
Valet: Rule-Based Information Extraction for Rapid Deployment. 524-533
- Tom Sweers, Iris Hendrickx, Helmer Strik:
Negation Detection in Dutch Spoken Human-Computer Conversations. 534-542
- Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie M. Strassel, James Fiumara, Jonathan Wright:
Reflections on 30 Years of Language Resource Development and Sharing. 543-550
- Valérie Mapelli, Victoria Arranz, Khalid Choukri, Hélène Mazo:
Language Resources to Support Language Diversity - the ELRA Achievements. 551-558
- Pawel Kamocki, Andreas Witt:
Ethical Issues in Language Resources and Language Technology - Tentative Categorisation. 559-563
- Fanny Ducel, Karën Fort, Gaël Lejeune, Yves Lepage:
Do we Name the Languages we Study? The #BenderRule in LREC and ACL articles. 564-573
- Luna De Bruyne, Akbar Karimi, Orphée De Clercq, Andrea Prati, Véronique Hoste:
Aspect-Based Emotion Analysis and Multimodal Coreference: A Case Study of Customer Comments on Adidas Instagram Posts. 574-580
- Gabriel Roccabruna, Steve Azzolin, Giuseppe Riccardi:
Multi-source Multi-domain Sentiment Analysis with BERT-based Models. 581-589
- Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Aremu Anuoluwapo, Idris Abdulmumin:
NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. 590-602
- Aline Étienne, Delphine Battistelli, Gwénolé Lecorvé:
A (Psycho-)Linguistically Motivated Scheme for Annotating and Exploring Emotions in a Genre-Diverse Corpus. 603-612
- Jean-Philippe Prost:
Integrating a Phrase Structure Corpus Grammar and a Lexical-Semantic Network: the HOLINET Knowledge Graph. 613-622
- Giorgio Ottolina, Matteo Luigi Palmonari, Manuel Vimercati, Mehwish Alam:
On the Impact of Temporal Representations on Metaphor Detection. 623-632
- Damien Sileo, Marie-Francine Moens:
Analysis and Prediction of NLP Models via Task Embeddings. 633-647
- Amir Hazem, Mérième Bouhandi, Florian Boudin, Béatrice Daille:
Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data. 648-662
- Lena Jurkschat, Gregor Wiedemann, Maximilian Heinrich, Mattes Ruckdeschel, Sunna Torge:
Few-Shot Learning for Argument Aspects of the Nuclear Energy Debate. 663-672
- Anik Jacobsen, Salar Mohtaj, Sebastian Möller:
MuLVE, A Multi-Language Vocabulary Evaluation Data Set. 673-679
- Leonardo Zilio, Hadeel Saadany, Prashant Sharma, Diptesh Kanojia, Constantin Orasan:
PLOD: An Abbreviation Detection Dataset for Scientific Documents. 680-688
- Tosin P. Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaidou, Foteini Liwicki, Marcus Liwicki:
Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms. 689-696
- Marie Bexte, Ronja Laarmann-Quante, Andrea Horbach, Torsten Zesch:
LeSpell - A Multi-Lingual Benchmark Corpus of Spelling Errors to Develop Spellchecking Methods for Learner Language. 697-706
- Laura Seiffe, Fares Kallel, Sebastian Möller, Babak Naderi, Roland Roller:
Subjective Text Complexity Assessment for German. 707-714
- Elena Frick, Thomas Schmidt, Henrike Helmer:
Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora. 715-722
- Piotr Pezik, Gosia Krawentek, Sylwia Karasinska, Pawel Wilk, Paulina Rybinska, Anna Cichosz, Angelika Peljak-Lapinska, Mikolaj Deckert, Michal Adamczyk:
DiaBiz - an Annotated Corpus of Polish Call Center Dialogs. 723-726
- Roberts Dargis, Ilze Auzina, Inga Kaija, Kristine Levane-Petrova, Kristine Pokratniece:
LaVA - Latvian Language Learner corpus. 727-731
- Kenneth Heafield, Elaine Farrow, Jelmer van der Linde, Gema Ramírez-Sánchez, Dion Wiggins:
The EuroPat Corpus: A Parallel Corpus of European Patent Data. 732-740
- Elisabeth Eder, Michael Wiegand, Ulrike Krieg-Holz, Udo Hahn:
"Beste Grüße, Maria Meyer" - Pseudonymization of Privacy-Sensitive Information in Emails. 741-752
- Wolfgang Schmeisser-Nieto, Montserrat Nofre, Mariona Taulé:
Criteria for the Annotation of Implicit Stereotypes. 753-762
- Philipp Klumpp, Tomás Arias-Vergara, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Common Phone: A Multilingual Dataset for Robust Acoustic Modelling. 763-768
- Karim El Haff, Mustafa Jarrar, Tymaa Hammouda, Fadi A. Zaraket:
Curras + Baladi: Towards a Levantine Corpus. 769-778
- Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Keisuke Takeshita, Mihoko Sumida:
Annotation Study of Japanese Judgments on Tort for Legal Judgment Prediction with Rationales. 779-790
- Dana Ruiter, Liane Reiners, Ashwin Geet D'Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer, Angeliki Monnier:
Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online. 791-804
- Ekaterina Lapshinova-Koltunski, Pedro Augusto Ferreira, Elina Lartaud, Christian Hardmeier:
ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference. 805-813
- Maria Boritchev, Maxime Amblard:
A Multi-Party Dialogue Ressource in French. 814-823
- Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Marta Bañón, Sergio Ortiz-Rojas:
Bicleaner AI: Bicleaner Goes Neural. 824-831
- Anisia Katinskaia, Maria Lebedeva, Jue Hou, Roman Yangarber:
Semi-automatically Annotated Learner Corpus for Russian. 832-839
- Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieras, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóga, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer C. White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova:
UniMorph 4.0: Universal Morphology. 840-855
- Dmytro Kalpakchi, Johan Boye:
Textinator: an Internationalized Tool for Annotation and Human Evaluation in Natural Language Processing and Generation. 856-866
- Anaïs Ollagnier, Elena Cabrio, Serena Villata, Catherine Blaya:
CyberAgressionAdo-v1: a Dataset of Annotated Online Aggressions in French Collected through a Role-playing Game. 867-875
- Md Saroar Jahan, Mourad Oussalah, Nabil Arhab:
Finnish Hate-Speech Detection on Social Media Using CNN and FinBERT. 876-882
- Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jungseob Lee, Sugyeong Eo, Heuiseok Lim:
Empirical Analysis of Noising Scheme based Synthetic Data Generation for Automatic Post-editing. 883-891
- Daniel Edmiston, Phillip Keung, Noah A. Smith:
Domain Mismatch Doesn't Always Prevent Cross-lingual Transfer Learning. 892-899
- Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios Samaras, Ilias Kyparissidis, George Giannakoulas, Felix A. Gers, Alexander Löser:
Cross-Lingual Knowledge Transfer for Clinical Phenotyping. 900-909
- Paul McNamee, Kevin Duh:
The Multilingual Microblog Translation Corpus: Improving and Evaluating Translation of User-Generated Text. 910-918
- Júlia Sato, Helena de Medeiros Caseli, Lucia Specia:
Multilingual and Multimodal Learning for Brazilian Portuguese. 919-927
- Pedro Jeuris, Jan Niehues:
LibriS2S: A German-English Speech-to-Speech Translation Corpus. 928-935
- Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, He Wang, Renlong Ai, Shushen Manakhimova, Ursula Strohriegel, Sebastian Möller, Hans Uszkoreit:
A Linguistically Motivated Test Suite to Semi-Automatically Evaluate German-English Machine Translation Output. 936-947
- Evangelia Gogoulou, Ariel Ekgren, Tim Isbister, Magnus Sahlgren:
Cross-lingual Transfer of Monolingual Models. 948-955
- Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling, Chris Biemann:
Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments. 956-962
- Ishani Mondal, Kalika Bali, Mohit Jain, Monojit Choudhury, Jacki O'Neill, Millicent Ochieng, Kagonya Awori, Keshet Ronen:
Language Patterns and Behaviour of the Peer Supporters in Multilingual Healthcare Conversational Forums. 963-975
- Zheng Xin Yong, Patrick D. Watson, Tiago Timponi Torrent, Oliver Czulo, Collin F. Baker:
Frame Shift Prediction. 976-986
- Brigitte Bigi, Maryvonne Zimmermann, Carine André:
CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech. 987-994
- Carlos Daniel Hernandez Mena, David Erik Mollberg, Michal Borský, Jón Guðnason:
Samrómur Children: An Icelandic Speech Corpus. 995-1002
- Per Erik Solberg, Pablo Ortiz:
The Norwegian Parliamentary Speech Corpus. 1003-1008
- Martijn Bentum, Louis ten Bosch, Henk van den Heuvel, Simone Wills, Domenique van der Niet, Jelske Dijkstra, Hans Van de Velde:
A Speech Recognizer for Frisian/Dutch Council Meetings. 1009-1015
- Meiko Fukuda, Ryota Nishimura, Maina Umezawa, Kazumasa Yamamoto, Yurie Iribe, Norihide Kitaoka:
Elderly Conversational Speech Corpus with Cognitive Impairment Test and Pilot Dementia Detection Experiment Using Acoustic Characteristics of Speech in Japanese Dialects. 1016-1022
- Ali Can Kocabiyikoglu, François Portet, Prudence Gibert, Hervé Blanchon, Jean-Marc Babouchkine, Gaëtan Gavazzi:
A Spoken Drug Prescription Dataset in French for Spoken Language Understanding. 1023-1031
- Cristian Tejedor García, Berrie van der Molen, Henk van den Heuvel, Arjan van Hessen, Toine Pieters:
Towards an Open-Source Dutch Speech Recognition System for the Healthcare Domain. 1032-1039
- Maria Moutti, Sofia Eleftheriou, Panagiotis Koromilas, Theodoros Giannakopoulos:
A Dataset for Speech Emotion Recognition in Greek Theatrical Plays. 1040-1046
- Liisi Piits, Hille Pajupuu, Heete Sahkai, Rene Altrov, Liis Ermus, Kairi Tamuri, Indrek Hein, Meelis Mihkla, Indrek Kiissel, Egert Männisalu, Kristjan Suluste, Jaan Pajupuu:
Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices. 1047-1053
- Yaru Wu, Fabian M. Suchanek, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker:
Using a Knowledge Base to Automatically Annotate Speech Corpora and to Identify Sociolinguistic Variation. 1054-1060
- Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
Phone Inventories and Recognition for Every Language. 1061-1067
- Dimitrios Roussis, Vassilis Papavassiliou, Sokratis Sofianopoulos, Prokopis Prokopidis, Stelios Piperidis:
Constructing Parallel Corpora from COVID-19 News using MediSys Metadata. 1068-1072
- Dongxu Zhang, Sunil Mohan, Michaela Torkar, Andrew McCallum:
A Distant Supervision Corpus for Extracting Biomedical Relationships Between Chemicals, Diseases and Genes. 1073-1082
- Jayetri Bardhan, Anthony M. Colas, Kirk Roberts, Daisy Zhe Wang:
DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries. 1083-1097
- Stella Verkijk, Piek Vossen:
Efficiently and Thoroughly Anonymizing a Transformer Language Model for Dutch Electronic Health Records: a Two-Step Method. 1098-1103
- Loïc Grobol, Mathilde Regnault, Pedro Javier Ortiz Suárez, Benoît Sagot, Laurent Romary, Benoît Crabbé:
BERTrade: Using Contextual Embeddings to Parse Old French. 1104-1113
- Jenna Kanerva, Filip Ginter:
Out-of-Domain Evaluation of Finnish Dependency Parsing. 1114-1124
- Elisa Gugliotta, Marco Dinarelli:
TArC: Tunisian Arabish Corpus, First complete release. 1125-1136
- Zdenek Zabokrtský, Niyati Bafna, Jan Bodnár, Lukás Kyjánek, Emil Svoboda, Magda Sevcíková, Jonás Vidra:
Towards Universal Segmentations: UniSegments 1.0. 1137-1149
- Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Pelloni, Tanja Samardzic:
TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP. 1150-1158
- Diego Bear, Paul Cook:
Leveraging a Bilingual Dictionary to Learn Wolastoqey Word Representations. 1159-1166
- Linda Wiechetek, Katri Hiovain-Asikainen, Inga Lill Sigga Mikkelsen, Sjur N. Moshagen, Flammie A. Pirinen, Trond Trosterud, Børre Gaup:
Unmasking the Myth of Effortless Big Data - Making an Open Source Multi-lingual Infrastructure and Building Language Resources from Scratch. 1167-1177
- Andreas Liesenfeld, Mark Dingemanse:
Building and curating conversational corpora for diversity-aware language science and technology. 1178-1192
- Heike Przybyl, Ekaterina Lapshinova-Koltunski, Katrin Menzel, Stefan Fischer, Elke Teich:
EPIC UdS - Creation and Applications of a Simultaneous Interpreting Corpus. 1193-1200
- Thomas Green, Diana Maynard, Chenghua Lin:
Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions. 1201-1208
- Michael Arrigo, Stephanie M. Strassel, Nolan King, Thao Tran, Lisa P. Mason:
CAMIO: A Corpus for OCR in Multiple Languages. 1209-1216
- Rodrigo Wilkens, David Alfter, Xiaoou Wang, Alice Pintard, Anaïs Tack, Kevin P. Yancey, Thomas François:
FABRA: French Aggregator-Based Readability Assessment toolkit. 1217-1233
- Annalena Aicher, Nadine Gerstenlauer, Isabel Feustel, Wolfgang Minker, Stefan Ultes:
Towards Building a Spoken Dialogue System for Argument Exploration. 1234-1241
- Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim:
FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue. 1242-1248
- Yuta Hayashibe:
Self-Contained Utterance Description Corpus for Japanese Dialog. 1249-1255
- Jessica Huynh, Ting-Rui Chiang, Jeffrey P. Bigham, Maxine Eskénazi:
DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit. 1256-1263
- Hugo Gonçalo Oliveira, Patrícia Ferreira, Daniel Martins, Catarina Silva, Ana Alves:
A Brief Survey of Textual Dialogue Corpora. 1264-1274
- Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash, Pranav Khaitan:
A Unified Approach to Entity-Centric Context Tracking in Social Conversations. 1275-1285
- Vojtech Hudecek, Léon-Paul Schaub, Daniel Stancl, Patrick Paroubek, Ondrej Dusek:
A Unifying View On Task-oriented Dialogue Annotation. 1286-1296
- Antonio Origlia, Martina Di Bratto, Maria Di Maro, Sabrina Mennella:
A Multi-source Graph Representation of the Movie Domain for Recommendation Dialogues Analysis. 1297-1306
- Flor Miriam Plaza del Arco, Ana Belén Parras Portillo, Pilar López-Úbeda, Beatriz Botella Gil, María Teresa Martín Valdivia:
SHARE: A Lexicon of Harmful Expressions by Spanish Speakers. 1307-1316
- Tatu Ylönen:
Wiktextract: Wiktionary as Machine-Readable Structured Data. 1317-1325
- Daniel Holmer, Evelina Rennes:
NyLLex: A Novel Resource of Swedish Words Annotated with Reading Proficiency Level. 1326-1331
- Zdenka Uresová, Karolina Zaczynska, Peter Bourgonje, Eva Fucíková, Georg Rehm, Jan Hajic:
Making a Semantic Event-type Ontology Multilingual. 1332-1343
- Veronika Kolárová, Anna Vernerová:
NomVallex: A Valency Lexicon of Czech Nouns and Adjectives. 1344-1352
- Izaskun Aldezabal, Jose Maria Arriola, Arantxa Otegi:
TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively. 1353-1359
- Manfred Klenner, Anne Göhring:
Animacy Denoting German Nouns: Annotation and Classification. 1360-1364
- Enrica Troiano, Laura Oberländer, Maximilian Wegge, Roman Klinger:
x-enVENT: A Corpus of Event Descriptions with Experiencer-specific Emotion and Appraisal Annotations. 1365-1375
- Anne Göhring, Manfred Klenner:
Polar Quantification of Actor Noun Phrases for German. 1376-1380
- Pavel Pribán, Josef Steinberger:
Czech Dataset for Cross-lingual Subjectivity Classification. 1381-1391
- Alexandra Ciobotaru, Mihai Vlad Constantinescu, Liviu P. Dinu, Stefan Dumitrescu:
RED v2: Enhancing RED Dataset for Multi-Label Emotion Detection. 1392-1399
- Katrin Ortmann:
Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans. 1400-1407
- Elena V. Epure, Romain Hennequin:
Probing Pre-trained Auto-regressive Language Models for Named Entity Typing and Recognition. 1408-1417
- Rob van der Goot, Max Müller-Eberstein, Barbara Plank:
Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings. 1418-1427
- Ashleigh Richardson, Janet Wiles:
A Systematic Study Reveals Unexpected Interactions in Pre-Trained Neural Machine Translation. 1437-1443
- Mustafa Ocal, Adrian Perez, Antonela Radas, Mark A. Finlayson:
Holistic Evaluation of Automatic TimeML Annotators. 1444-1453
- Serge Gladkoff, Irina Sorokina, Lifeng Han, Alexandra Alekseeva:
Measuring Uncertainty in Translation Quality Evaluation (TQE). 1454-1461
- Shatha Altammami, Eric Atwell:
Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith. 1462-1471
- William Britton, Somdeb Sarkhel, Deepak Venugopal:
Question Modifiers in Visual Question Answering. 1472-1479
- Jose Sosa, Serge Sharoff:
Multimodal Pipeline for Collection of Misinformation Data from Telegram. 1480-1489
- Xinyuan Xia, Lu Xiao, Kun Yang, Yueyue Wang:
Identifying Tension in Holocaust Survivors' Interview: Code-switching/Code-mixing as Cues. 1490-1495
- Kristian Nørgaard Jensen, Barbara Plank:
Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering? 1496-1508
- Svetla Koeva, Ivelina Stoyanova, Jordan Kralev:
Multilingual Image Corpus - Towards a Multimodal and Multilingual Dataset. 1509-1518
- Jung-Ho Kim, Eui Jun Hwang, Sukmin Cho, Du Hui Lee, Jong C. Park:
Sign Language Production With Avatar Layering: A Critical Use Case over Rare Words. 1519-1528
- Nikhil Krishnaswamy, William Pickard, Brittany Cates, Nathaniel Blanchard, James Pustejovsky:
The VoxWorld Platform for Multimodal Embodied Agents. 1529-1541
- Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque:
MemoSen: A Multimodal Dataset for Sentiment Analysis of Memes. 1542-1554
- Denis Ivanko, Alexandr Axyonov, Dmitry Ryumin, Alexey M. Kashevnik, Alexey Karpov:
RUSAVIC Corpus: Russian Audio-Visual Speech in Cars. 1555-1559
- Camille Challant, Michael Filhol:
A First Corpus of AZee Discourse Expressions. 1560-1565
- Luis Lebron, Yvette Graham, Kevin McGuinness, Konstantinos Kouramas, Noel E. O'Connor:
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment. 1566-1575
- Richard Brutti, Lucia Donatelli, Kenneth Lai, James Pustejovsky:
Abstract Meaning Representation for Gesture. 1576-1583
- Taja Kuzman, Peter Rupnik, Nikola Ljubesic:
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild. 1584-1594
- Gaëlle Laperrière, Valentin Pelloin, Antoine Caubrière, Salima Mdhaffar, Nathalie Camelin, Sahar Ghannay, Bassam Jabaian, Yannick Estève:
The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools. 1595-1602
- Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa:
BasqueGLUE: A Natural Language Understanding Benchmark for Basque. 1603-1612
- Nicolas Stefanovitch, Jakub Piskorski, Sopho Kharazi:
Resources and Experiments on Sentiment Classification for Georgian. 1613-1621
- Nadhem Zmandar, Tobias Daudert, Sina Ahmadi, Mahmoud El-Haj, Paul Rayson:
CoFiF Plus: A French Financial Narrative Summarisation Corpus. 1622-1639
- Rémi Calizzano, Malte Ostendorff, Qian Ruan, Georg Rehm:
Generating Extended and Multilingual Summaries with Pre-trained Transformers. 1640-1650
- Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot:
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases. 1651-1664
- Fabio Tamburini:
Combining ELECTRA and Adaptive Graph Encoding for Frame Identification. 1671-1679
- Aina Garí Soler, Matthieu Labeau, Chloé Clavel:
Polysemy in Spoken Conversations and Written Texts. 1680-1690
- Vuk Batanovic, Maja Milicevic Petrovic:
Cross-Level Semantic Similarity for Serbian Newswire Texts. 1691-1699
- Ishan Jindal, Alexandre Rademaker, Michal Ulewicz, Ha Linh, Huyen Nguyen, Khoi-Nguyen Tran, Huaiyu Zhu, Yunyao Li:
Universal Proposition Bank 2.0. 1700-1711
- Nora Hollenstein, Maria Barrett, Marina Björnsdóttir:
The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts. 1712-1720
- Andreas Weise, Matthew McNeill, Rivka Levitan:
The Brooklyn Multi-Interaction Corpus for Analyzing Variation in Entrainment Behavior. 1721-1731
- Aleksandra Miletic, Christophe Benzitoun, Georgeta Cislaru, Santiago Herrera-Yanez:
Pro-TEXT: an Annotated Corpus of Keystroke Logs. 1732-1739
- Federico Bonetti, Elisa Leonardelli, Daniela Trotta, Raffaele Guarasci, Sara Tonelli:
Work Hard, Play Hard: Collecting Acceptability Annotations through a 3D Game. 1740-1750
- Ekaterina Lapshinova-Koltunski, Maja Popovic, Maarit Koponen:
DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations. 1751-1760
- Md Saroar Jahan, Djamila Romaissa Beddiar, Mourad Oussalah, Muhidin Mohamed:
Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection. 1761-1770
- Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondrej Bojar:
ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation. 1771-1779
- Sebastian P. Bayerl, Alexander Wolff von Gudenberg, Florian Hönig, Elmar Nöth, Korbinian Riedhammer:
KSoF: The Kassel State of Fluency Dataset - A Therapy Centered Dataset of Stuttering. 1780-1787
- Gaël Guibon, Luce Lefeuvre, Matthieu Labeau, Chloé Clavel:
EZCAT: an Easy Conversation Annotation Tool. 1788-1797
- Kaja Dobrovoljc:
Spoken Language Treebanks in Universal Dependencies: an Overview. 1798-1806
- Bram Vanroy, Lieve Macken:
LeConTra: A Learner Corpus of English-to-Dutch News Translation. 1807-1816
- Barbora Hladká, Jirí Mírovský, Matyás Kopp, Václav Moravec:
Annotating Attribution in Czech News Server Articles. 1817-1823
- Luke Gessler, Nathan Schneider, Joseph C. Ledford, Austin Blodgett:
Xposition: An Online Multilingual Database of Adpositional Semantics. 1824-1830
- Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt, Stephanie M. Strassel:
A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations. 1831-1838
- Najet Hadj Mohamed, Chérifa Ben Khelil, Agata Savary, Iskandar Keskes, Jean-Yves Antoine, Lamia Hadrich Belguith:
Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure. 1839-1848
- Carol Figueroa, Adaeze Adigwe, Magalie Ochs, Gabriel Skantze:
Annotation of Communicative Functions of Short Feedback Tokens in Switchboard. 1849-1859
- Adem Ajvazi, Christian Hardmeier:
A Dataset of Offensive Language in Kosovo Social Media. 1860-1869
- Bashar Alhafni, Nizar Habash, Houda Bouamor:
The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses. 1870-1884
- Daniel Cheng, Kyle Yan, Phillip Keung, Noah A. Smith:
The Engage Corpus: A Social Media Dataset for Text-Based Recommender Systems. 1885-1889
- Gil Rocha, Luís Trigo, Henrique Lopes Cardoso, Rui Sousa-Silva, Paula Carvalho, Bruno Martins, Miguel Won:
Annotating Arguments in a Corpus of Opinion Articles. 1890-1899
- Giuseppe Abrami, Mevlüt Bagci, Leon Hammerla, Alexander Mehler:
German Parliamentary Corpus (GerParCor). 1900-1906
- Attila Novák, Borbála Novák:
NerKor+Cars-OntoNotes++. 1907-1916
- Felix Burkhardt, Anabell Hacker, Uwe Reichel, Hagen Wierstorf, Florian Eyben, Björn W. Schuller:
A Comparative Cross Language View On Acted Databases Portraying Basic Emotions Utilising Machine Learning. 1917-1924
- Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben, Björn W. Schuller:
Nkululeko: A Tool For Rapid Speaker Characteristics Detection. 1925-1932
- Shi Yu, Clara Ponchard, Roland Trouville, Sergio Hassid, Didier Demolin:
Speech Aerodynamics Database, Tools and Visualisation. 1933-1938
- Cécile Fougeron, Nicolas Audibert, Cédric Gendrot, Estelle Chardenon, Louise Wohmann:
PATATRA and PATAFreq: two French databases for the documentation of within-speaker variability in speech. 1939-1944
- Jonathan Mukiibi, Andrew Katumba, Joyce Nakatumba-Nabende, Ali Hussein, Joshua Meyer:
The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition. 1945-1954
- Mickael Rouvier, Mohammad MohammadAmini:
Far-Field Speaker Recognition Benchmark Derived From The DiPCo Corpus. 1955-1959
- Siyang Wang, Joakim Gustafson, Éva Székely:
Evaluating Sampling-based Filler Insertion with Spontaneous TTS. 1960-1969 - Péter Mihajlik, András Balog, Tekla Etelka Gráczi, Anna Kohári, Balázs Tarján, Katalin Mády:
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian. 1970-1977 - Emma Barker, Jon Barker, Robert J. Gaizauskas, Ning Ma, Monica Lestari Paramita:
SNuC: The Sheffield Numbers Spoken Language Corpus. 1978-1984 - Liang Zhao, Eleanor Chodroff:
The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects. 1985-1990 - Francesca Carbone, Gilles Bouchet, Alain Ghio, Thierry Legou, Carine André, Muriel Lalain, Sabrina Kadri, Caterina Petrone, Federica Procino, Antoine Giovanni:
The Speed-Vel Project: a Corpus of Acoustic and Aerodynamic Data to Measure Droplets Emission During Speech Interaction. 1991-1999 - Annalena Aicher, Alisa Gazizullina, Aleksei Gusev, Yuri Matveev, Wolfgang Minker:
Towards Speech-only Opinion-level Sentiment Analysis. 2000-2006 - Goya van Boven, Stephanie Hirmer, Costanza Conforti:
At the Intersection of NLP and Sustainable Development: Exploring the Impact of Demographic-Aware Text Representations in Modeling Value on a Corpus of Interviews. 2007-2021 - Michael Gref, Nike Matthiesen, Sreenivasa Hikkal Venugopala, Shalaka Satheesh, Aswinkumar Vijayananth, Duc Bach Ha, Sven Behnke, Joachim Köhler:
A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis. 2022-2031 - Stefan Cobeli, Ioan-Bogdan Iordache, Shweta Yadav, Cornelia Caragea, Liviu P. Dinu, Dragos Iliescu:
Detecting Optimism in Tweets using Knowledge Distillation and Linguistic Analysis of Optimism. 2032-2041 - Missaka Herath, Kushan Chamindu, Hashan Maduwantha, Surangika Ranathunga:
Dataset and Baseline for Automatic Student Feedback Analysis. 2042-2049 - Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George-Andrei Dima, Réka Cserháti, Md. Sadek Hossain Asif, Matt Sárdi:
EENLP: Cross-lingual Eastern European NLP Index. 2050-2057 - Ales Zagar, Marko Robnik-Sikonja:
Slovene SuperGLUE Benchmark: Translation and Evaluation. 2058-2065 - Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève:
Speech Resources in the Tamasheq Language. 2066-2071 - Elena Knyazeva, Philippe Boula de Mareüil, Frédéric Vernier:
Aesop's fable "The North Wind and the Sun" Used as a Rosetta Stone to Extract and Map Spoken Words in Under-resourced Languages. 2072-2079 - Chester Palen-Michel, June Kim, Constantine Lignos:
Multilingual Open Text Release 1: Public Domain News in 44 Languages. 2080-2089 - Megan Herrera, Ankit Aich, Natalie Parde:
TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching. 2090-2097 - Luis Chiruzzo, Santiago Góngora, Aldo Alvarez, Gustavo Giménez Lugo, Marvin M. Agüero-Torales, Yliana Rodríguez:
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking. 2098-2107 - Rinalds Viksna, Inguna Skadina, Raivis Skadins, Andrejs Vasiljevs, Roberts Rozis:
Assessing Multilinguality of Publicly Accessible Websites. 2108-2116 - David Kletz, Philippe Langlais, François Lareau, Patrick Drouin:
A Methodology for Building a Diachronic Dataset of Semantic Shifts and its Application to QC-FR-Diac-V1.0, a Free Reference for French. 2117-2125 - Jörg Frohberg, Frank Binder:
CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models. 2126-2140 - Marta R. Costa-jussà, Christine Basta, Gerard I. Gállego:
Evaluating Gender Bias in Speech Translation. 2141-2147 - Merel C. J. Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg:
Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training. 2148-2156 - Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder:
TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users. 2157-2165 - Sayontan Ghosh, Amanpreet Singh, Alex Merenstein, Wei Su, Scott A. Smolka, Erez Zadok, Niranjan Balasubramanian:
SpecNFS: A Challenge Dataset Towards Extracting Formal Models from Natural Language Specifications. 2166-2176 - Xiaoyu Bai, Manfred Stede:
Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments. 2177-2187 - Nishtha Jain, Declan Groves, Lucia Specia, Maja Popovic:
Leveraging Pre-trained Language Models for Gender Debiasing. 2188-2195 - Gretel Liz De la Peña Sarracén, Paolo Rosso:
Unsupervised Embeddings with Graph Auto-Encoders for Multi-domain and Multilingual Hate Speech Detection. 2196-2204 - Quentin Heinrich, Gautier Viaud, Wacim Belblidia:
FQuAD2.0: French Question Answering and Learning When You Don't Know. 2205-2214 - Cagri Toraman, Furkan Sahinuç, Eyup Halit Yilmaz:
Large-Scale Hate Speech Detection with Cross-Domain Transfer. 2215-2225 - Selina Meyer, David Elsweiler:
GLoHBCD: A Naturalistic German Dataset for Language of Health Behaviour Change on Online Support Forums. 2226-2235 - Iris Hendrickx:
Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations. 2236-2244 - Wen Cui, Leanne Rolston, Marilyn A. Walker, Beth Ann Hockey:
OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue. 2245-2256 - Bram Willemsen, Dmytro Kalpakchi, Gabriel Skantze:
Collecting Visually-Grounded Dialogue with A Game Of Sorts. 2257-2268 - Diana Constantina Hoefels, Çagri Çöltekin, Irina Diana Madroane:
CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets. 2269-2281 - Dina Almanea, Massimo Poesio:
ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements. 2282-2291 - Liu Yang, Catherine Achard, Catherine Pelachaud:
Annotating Interruption in Dyadic Human Interaction. 2292-2297 - Fiona Anting Tan, Ali Hürriyetoglu, Tommaso Caselli, Nelleke Oostdijk, Tadashi Nomoto, Hansi Hettiarachchi, Iqra Ameer, Onur Uca, Farhana Ferdousi Liza, Tiancheng Hu:
The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. 2298-2310 - Staffan Hedström, David Erik Mollberg, Ragnheiður Thórhallsdóttir, Jón Guðnason:
Samrómur: Crowd-sourcing large amounts of data. 2311-2316 - Roland Roller, Aljoscha Burchardt, Nils Feldhus, Laura Seiffe, Klemens Budde, Simon Ronicke, Bilgin Osmanodja:
An Annotated Corpus of Textual Explanations for Clinical Decision Support. 2317-2326 - Tatiana Passali, Thanassis Mavropoulos, Grigorios Tsoumakas, Georgios Meditskos, Stefanos Vrochidis:
LARD: Large-scale Artificial Disfluency Generation. 2327-2336 - Yuru Jiang, Yang Xu, Yuhang Zhan, Weikai He, Yilin Wang, Zixuan Xi, Meiyun Wang, Xinyu Li, Yu Li, Yanchao Yu:
The CRECIL Corpus: a New Dataset for Extraction of Relations between Characters in Chinese Multi-party Dialogues. 2337-2344 - Dana Abdulrahim, Go Inoue, Latifa Shamsan, Salam Khalifa, Nizar Habash:
The Bahrain Corpus: A Multi-genre Corpus of Bahraini Arabic. 2345-2352 - Daniel G. Swanson, Francis M. Tyers:
A Universal Dependencies Treebank of Ancient Hebrew. 2353-2361 - Paula Carvalho, Bernardo Cunha Matos, Raquel Bento Santos, Fernando Batista, Ricardo Ribeiro:
Hate Speech Dynamics Against African descent, Roma and LGBTQI Communities in Portugal. 2362-2370 - Starkaður Barkarson, Steinthór Steingrímsson, Hildur Hafsteinsdóttir:
Evolving Large Text Corpora: Four Versions of the Icelandic Gigaword Corpus. 2371-2381 - Damien Sileo, Philippe Muller, Tim Van de Cruys, Camille Pradel:
A Pragmatics-Centered Evaluation Framework for Natural Language Understanding. 2382-2394 - Chandrakant Bothe, Stefan Wermter:
Conversational Analysis of Daily Dialog Data using Polite Emotional Dialogue Acts. 2395-2400 - Christian Chiarcos:
Inducing Discourse Marker Inventories from Lexical Knowledge Graphs. 2401-2412 - Pantea Haghighatkhah, Antske Fokkens, Pia Sommerauer, Bettina Speckmann, Kevin Verbeek:
Story Trees: Representing Documents using Topological Persistence. 2413-2429 - Ana Zwitter Vitez, Mojca Brglez, Marko Robnik-Sikonja, Tadej Skvorc, Andreja Vezovnik, Senja Pollak:
Extracting and Analysing Metaphors in Migration Media Discourse: towards a Metaphor Annotation Scheme. 2430-2439 - Linea Flansmose Mikkelsen, Oliver Kinch, Anders Jess Pedersen, Ophélie Lacroix:
DDisCo: A Discourse Coherence Dataset for Danish. 2440-2445 - Farjana Sultana Mim, Naoya Inoue, Shoichi Naito, Keshav Singh, Kentaro Inui:
LPAttack: A Feasible Annotation Scheme for Capturing Logic Pattern of Attacks in Arguments. 2446-2459 - Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona T. Diab, Bonnie J. Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski:
BeSt: The Belief and Sentiment Corpus. 2460-2467 - Xintong Wang, Florian Schneider, Özge Alaçam, Prateek Chaudhury, Chris Biemann:
MOTIF: Contextualized Images for Complex Words to Improve Human Reading. 2468-2477 - Mirella De Sisto, Vincent Vandeghinste, Santiago Egea Gómez, Mathieu De Coster, Dimitar Shterionov, Horacio Saggion:
Challenges with Sign Language Datasets for Sign Language Recognition and Translation. 2478-2487 - Clémence Mertz, Vincent Barreaud, Thibaut Le Naour, Damien Lolive, Sylvie Gibet:
A Low-Cost Motion Capture Corpus in French Sign Language for Interpreting Iconicity and Spatial Referencing Mechanisms. 2488-2497 - Marc Verhagen, Kelley Lynch, Kyeongmin Rim, James Pustejovsky:
The CLAMS Platform at Work: Processing Audiovisual Data from the American Archive of Public Broadcasting. 2498-2506 - Carley Reardon, Sejin Paik, Ge Gao, Meet Parekh, Yanling Zhao, Lei Guo, Margrit Betke, Derry Tanti Wijaya:
BU-NEmo: an Affective Dataset of Gun Violence News. 2507-2516 - Justine Reverdy, Sam O'Connor Russell, Louise Duquenne, Diego Garaialde, Benjamin R. Cowan, Naomi Harte:
RoomReader: A Multimodal Corpus of Online Multiparty Conversational Interactions. 2517-2527 - Antonio F. G. Sevilla, Alberto Díaz Esteban, José María Lahoz-Bengoechea:
Quevedo: Annotation and Processing of Graphical Languages. 2528-2535 - Debjoy Saha, Shravan Nayak, Timo Baumann:
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts. 2536-2540 - Medet Mukushev, Aigerim Kydyrbekova, Alfarabi Imashev, Vadim Kimmelman, Anara Sandygulova:
Crowdsourcing Kazakh-Russian Sign Language: FluentSigners-50. 2541-2547 - Pierre Nugues:
Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata. 2548-2555 - Markus Egg, Valia Kordoni:
Metaphor annotation for German. 2556-2562 - Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Ranveig Enstad, Alexandra Wittemann:
NorDiaChange: Diachronic Semantic Change Dataset for Norwegian. 2563-2572 - Hugo Gonçalo Oliveira:
Exploring Transformers for Ranking Portuguese Semantic Relations. 2573-2582 - Olivier Ferret:
Building Static Embeddings from Contextual Ones: Is It Useful for Building Distributional Thesauri? 2583-2590 - Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert:
Sentence Selection Strategies for Distilling Word Embeddings from BERT. 2591-2600 - Gioia Baldissin, Dominik Schlechtweg, Sabine Schulte im Walde:
DiaWUG: A Dataset for Diatopic Lexical Semantic Variation in Spanish. 2601-2609 - Daniel Chen, Mans Hulden:
My Case, For an Adposition: Lexical Polysemy of Adpositions and Case Markers in Finnish and Latin. 2610-2616 - Anna Breit, Artem Revenko, Narayani Blaschke:
WiC-TSV-de: German Word-in-Context Target-Sense-Verification Dataset and Cross-Lingual Transfer Analysis. 2617-2625 - Hicham El Boukkouri, Olivier Ferret, Thomas Lavergne, Pierre Zweigenbaum:
Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain. 2626-2633 - Riccardo Orlando, Simone Conia, Stefano Faralli, Roberto Navigli:
Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing. 2634-2641 - Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp:
D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research. 2642-2651 - Dimitrios Roussis, Vassilis Papavassiliou, Prokopis Prokopidis, Stelios Piperidis, Vassilis Katsouros:
SciPar: A Collection of Parallel Corpora from Scientific Abstracts. 2652-2657 - Martha Gavidia, Patrick Lee, Anna Feldman, Jing Peng:
CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms. 2658-2671 - Nizar Habash, Muhammed AbuOdeh, Dima Taji, Reem Faraj, Jamila El Gizuli, Omar Kallas:
Camel Treebank: An Open Multi-genre Arabic Dependency Treebank. 2672-2681 - Sajad Sotudeh, Nazli Goharian, Zachary Young:
MentSum: A Resource for Exploring Summarization of Mental Health Online Posts. 2682-2692 - Dennis Aumiller, Michael Gertz:
Klexikon: A German Dataset for Joint Summarization and Simplification. 2693-2701 - Philipp Hartl, Udo Kruschwitz:
Applying Automatic Text Summarization for Fake News Detection. 2702-2713 - Nino Meisinger, Thorsten Trippel, Claus Zinn:
Increasing CMDI's Semantic Interoperability with schema.org. 2714-2720 - Herbert Lange, Jocelyn Aznar:
RefCo and its Checker: Improving Language Documentation Corpora's Reusability Through a Semi-Automatic Review Process. 2721-2729 - Gábor Simon:
Identification and Analysis of Personification in Hungarian: The PerSECorp project. 2730-2738 - Purificação Silvano, Mariana Damova, Giedre Valunaite Oleskeviciene, Chaya Liebeskind, Christian Chiarcos, Dimitar Trajanov, Ciprian-Octavian Truica, Elena Simona Apostol, Anna Baczkowska:
ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers. 2739-2749 - David Gimeno-Gómez, Carlos D. Martínez-Hinarejos:
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild. 2750-2758 - Hyeongu Yun, Yongil Kim, Kyomin Jung:
Modality Alignment between Deep Representations for Effective Video-and-Language Learning. 2759-2770 - Anais Murat, Maria Koutsombogera, Carl Vogel:
Mutual Gaze and Linguistic Repetition in a Multimodal Corpus. 2771-2780 - Christophe Parisse, Marion Blondel, Stéphanie Caët, Claire Danet, Coralie Vincent, Aliyah Morgenstern:
Multidimensional Coding of Multimodal Languaging in Multi-Party Settings. 2781-2787 - Lukás Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky, Zdenek Zabokrtský:
Constructing a Lexical Resource of Russian Derivational Morphology. 2788-2797 - Temuulen Khishigsuren, Gábor Bella, Khuyagbaatar Batsuren, Abed Alhakim Freihat, Nandu Chandran Nair, Amarsanaa Ganbold, Hadi Khalilia, Yamini Chandrashekar, Fausto Giunchiglia:
Using Linguistic Typology to Enrich Multilingual Lexicons: the Case of Lexical Gaps in Kinship. 2798-2807 - Peteris Paikens, Mikus Grasmanis, Agute Klints, Ilze Lokmane, Lauma Pretkalnina, Laura Rituma, Madara Stade, Laine Strankale:
Towards Latvian WordNet. 2808-2815 - Peng Liu, Cristina Marco, Jon Atle Gulla:
Building Sentiment Lexicons for Mainland Scandinavian Languages Using Machine Translation and Sentence Embeddings. 2816-2825 - Sanni Nimb, Sussi Olsen, Bolette S. Pedersen, Thomas Troelsgård:
A Thesaurus-based Sentiment Lexicon for Danish: The Danish Sentiment Lexicon. 2826-2832 - Nandu Chandran Nair, Rajendran Sankara Velayuthan, Yamini Chandrashekar, Gábor Bella, Fausto Giunchiglia:
IndoUKC: A Concept-Centered Indian Multilingual Lexical Resource. 2833-2840 - Hyeondey Kim, Seonhoon Kim, Inho Kang, Nojun Kwak, Pascale Fung:
Korean Language Modeling via Syntactic Guide. 2841-2849 - Ayah Zirikly, Bart Desmet, Julia Porcino, Jonathan Camacho Maldonado, Pei-Shu Ho, Rafael Jiménez Silva, Maryanne Sacco:
A Whole-Person Function Dictionary for the Mobility, Self-Care and Domestic Life Domains: a Seedset Expansion Approach. 2850-2855 - Voula Giouli, Anna Vacalopoulou, Nikolaos Sidiropoulos, Christina Flouda, Athanasios Doupas, Giorgos Giannopoulos, Nikos Bikakis, Vassilis Kaffes, Gregory Stainhaouer:
Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: the Mythotopia Geo-tagged Corpus. 2856-2864 - Kazushi Ohya:
An Architecture of resolving a multiple link path in a standoff-style data format to enhance the mobility of language resources. 2865-2873 - Julia Romberg, Laura Mark, Tobias Escher:
A Corpus of German Citizen Contributions in Mobility Planning: Supporting Evaluation Through Multidimensional Classification. 2874-2883 - Jakob Lesage, Hannah J. Haynie, Hedvig Skirgård, Tobias Weber, Alena Witzlack-Makarevich:
Overlooked Data in Typological Databases: What Grambank Teaches Us About Gaps in Grammars. 2884-2890 - Arya D. McCarthy, Giovanna Maria Dora Dore:
Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020. 2891-2900 - Martin Volk, Lukas Fischer, Patricia Scheurer, Bernard Silvan Schroffenegger, Raphael Schwitter, Phillip Ströbel, Benjamin Suter:
Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters. 2901-2908 - Marie Mikulová, Milan Straka, Jan Stepánek, Barbora Stepánková, Jan Hajic:
Quality and Efficiency of Manual Annotation: Pre-annotation Bias. 2909-2918 - Mustafa Ocal, Antonela Radas, Jared Hummer, Karine Megerdoomian, Mark A. Finlayson:
A Comprehensive Evaluation and Correction of the TimeBank Corpus. 2919-2927 - Rocco Tripodi, Rexhina Blloshmi, Simon Levis Sullam:
Evaluating Multilingual Sentence Representation Models in a Real Case Scenario. 2928-2939 - Anaëlle Baledent, Yann Mathet, Antoine Widlöcher, Christophe Couronne, Jean-Luc Manguin:
Validity, Agreement, Consensuality and Annotated Data Quality. 2940-2948 - Salima Mdhaffar, Valentin Pelloin, Antoine Caubrière, Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Nathalie Camelin, Yannick Estève:
Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding. 2949-2956 - Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata:
JGLUE: Japanese General Language Understanding Evaluation. 2957-2966 - Elham Akhlaghi, Ingibjörg Iða Auðunardóttir, Anna Baczkowska, Branislav Bédi, Hakeem Beedar, Harald Berthelsen, Cathy Chua, Catia Cucchiarini, Hanieh Habibi, Ivana Horváthová, Junta Ikeda, Christèle Maizonniaux, Neasa Ní Chiaráin, Chadi Raheb, Manny Rayner, John Sloan, Nikos Tsourakis, Chunlin Yao:
Using the LARA Little Prince to compare human and TTS audio quality. 2967-2975 - Chris Emmery, Ákos Kádár, Grzegorz Chrupala, Walter Daelemans:
Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations. 2976-2988 - T. Mark Ellison, Fahime Same:
Constructing Distributions of Variation in Referring Expression Type from Corpora for Model Evaluation. 2989-2997 - Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang, Andreas Both, Ricardo Usbeck:
Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. 2998-3007 - Sho Takase, Naoaki Okazaki:
Multi-Task Learning for Cross-Lingual Abstractive Summarization. 3008-3016 - Sheila Castilho:
How Much Context Span is Enough? Examining Context-Related Issues for Document-level MT. 3017-3025 - Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria, Maite Martín:
TANDO: A Corpus for Document-level Machine Translation. 3026-3037 - Ona de Gibert Bonet, Iakes Goenaga, Jordi Armengol-Estapé, Olatz Perez-de-Viñaspre, Carla Parra Escartín, Marina Sanchez, Marcis Pinnis, Gorka Labaka, Maite Melero:
Unsupervised Machine Translation in Real-World Scenarios. 3038-3047 - Mana Ashida, Jin-Dong Kim, Seunghun Lee:
COVID-19 Mythbusters in World Languages. 3048-3055 - Jordi Armengol-Estapé, Ona de Gibert Bonet, Maite Melero:
On the Multilingual Capabilities of Very Large-Scale English Language Models. 3056-3068 - Alina Karakanta, François Buet, Mauro Cettolo, François Yvon:
Evaluating Subtitle Segmentation for End-to-end Generation Systems. 3069-3078 - Reinhard Rapp:
Using Semantic Role Labeling to Improve Neural Machine Translation. 3079-3083 - Dibyanayan Bandyopadhyay, Arkadipta De, Baban Gain, Tanik Saikh, Asif Ekbal:
A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference. 3084-3092 - Matthew Shardlow, Fernando Alva-Manchego:
Simple TICO-19: A Dataset for Joint Translation and Simplification of COVID-19 Texts. 3093-3102 - Omar Adjali, Emmanuel Morin, Pierre Zweigenbaum:
Building Comparable Corpora for Assessing Multi-Word Term Alignment. 3103-3112 - Agnes Sólmundsdóttir, Dagbjört Guðmundsdóttir, Lilja Björk Stefánsdóttir, Anton Ingason:
Mean Machine Translations: On Gender Bias in Icelandic Machine Translations. 3113-3121 - Ayesha Enayet, Gita Sukthankar:
An Analysis of Dialogue Act Sequence Similarity Across Multiple Domains. 3122-3130 - Taro Okahisa, Ribeka Tanaka, Takashi Kodama, Yin Jou Huang, Sadao Kurohashi:
Constructing a Culinary Interview Dialogue Corpus with Video Conferencing Tool. 3131-3139 - Zulipiye Yusupujiang, Jonathan Ginzburg:
UgChDial: A Uyghur Chat-based Dialogue Corpus for Response Space Classification. 3140-3149 - Saki Sudo, Kyoshiro Asano, Koh Mitsuda, Ryuichiro Higashinaka, Yugo Takeuchi:
A Speculative and Tentative Common Ground Handling for Efficient Composition of Uncertain Dialogue. 3150-3157 - Maia Aguirre, Laura García-Sardiña, Manex Serras, Ariane Méndez, Jacobo López:
BaSCo: An Annotated Basque-Spanish Code-Switching Corpus for Natural Language Understanding. 3158-3163 - Matthias Kraus, Nicolas Wagner, Wolfgang Minker:
ProDial - An Annotated Proactive Dialogue Act Corpus for Conversational Assistants using Crowdsourcing. 3164-3173 - Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal, Ondrej Bojar:
ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech. 3174-3182 - Kathleen C. Fraser, Svetlana Kiritchenko, Isar Nejadgholi:
Extracting Age-Related Stereotypes from Social Media Texts. 3183-3194 - Elena Álvarez Mellado, Constantine Lignos:
Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing. 3195-3201 - Ana Sabina Uban, Berta Chulvi, Paolo Rosso:
Multi-Aspect Transfer Learning for Detecting Low Resource Mental Disorders on Social Media. 3202-3219 - Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury, Firoj Alam:
ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination. 3220-3230 - Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri-Jacques Geiss, Paolo Rosso, Lucie Flek:
FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias. 3231-3241 - Julia Pritzen, Michael Gref, Dietlind Zühlke, Christoph Andreas Schmidt:
Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition. 3242-3249 - Michel Plüss, Manuela Hürlimann, Marc Cuny, Alla Stöckli, Nikolaos Kapotis, Julia Hartmann, Malgorzata Anna Ulasik, Christian Scheller, Yanick Schraner, Amit Jain, Jan Deriu, Mark Cieliebak, Manfred Vogel:
SDS-200: A Swiss German Speech to Standard German Text Corpus. 3250-3256 - Yaru Wu, Mathilde Hutin, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker:
Extracting Linguistic Knowledge from Speech: A Study of Stop Realization in 5 Romance Languages. 3257-3263 - Martin Lebourdais, Marie Tahon, Antoine Laurent, Sylvain Meignier, Anthony Larcher:
Overlaps and Gender Analysis in the Context of Broadcast Media. 3264-3270 - Rémi Uro, David Doukhan, Albert Rilliard, Laetitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent:
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification. 3271-3280 - Merel C. J. Scholman, Tianai Dong, Frances Yung, Vera Demberg:
DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations. 3281-3290 - Annette Hautli-Janisz, Zlata Kikteva, Wassiliki Siskou, Kamila Gorska, Ray Becker, Chris Reed:
QT30: A Corpus of Argument and Conflict in Broadcast Debate. 3291-3300 - Neele Falk, Gabriella Lapesa:
Scaling up Discourse Quality Annotation for Political Science. 3301-3318 - Talita Anthonio, Anna Sauer, Michael Roth:
Clarifying Implicit and Underspecified Phrases in Instructional Text. 3319-3330 - Anton Buzanov, Polina Bychkova, Arina Molchanova, Anna Postnikova, Daria Ryzhova:
Multilingual Pragmaticon: Database of Discourse Formulae. 3331-3336 - Ranka Stankovic, Cvetana Krstev, Branislava Sandrih Todorovic, Dusko Vitas, Mihailo Skoric, Milica Ikonic Nesic:
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection. 3337-3345 - Nils Reiter, Judith Sieker, Svenja Guhr, Evelyn Gius, Sina Zarrieß:
Exploring Text Recombination for Automatic Narrative Level Detection. 3346-3353 - Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot, Simon Gabay:
Automatic Normalisation of Early Modern French. 3354-3366 - Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot:
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. 3367-3374 - Nuette Heyns, Menno van Zaanen:
Detecting Multiple Transitions in Literary Texts. 3375-3381 - Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri:
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions. 3382-3390 - Johanna Marie Poppek, Simon Masloch, Tibor Kiss:
GerEO: A Large-Scale Resource on the Syntactic Distribution of German Experiencer-Object Verbs. 3391-3397 - Suchetha Nambanoor Kunnath, Valentin Stauber, Ronin Wu, David Pride, Viktor Botev, Petr Knoth:
ACT2: A multi-disciplinary semi-structured dataset for importance and purpose classification of citations. 3398-3406 - Harry Bunt, Maxime Amblard, Johan Bos, Karën Fort, Bruno Guillaume, Philippe de Groote, Chuyuan Li, Pierre Ludmann, Michel Musiol, Siyana Pavlova, Guy Perrier, Sylvain Pogodalla:
Quantification Annotation in ISO 24617-12, Second Draft. 3407-3416 - Vandan Mujadia, Dipti Misra Sharma:
The LTRC Hindi-Telugu Parallel Corpus. 3417-3424 - Priya Rani, John P. McCrae, Theodorus Fransen:
MHE: Code-Mixed Corpora for Similar Language Identification. 3425-3433 - Paul Lerner, Juliette Bergoënd, Camille Guinaudeau, Hervé Bredin, Benjamin Maurice, Sharleyne Lefevre, Martin Bouteiller, Aman Berhe, Léo Galmant, Ruiqing Yin, Claude Barras:
Bazinga! A Dataset for Multi-Party Dialogues Structuring. 3434-3441 - Alexandros Fotios Ntogramatzis, Anna Gradou, Georgios Petasis, Marko Kokol:
The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments. 3442-3450 - Karen Jones, Kevin Walker, Christopher Caruso, Jonathan Wright, Stephanie M. Strassel:
WeCanTalk: A New Multi-language, Multi-modal Resource for Speaker Recognition. 3451-3456 - Lenka Bajcetic, Thierry Declerck:
Using Wiktionary to Create Specialized Lexical Resources and Datasets. 3457-3460 - Nan Zhang, Shomir Wilson, Prasenjit Mitra:
STAPI: An Automatic Scraper for Extracting Iterative Title-Text Structure from Web Documents. 3461-3470 - Péter Horváth, Péter Kundráth, Balázs Indig, Zsófia Fellegi, Eszter Szlávich, Tímea Borbála Bajzát, Zsófia Sárközi-Lindner, Bence Vida, Aslihan Karabulut, Mária Timári, Gábor Palkó:
ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry. 3471-3478 - Harshita Sharma, Pruthwik Mishra, Dipti Misra Sharma:
HAWP: a Dataset for Hindi Arithmetic Word Problem Solving. 3479-3490 - Petya Osenova, Kiril Simov, Iva Marinova, Melania Berbatova:
The Bulgarian Event Corpus: Overview and Initial NER Experiments. 3491-3499 - Bingsheng Yao, Ethan Joseph, Julian Lioanag, Mei Si:
A Corpus for Commonsense Inference in Story Cloze Test. 3500-3508 - Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren:
Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish. 3509-3518 - Andrei Popescu-Belis, Àlex R. Atrio, Valentin Minder, Aris Xanthos, Gabriel Luthier, Simon Mattei, Antonio Rodriguez:
Constrained Language Models for Interactive Poem Generation. 3519-3529 - Huije Lee, Young Ju Na, Hoyun Song, Jisu Shin, Jong C. Park:
ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls. 3530-3541 - Isaac Ampomah, James Burton, Amir Enshaei, Noura Al Moubayed:
Generating Textual Explanations for Machine Learning Models Performance: A Table-to-Text Task. 3542-3551 - Iza Skrjanec, Muhammad Salman Edhi, Vera Demberg:
Barch: an English Dataset of Bar Chart Summaries. 3552-3560 - Matej Martinc, Syrielle Montariol, Lidia Pivovarova, Elaine Zosa:
Effectiveness of Data Augmentation and Pretraining for Improving Neural Headline Generation in Low-Resource Settings. 3561-3570 - Yongxin Zhou, François Portet, Fabien Ringeval:
Effectiveness of French Language Models on Abstractive Dialogue Summarization Task. 3571-3581 - Daniel Ferrés, Horacio Saggion:
ALEXSIS: A Dataset for Lexical Simplification in Spanish. 3582-3594 - Timothy Mckinnon, Carl Rubino:
The IARPA BETTER Program Abstract Task Four New Semantically Annotated Corpora from IARPA's BETTER Program. 3595-3600 - Uyen Phan, Phuong N. V. Nguyen, Nhung Nguyen:
A Named Entity Recognition Corpus for Vietnamese Biomedical Texts to Support Tuberculosis Treatment. 3601-3609 - Erick Mendez Guzman, Viktor Schlegel, Riza Batista-Navarro:
RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour. 3610-3625 - Mustafa Jarrar, Mohammed Khalilia, Sana Ghanem:
Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT. 3626-3636 - Lisa Raithel, Philippe Thomas, Roland Roller, Oliver Sapina, Sebastian Möller, Pierre Zweigenbaum:
Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective. 3637-3649 - Florian Borchert, Christina Lohr, Luise Modersohn, Jonas Witt, Thomas Langer, Markus Follmann, Matthias Gietzelt, Bert Arnrich, Udo Hahn, Matthieu-P. Schapranow:
GGPONC 2.0 - The German Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline NER Taggers. 3650-3660 - Elena Zotova, Montse Cuadros, German Rigau:
ClinIDMap: Towards a Clinical IDs Mapping for Data Interoperability. 3661-3669 - Corina Ceausu, Sergiu Nisioi:
Identifying Draft Bills Impacting Existing Legislation: a Case Study on Romanian. 3670-3674 - George Thomas Hudson, Noura Al Moubayed:
MuLD: The Multitask Long Document Benchmark. 3675-3685 - Surabhi Datta, Hio Cheng Lam, Atieh Pajouhi, Sunitha Mogalla, Kirk Roberts:
A Cross-document Coreference Dataset for Longitudinal Tracking across Radiology Reports. 3686-3695 - Hadjer Khaldi, Farah Benamara, Camille Pradel, Grégoire Sigel, Nathalie Aussenac-Gilles:
How's Business Going Worldwide ? A Multilingual Annotated Corpus for Business Relation Extraction. 3696-3705 - Mahdi Rahimi, Mihai Surdeanu:
Do Transformer Networks Improve the Discovery of Rules from Text? 3706-3714 - Marina Litvak, Natalia Vanetik, Chaya Liebeskind, Omar Hmdia, Rizek Abu Madeghem:
Offensive language detection in Hebrew: can other languages help? 3715-3723 - Fei Cheng, Shuntaro Yada, Ribeka Tanaka, Eiji Aramaki, Sadao Kurohashi:
JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation. 3724-3731 - Michael Strobl, Amine Trabelsi, Osmar Zaïane:
Enhanced Entity Annotations for Multilingual Corpora. 3732-3740 - Edmond Odhiambo Menya, Mathieu Roche, Roberto Interdonato, Dickson Owuor:
Enriching Epidemiological Thematic Features For Disease Surveillance Corpora Classification. 3741-3750 - Ona de Gibert Bonet, Aitor García Pablos, Montse Cuadros, Maite Melero:
Spanish Datasets for Sensitive Entity Detection in the Legal Domain. 3751-3760 - Bimal Bhattarai, Ole-Christoffer Granmo, Lei Jiao:
ConvTextTM: An Explainable Convolutional Tsetlin Machine Framework for Text Classification. 3761-3770 - Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke, Chris Biemann:
Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions. 3771-3779 - Hui-Syuan Yeh, Thomas Lavergne, Pierre Zweigenbaum:
Decorate the Examples: A Simple Method of Prompt Design for Biomedical Relation Extraction. 3780-3787 - Rositsa V. Ivanova, Marieke van Erp, Sabrina Kirrane:
Comparing Annotated Datasets for Named Entity Recognition in English Literature. 3788-3797 - Flora Sakketou, Allison Lahnala, Liane Vogel, Lucie Flek:
Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion. 3798-3808 - Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, Valerio Basile, Rossana Damiano, Viviana Patti:
APPReddit: a Corpus of Reddit Posts Annotated for Appraisal. 3809-3818 - Mateus Tarcinalli Machado, Thiago Alexandre Salgueiro Pardo:
Evaluating Methods for Extraction of Aspect Terms in Opinion Texts in Portuguese - the Challenges of Implicit Aspects. 3819-3828 - Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing, Kenneth Kwok:
SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis. 3829-3839 - Roberto Zariquiey, Claudia Alvarado, Ximena Echevarría, Luisa Gomez, Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo Oncevay, Javier Vera:
Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo. 3840-3851 - Per Egil Kummervold, Freddy Wetjen, Javier de la Rosa:
The Norwegian Colossal Corpus: A Text Corpus for Training Large Norwegian Language Models. 3852-3860 - Ligeia Lugli, Matej Martinc, Andraz Pelicon, Senja Pollak:
Embeddings models for Buddhist Sanskrit. 3861-3871 - Rolando Coto-Solano, Sally Akevai Nicholas, Samiha Datta, Victoria Quint, Piripi Wills, Emma Ngakuravaru Powell, Liam Koka'ua, Syed Tanveer, Isaac Feldman:
Development of Automatic Speech Recognition for the Documentation of Cook Islands Māori. 3872-3882 - Gregor Wiedemann, Jan Matti Dollbaum, Sebastian Haunss, Priska Daphi, Larissa Daria Meier:
A Generalized Approach to Protest Event Detection in German Local News. 3883-3891 - Ann-Sophie Gnehm, Eva Bühlmann, Simon Clematide:
Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements. 3892-3901 - Carla Pérez-Almendros, Luis Espinosa Anke, Steven Schockaert:
Pre-Training Language Models for Identifying Patronizing and Condescending Language: An Analysis. 3902-3911 - Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
HeLI-OTS, Off-the-shelf Language Identifier for Text. 3912-3922 - Silvia Severini, Ayyoob Imani, Philipp Dufter, Hinrich Schütze:
Towards a Broad Coverage Named Entity Resource: A Data-Efficient Approach for Many Diverse Languages. 3923-3933 - Anas Fahad Khan, Francisco J. Minaya Gómez, Rafael Cruz González, Harry Diakoff, Javier E. Díaz-Vera, John P. McCrae, Ciara O'Loughlin, William Michael Short, Sander Stolk:
Towards the Construction of a WordNet for Old English. 3934-3941 - Eckhard Bick:
A Framenet and Frame Annotator for German Social Media. 3942-3949 - Marco Bombieri, Marco Rospocher, Simone Paolo Ponzetto, Paolo Fiorini:
The Robotic Surgery Procedural Framebank. 3950-3959 - Jennifer Weber, Eliana Colunga:
Representing the Toddler Lexicon: Do the Corpus and Semantics Matter? 3960-3968 - Nyoman Juniarta, Olivier Bonami, Nabil Hathout, Fiammetta Namer, Yannick Toussaint:
Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis. 3969-3976 - Thierry Declerck:
Towards a new Ontology for Sign Languages. 3977-3983 - Yoshihiko Hayashi:
Towards the Detection of a Semantic Gap in the Chain of Commonsense Knowledge Triples. 3984-3993 - Ana Brassard, Benjamin Heinzerling, Pride Kavumba, Kentaro Inui:
COPA-SSE: Semi-structured Explanations for Commonsense Reasoning. 3994-4000 - Ramona Kühn, Jelena Mitrovic, Michael Granitzer:
GRhOOT: Ontology of Rhetorical Figures in German. 4001-4010 - Christian Chiarcos, Christian Fäth, Maxim Ionov:
Querying a Dozen Corpora and a Thousand Years with Fintan. 4011-4021 - Francesco Mambrini, Marco Passarotti, Giovanni Moretti, Matteo Pellegrini:
The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base. 4022-4029 - Stefano Menini, Teresa Paccosi, Serra Sinem Tekiroglu, Sara Tonelli:
Building a Multilingual Taxonomy of Olfactory Terms with Timestamps. 4030-4039 - Anastasia Chizhikova, Sanzhar Murzakhmetov, Oleg Serikov, Tatiana Shavrina, Mikhail Burtsev:
Attention Understands Semantic Relations. 4040-4050 - Takuma Ichikawa, Ryuichiro Higashinaka:
Analysis of Dialogue in Human-Human Collaboration in Minecraft. 4051-4059 - Sanae Yamashita, Ryuichiro Higashinaka:
Data Collection for Empirically Determining the Necessary Information for Smooth Handover in Dialogue. 4060-4068 - Jana Götze, Maike Paetzel-Prüsmann, Wencke Liermann, Tim Diekmann, David Schlangen:
The slurk Interaction Server Framework: Better Data for Better Dialog Models. 4069-4078 - Natalia Kalashnikova, Serge Pajak, Fabrice Le Guel, Ioana Vasilescu, Gemma Serrano, Laurence Devillers:
Corpus Design for Studying Linguistic Nudges in Human-Computer Spoken Interactions. 4079-4087 - Yuki Furuya, Koki Saito, Kosuke Ogura, Koh Mitsuda, Ryuichiro Higashinaka, Kazunori Takashio:
Dialogue Corpus Construction Considering Modality and Social Relationships in Building Common Ground. 4088-4095 - Shutong Feng, Nurul Lubis, Christian Geishauser, Hsien-Chin Lin, Michael Heck, Carel van Niekerk, Milica Gasic:
EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems. 4096-4113 - Eda Okur, Saurav Sahay, Lama Nachman:
Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System. 4114-4125 - Annalena Aicher, Wolfgang Minker, Stefan Ultes:
Towards Modelling Self-imposed Filter Bubbles in Argumentative Dialogue Systems. 4126-4134 - Ankit Aich, Natalie Parde:
Telling a Lie: Analyzing the Language of Information and Misinformation during Global Health Events. 4135-4141 - Arianna Muti, Francesco Fernicola, Alberto Barrón-Cedeño:
Misogyny and Aggressiveness Tend to Come Together and Together We Address Them. 4142-4148 - Ritesh Kumar, Shyam Ratan, Siddharth Singh, Enakshi Nandi, Laishram Niranjana Devi, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Akanksha Bansal, Atul Kr. Ojha:
The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse. 4149-4161 - Krishnapriya Vishnubhotla, Saif M. Mohammad:
TUSC: Emotion Word Usage in Tweets from US and Canada. 4162-4176 - Fatih Beyhan, Buse Çarik, Inanç Arin, Aysecan Terzioglu, Berrin Yanikoglu, Reyyan Yeniterzi:
A Turkish Hate Speech Dataset and Detection System. 4177-4185 - Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu:
Life is not Always Depressing: Exploring the Happy Moments of People Diagnosed with Depression. 4186-4192 - Alexandra Benamar, Cyril Grouin, Meryl Bothua, Anne Vilnat:
Evaluating Tokenizers Impact on OOVs Representation with Transformers Models. 4193-4204 - Giuseppina Morza, Raffaele Manna, Johanna Monti:
Assessing the Quality of an Italian Crowdsourced Idiom Corpus: the Dodiom Experiment. 4205-4211 - Anton Alekseev, Zulfat Miftahutdinov, Elena Tutubalina, Artem Shelmanov, Vladimir Ivanov, Vladimir Kokh, Alexandr Nesterov, Manvel Avetisian, Andrey Chertok, Sergey I. Nikolenko:
Medical Crossing: a Cross-lingual Evaluation of Clinical Entity Linking. 4212-4220 - Shreyas Sharma, Kareem Darwish, Lucas Pavanelli, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Kamer Ali Yuksel, Hassan Sawaf:
MTLens: Machine Translation Output Debugging. 4221-4226 - Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson, Einar Freyr Sigurðsson:
IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set. 4227-4234 - Farhad Akhbardeh, Marcos Zampieri, Cecilia Ovesdotter Alm, Travis Desell:
Transfer Learning Methods for Domain Adaptation in Technical Logbook Datasets. 4235-4244 - Thomas Vakili, Anastasios Lamproudis, Aron Henriksson, Hercules Dalianis:
Downstream Task Performance of BERT Models Pre-Trained Using Automatically De-Identified Clinical Data. 4245-4252 - Bálint Csanády, András Lukács:
Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration. 4253-4259 - Vincent Claveau, Antoine Chaffin, Ewa Kijak:
Generating Artificial Texts as Substitution or Complement of Training Data. 4260-4269 - Mathias Coeckelbergs:
From Pattern to Interpretation. Using Colibri Core to Detect Translation Patterns in the Peshitta. 4270-4274 - Julien Launay, E. L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah:
PAGnol: An Extra-Large French Generative Model. 4275-4284 - Mariano Felice, Shiva Taslimipoor, Øistein E. Andersen, Paula Buttery:
CEPOC: The Cambridge Exams Publishing Open Cloze dataset. 4285-4290 - José Cañete, Sebastian Donoso, Felipe Bravo-Marquez, Andrés Carvallo, Vladimir Araujo:
ALBETO and DistilBETO: Lightweight Spanish Language Models. 4291-4298 - Winston Wu, David Yarowsky:
On the Robustness of Cognate Generation Models. 4299-4305 - Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol:
CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives. 4306-4315 - Shanshan Xu, Katja Markert:
The Chinese Causative-Passive Homonymy Disambiguation: an adversarial Dataset for NLI and a Probing Task. 4316-4323 - Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz:
Modeling Noise in Paraphrase Detection. 4324-4332 - Enzo Laurenti, Nils Bourgon, Farah Benamara, Alda Mari, Véronique Moriceau, Camille Courgeon:
Give me your Intentions, I'll Predict our Actions: A Two-level Classification of Speech Acts for Crisis Management in Social Media. 4333-4343 - Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. 4344-4355 - Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Páll Jónsson, Vilhjalmur Thorsteinsson, Hafsteinn Einarsson:
A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models. 4356-4366 - M. A. Tugtekin Turan, Dietrich Klakow, Emmanuel Vincent, Denis Jouvet:
Adapting Language Models When Training on Privacy-Transformed Data. 4367-4373 - Aleksandra Chrabrowa, Lukasz Dragan, Karol Grzegorczyk, Dariusz Kajtoch, Mikolaj Koszowski, Robert Mroczkowski, Piotr Rybak:
Evaluation of Transfer Learning for Polish with a Text-to-Text Model. 4374-4394 - Phillip Benjamin Ströbel, Martin Volk, Simon Clematide, Raphael Schwitter, Tobias Hodel, David Schoch:
Evaluation of HTR models without Ground Truth Material. 4395-4404 - Tomasz Korybski, Elena Davitti, Constantin Orasan, Sabine Braun:
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking. 4405-4413 - Thibault Prouteau, Nicolas Dugué, Nathalie Camelin, Sylvain Meignier:
Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus. 4414-4419 - Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, Ashutosh Modi:
Corpus for Automatic Structuring of Legal Documents. 4420-4429 - Claire Bonial, Austin Blodgett, Taylor Hudson, Stephanie M. Lukin, Jeffrey Micher, Douglas Summers-Stay, Peter Sutor Jr., Clare R. Voss:
The Search for Agreement on Logical Fallacy Annotation of an Infodemic. 4430-4438 - Amelie Wührl, Roman Klinger:
Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR). 4439-4450 - Felix Giovanni Virgo, Fei Cheng, Sadao Kurohashi:
Improving Event Duration Question Answering by Leveraging Existing Temporal Information Extraction Data. 4451-4457 - Natalia V. Loukachevitch, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, Suresh Manandhar, Artem Shelmanov, Elena Tutubalina:
Entity Linking over Nested Named Entities for Russian. 4458-4466 - V. Rudra Murthy, Pallab Bhattacharjee, Rahul Sharnagat, Jyotsana Khatri, Diptesh Kanojia, Pushpak Bhattacharyya:
HiNER: A large Hindi Named Entity Recognition Dataset. 4467-4476 - Anthi Papadopoulou, Pierre Lison, Lilja Øvrelid, Ildikó Pilán:
Bootstrapping Text Anonymization Models with Distant Supervision. 4477-4487 - Vésteinn Snæbjarnarson, Hafsteinn Einarsson:
Natural Questions in Icelandic. 4488-4496 - Rafael Jiménez Silva, Kaushik Gedela, Alex Marr, Bart Desmet, Carolyn P. Rosé, Chunxiao Zhou:
QA4IE: A Quality Assurance Tool for Information Extraction. 4497-4503 - Miriam Schirmer, Udo Kruschwitz, Gregor Donabauer:
A New Dataset for Topic-Based Paragraph Classification in Genocide-Related Court Transcripts. 4504-4512 - Igor Nascimento, Rinaldo Lima, Adrian-Gabriel Chifu, Bernard Espinasse, Sébastien Fournier:
DeepREF: A Framework for Optimized Deep Learning-based Relation Classification. 4513-4522 - Ubaid Azam, Hammad Rizwan, Asim Karim:
Exploring Data Augmentation Strategies for Hate Speech Detection in Roman Urdu. 4523-4531 - Isil Yakut Kilic, Shimei Pan:
Incorporating LIWC in Neural Networks to Improve Human Trait and Behavior Analysis in Low Resource Scenarios. 4532-4539 - Ankan Mullick, Shubhraneel Pal, Tapas Nayak, Seung-Cheol Lee, Satadeep Bhattacharjee, Pawan Goyal:
Using Sentence-level Classification Helps Entity Extraction from Material Science Literature. 4540-4545 - Buse Çarik, Reyyan Yeniterzi:
A Twitter Corpus for Named Entity Recognition in Turkish. 4546-4551 - Fan Luo, Mihai Surdeanu:
A STEP towards Interpretable Multi-Hop Reasoning: Bridge Phrase Identification and Query Expansion. 4552-4560 - Frédéric Béchet, Elie Antoine, Jérémy Auguste, Géraldine Damnati:
Question Generation and Answering for exploring Digital Humanities collections. 4561-4568 - Nancy Ide, Keith Suderman, Jingxuan Tu, Marc Verhagen, Shanan Peters, Ian Ross, John Lawson, Andrew Borg, James Pustejovsky:
Evaluating Retrieval for Multi-domain Scientific Publications. 4569-4576 - Jenia Kim, Stella Verkijk, Edwin Geleijn, Marieke van der Leeden, Carel Meskers, Caroline Meskers, Sabina van der Veen, Piek Vossen, Guy Widdershoven:
Modeling Dutch Medical Texts for Detecting Functional Categories and Levels of COVID-19 Patients. 4577-4585 - Nurpeiis Baimukan, Houda Bouamor, Nizar Habash:
Hierarchical Aggregation of Dialectal Data for Arabic Dialect Identification. 4586-4596 - Lukas Wertz, Katsiaryna Mirylenka, Jonas Kuhn, Jasmina Bogojeska:
Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification. 4597-4605 - Kristin Kutzner, Ralf Laue:
German Light Verb Constructions in Business Process Models. 4606-4610 - Jordan Meadows, Zili Zhou, André Freitas:
PhysNLU: A Language Resource for Evaluating Natural Language Understanding and Explanation Coherence in Physics. 4611-4619 - Amalia Todirascu, Rodrigo Wilkens, Eva Rolin, Thomas François, Delphine Bernhard, Núria Gala:
HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French. 4620-4630 - Peter Juel Henrichsen, Stine Fuglsang Engmose:
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia. 4631-4636 - Annika Simonsen, Sandra Saxov Lamhauge, Iben Nyholm Debess, Peter Juel Henrichsen:
Creating a Basic Language Resource Kit for Faroese. 4637-4643 - Hulda Óladóttir, Thórunn Arnardóttir, Anton Karl Ingason, Vilhjalmur Thorsteinsson:
Developing a Spell and Grammar Checker for Icelandic using an Error Corpus. 4644-4653 - Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H. Martin, Tamara Sumner:
The TalkMoves Dataset: K-12 Mathematics Lesson Transcripts Annotated for Teacher and Student Discursive Moves. 4654-4662 - Marcello Gecchele, Hiroaki Yamada, Takenobu Tokunaga, Yasuyo Sawaki, Mika Ishizuka:
Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis. 4663-4673 - Keshav Singh, Naoya Inoue, Farjana Sultana Mim, Shoichi Naitoh, Kentaro Inui:
IRAC: A Domain-Specific Annotated Corpus of Implicit Reasoning in Arguments. 4674-4683 - Julian Linke, Philip N. Garner, Gernot Kubin, Barbara Schuppler:
Conversational Speech Recognition Needs Data? Experiments with Austrian German. 4684-4691 - Vijini Liyanage, Davide Buscaldi, Adeline Nazarenko:
A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications. 4692-4700 - Ivano Lauriola, Kevin Small, Alessandro Moschitti:
Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification. 4701-4707 - Thomas E. Kolb, Sekanina Katharina, Bettina Manuela Johanna Kern, Julia Neidhardt, Tanja Wissik, Andreas Baumann:
The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers. 4708-4716 - Minh-Quoc Nghiem, Paul Baylis, André Freitas, Sophia Ananiadou:
Text Classification and Prediction in the Legal Domain. 4717-4722 - Andy Lücking, Manuel Stoeckel, Giuseppe Abrami, Alexander Mehler:
I still have Time(s): Extending HeidelTime for German Texts. 4723-4728 - Gordana Hrzica, Chaya Liebeskind, Kristina S. Despot, Olga Dontcheva-Navratilova, Laura Kamandulyte-Merfeldiene, Sara Kosutar, Matea Kramaric, Giedre Valunaite Oleskeviciene:
Morphological Complexity of Children Narratives in Eight Languages. 4729-4738 - Ana-Maria Bucur, Madalina Chitez, Valentina Muresan, Andreea Dinca, Roxana Rogobete:
EXPRES Corpus for A Field-specific Automated Exploratory Study of L2 English Expert Scientific Writing. 4739-4746 - Ankan Mullick, Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, R. Raghav, Roshni Kar:
An Evaluation Framework for Legal Document Summarization. 4747-4753 - Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot, Rachel Bawden:
Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France's Court of Cassation Rulings. 4754-4766 - Bolat Tleubayev, Zhanel Zhexenova, Kenessary Koishybay, Anara Sandygulova:
Cyrillic-MNIST: a Cyrillic Version of the MNIST Dataset. 4767-4773 - James Barry, Joachim Wagner, Lauren Cassidy, Alan Cowap, Teresa Lynn, Abigail Walsh, Mícheál J. Ó Meachair, Jennifer Foster:
gaBERT - an Irish Language Model. 4774-4788 - Wilbert Heeringa, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels, Hans Van de Velde:
PoS Tagging, Lemmatization and Dependency Parsing of West Frisian. 4789-4798 - Melina Plakidis, Georg Rehm:
A Dataset of Offensive German Language Tweets Annotated for Speech Acts. 4799-4807 - Marie-Pauline Krielke, Luigi Talamo, Mahmoud Fawzi, Jörg Knappen:
Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German. 4808-4816 - Luís Morgado da Costa, Francis Bond, Roger Vivek Placidus Winder:
The Tembusu Treebank: An English Learner Treebank. 4817-4826 - Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestley, Per Erik Solberg, Dag Trygve Truslew Haug:
The Norwegian Dialect Corpus Treebank. 4827-4832 - Tatiana Bladier, Kilian Evang, Valeria Generalova, Zahra Ghane, Laura Kallmeyer, Robin Möllemann, Natalia Moors, Rainer Osswald, Simon Petitjean:
RRGparbank: A Parallel Role and Reference Grammar Treebank. 4833-4841 - Christian Chiarcos, Christian Fäth, Maxim Ionov:
Unifying Morphology Resources with OntoLex-Morph. A Case Study in German. 4842-4850 - Takuto Asakura, Yusuke Miyao, Akiko Aizawa:
Building Dataset for Grounding of Formulae - Annotating Coreference Relations Among Math Identifiers. 4851-4858 - Anna Nedoluzhko, Michal Novák, Martin Popel, Zdenek Zabokrtský, Amir Zeldes, Daniel Zeman:
CorefUD 1.0: Coreference Meets Universal Dependencies. 4859-4872 - Juntao Yu, Sopan Khosla, Nafise Sadat Moosavi, Silviu Paun, Sameer Pradhan, Massimo Poesio:
The Universal Anaphora Scorer. 4873-4883 - Anastasia Zhukova, Felix Hamborg, Bela Gipp:
Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes. 4884-4893 - Bimal Bhattarai, Ole-Christoffer Granmo, Lei Jiao:
Explainable Tsetlin Machine Framework for Fake News Detection with Credibility Score Assessment. 4894-4903 - Ali L. Hatab, Caroline Sabty, Slim Abdennadher:
Enhancing Deep Learning with Embedded Features for Arabic Named Entity Recognition. 4904-4912 - Svitlana Vakulenko, Johannes Kiesel, Maik Fröbe:
SCAI-QReCC Shared Task on Conversational Question Answering. 4913-4922 - Michael Raring, Malte Ostendorff, Georg Rehm:
Semantic Relations between Text Segments for Semantic Storytelling: Annotation Tool - Dataset - Evaluation. 4923-4932 - Prajit Dhar, Arianna Bisazza, Gertjan van Noord:
Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages. 4933-4943 - Abhidip Bhattacharyya, Cecilia Mauceri, Martha Palmer, Christoffer Heckman:
Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding. 4944-4954 - Élise Bertin-Lemée, Annelies Braffort, Camille Challant, Claire Danet, Boris Dauriac, Michael Filhol, Emmanuella Martinod, Jérémie Segouat:
Rosetta-LSF: an Aligned Corpus of French Sign Language and French for Text-to-Sign Translation. 4955-4962 - Marina Fomicheva, Shuo Sun, Erick R. Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins:
MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset. 4963-4974 - Sangwhan Moon, Won-Ik Cho, Hye Joo Han, Naoaki Okazaki, Nam Soo Kim:
OpenKorPOS: Democratizing Korean Tokenization with Voting-Based Open Corpus Annotation. 4975-4983 - Katerina Korre, John Pavlopoulos:
Enriching Grammatical Error Correction Resources for Modern Greek. 4984-4991 - David R. Mortensen, Xinyu Zhang, Chenxuan Cui, Katherine J. Zhang:
A Hmong Corpus with Elaborate Expression Annotations. 4992-5000 - Delphine Bernhard, Pablo Ruiz Fabo:
ELAL: An Emotion Lexicon for the Analysis of Alsatian Theatre Plays. 5001-5010 - Robert Pugh, Marivel Huerta Mendez, Mitsuya Sasaki, Francis M. Tyers:
Universal Dependencies for Western Sierra Puebla Nahuatl. 5011-5020 - Gregory Baker, Diego Mollá:
The Construction and Evaluation of the LEAFTOP Dataset of Automatically Extracted Nouns in 1480 Languages. 5021-5028 - Rodolfo Zevallos, Luis Camacho, Nelsi Melgarejo:
Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition. 5029-5034 - Daan van Esch, Tamar Lucassen, Sebastian Ruder, Isaac Caswell, Clara Rivera:
Writing System and Speaker Metadata for 2,800+ Language Varieties. 5035-5046 - Tjerk Hagemeijer, Amália Mendes, Rita Gonçalves, Catarina Cornejo, Raquel Madureira, Michel Généreux:
The PALMA Corpora of African Varieties of Portuguese. 5047-5053 - Büsra Marsan, Oguz Kerem Yildiz, Asli Kuzgun, Neslihan Cesur, Arife Betül Yenice, Ezgi Saniyar, Oguzhan Kuyrukçu, Bilge Nas Arican, Olcay Taner Yildiz:
A Learning-Based Dependency to Constituency Conversion Algorithm for the Turkish Language. 5054-5062 - Jonathan David Mutal, Pierrette Bouillon, Johanna Gerlach, Veronika Haberkorn:
Standard German Subtitling of Swiss German TV content: the PASSAGE Project. 5063-5070 - Hemant Yadav, Sunayana Sitaram:
A Survey of Multilingual Models for Automatic Speech Recognition. 5071-5079 - Cedric Lothritz, Bertrand Lebichot, Kevin Allix, Lisa Veiber, Tegawendé F. Bissyandé, Jacques Klein, Andrey Boytsov, Clément Lefebvre, Anne Goujon:
LuxemBERT: Simple and Practical Data Augmentation in Language Model Pre-Training for Luxembourgish. 5080-5089 - Salar Mohtaj, Fatemeh Tavakkoli, Habibollah Asghari:
PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection. 5090-5096 - Ignatius Ezeani, Mahmoud El-Haj, Jonathan Morris, Dawn Knight:
Introducing the Welsh Text Summarisation Dataset and Baseline Systems. 5097-5106 - Disura Warusawithana, Nilmani Kulaweera, Lakshan Weerasinghe, Buddhika Karunarathne:
A Systematic Approach to Derive a Refined Speech Corpus for Sinhala. 5107-5113 - Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson, Mahmoud El-Haj:
IgboBERT Models: Building and Training Transformer Models for the Igbo Language. 5114-5122 - Baiba Saulite, Roberts Dargis, Normunds Gruzitis, Ilze Auzina, Kristine Levane-Petrova, Lauma Pretkalnina, Laura Rituma, Peteris Paikens, Arturs Znotins, Laine Strankale, Kristine Pokratniece, Ilmars Poikans, Guntis Barzdins, Inguna Skadina, Anda Baklane, Valdis Saulespurens, Janis Ziedins:
Latvian National Corpora Collection - Korpuss.lv. 5123-5129 - Ioan-Bogdan Iordache, Ana Sabina Uban, Catalin Stoean, Liviu P. Dinu:
Investigating the Relationship Between Romanian Financial News and Closing Prices from the Bucharest Stock Exchange. 5130-5136 - Sardana Ivanova, Jonathan Washington, Francis M. Tyers:
A Free/Open-Source Morphological Analyser and Generator for Sakha. 5137-5142 - Joshua Holden, Christopher Cox, Antti Arppe:
An Expanded Finite-State Transducer for Tsuut'ina Verbs. 5143-5152 - Nauros Romim, Mosahed Ahmed, Md Saiful Islam, Arnab Sen Sharma, Hriteshwar Talukder, Mohammad Ruhul Amin:
BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts. 5153-5162 - Mehdi Mirzapour, Waleed Ragheb, Mohammad Javad Saeedizade, Kévin Cousot, Hélène Jacquenet, Lawrence Carbon, Mathieu Lafourcade:
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction. 5163-5169 - Philippe Blache, Salomé Antoine, Dorina De Jong, Lena-Marie Huttner, Emilia Kerr, Thierry Legou, Eliot Maës, Clément François:
The Badalona Corpus - An Audio, Video and Neuro-Physiological Conversational Dataset. 5170-5177 - Masayuki Asahara:
Reading Time and Vocabulary Rating in the Japanese Language: Large-Scale Japanese Reading Time Data Collection Using Crowdsourcing. 5178-5187 - Yuval Marton, Asad B. Sayeed:
Thematic Fit Bits: Annotation Quality and Quantity Interplay for Event Participant Representation. 5188-5197 - Francesco Cabiddu, Lewis Bott, Gary Jones, Chiara Gambi:
ChiSense-12: An English Sense-Annotated Child-Directed Speech Corpus. 5198-5205 - Beatrice Turano, Carlo Strapparava:
Making People Laugh like a Pro: Analysing Humor Through Stand-Up Comedy. 5206-5211 - Christoph Hesse, Maurice Langner, Ralf Klabunde, Anton Benz:
Testing Focus and Non-at-issue Frameworks with a Question-under-Discussion-Annotated Corpus. 5212-5219 - Tu-Anh Tran, Yusuke Miyao:
Development of a Multilingual CCG Treebank via Universal Dependencies Conversion. 5220-5233 - Gloria Gagliardi, Fabio Tamburini:
The Automatic Extraction of Linguistic Biomarkers as a Viable Solution for the Early Diagnosis of Mental Disorders. 5234-5242 - Siew Yeng Chow, Francis Bond:
Singlish Where Got Rules One? Constructing a Computational Grammar for Singlish. 5243-5250 - Jeanne Villaneau, Farida Saïd:
COSMOS: Experimental and Comparative Studies of Concept Representations in Schoolchildren. 5251-5260 - Prisca Piccirilli, Sabine Schulte im Walde:
Features of Perceived Metaphoricity on the Discourse Level: Abstractness and Emotionality. 5261-5273 - Sandhya Singh, Prapti Roy, Nihar Sahoo, Niteesh Mallela, Himanshu Gupta, Pushpak Bhattacharyya, Milind Savagaonkar, Nidhi, Roshni R. Ramnani, Anutosh Maitra, Shubhashis Sengupta:
Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of Movie Dialogues. 5274-5285 - Emily Ahn, Eleanor Chodroff:
VoxCommunis: A Corpus for Cross-linguistic Phonetic Analysis. 5286-5294 - Andrea Peverelli, Marieke van Erp, Jan Bloemendal:
Tracking Textual Similarities in Neo-Latin Drama Networks. 5295-5303 - Siim Orasmaa, Kadri Muischnek, Kristjan Poska, Anna Edela:
Named Entity Recognition in Estonian 19th Century Parish Court Records. 5304-5313 - Annerose Eichel, Gabriella Lapesa, Sabine Schulte im Walde:
Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media. 5314-5323 - Sara Stymne, Carin Östman:
SLäNDa version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature. 5324-5333 - Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey, Malvina Nissim:
AGILe: The First Lemmatizer for Ancient Greek Inscriptions. 5334-5344 - Nadja Schauffler, Toni Bernhart, André Blessing, Gunilla Eschenbach, Markus Gärtner, Kerstin Jung, Anna Kinder, Julia Koch, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu, Lorenz Wesemann, Jonas Kuhn:
»textklang« - Towards a Multi-Modal Exploration Platform for German Poetry. 5345-5355 - Isabelle Nguyen, Shuly Wintner:
Predicting the Proficiency Level of Nonnative Hebrew Authors. 5356-5365 - Sowmya Vajjala:
Trends, Limitations and Open Challenges in Automatic Readability Assessment Research. 5366-5377 - Mithun Das, Punyajoy Saha, Binny Mathew, Animesh Mukherjee:
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models. 5378-5387 - Irene Li, Alexander R. Fabbri, Rina Kawamura, Yixin Liu, Xiangru Tang, Jaesung Tae, Chang Shen, Sally Ma, Tomoe Mizutani, Dragomir Radev:
Surfer100: Generating Surveys From Web Resources, Wikipedia-style. 5388-5392 - Sujay Kumar Jauhar, Nirupama Chandrasekaran, Michael Gamon, Ryen White:
MS-LaTTE: A Dataset of Where and When To-do Tasks are Completed. 5393-5403 - Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol:
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics. 5404-5411 - Joel Oksanen, Abhilash Majumder, Kumar Saunack, Francesca Toni, Arun Dhondiyal:
A Graph-Based Method for Unsupervised Knowledge Discovery from Financial Texts. 5412-5417 - Sravani Boinepelli, Tathagata Raha, Harika Abburi, Pulkit Parikh, Niyati Chhaya, Vasudeva Varma:
Leveraging Mental Health Forums for User-level Depression Detection on Social Media. 5418-5427 - Benjamin Danielsson, Marina Santini, Peter Lundberg, Yosef Al-Abasse, Arne Jönsson, Emma Eneling, Magnus Stridsman:
Classifying Implant-Bearing Patients via their Medical Histories: a Pre-Study on Swedish EMRs with Semi-Supervised GanBERT. 5428-5435 - Saméh Kchaou, Rahma Boujelbane, Emna Fsih, Lamia Hadrich Belguith:
Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect. 5436-5443 - Tiberiu Sosea, Cornelia Caragea:
EnsyNet: A Dataset for Encouragement and Sympathy Detection. 5444-5449 - Marcelo Yuji Himoro, Antonio Pareja-Lora:
Preliminary Results on the Evaluation of Computational Tools for the Analysis of Quechua and Aymara. 5450-5459 - Siddhant Arora, Henry Hosseini, Christine Utz, Vinayshekhar Bannihatti Kumar, Tristan Dhellemmes, Abhilasha Ravichander, Peter Story, Jasmine Mangat, Rex Chen, Martin Degeling, Thomas B. Norton, Thomas Hupperich, Shomir Wilson, Norman M. Sadeh:
A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus. 5460-5472 - Xindi Wang, Robert E. Mercer, Frank Rudzicz:
MeSHup: Corpus for Full Text Biomedical Document Indexing. 5473-5483 - Yanjun Gao, Dmitriy Dligach, Timothy A. Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, Majid Afshar:
Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding. 5484-5493 - Vinh Van Nguyen, Ha Nguyen, Huong Thanh Le, Thai Phuong Nguyen, Tan Van Bui, Luan-Nghia Pham, Anh Tuan Phan, Cong Hoang-Minh Nguyen, Viet-Hong Tran, Anh Huu Tran:
KC4MT: A High-Quality Corpus for Multilingual Machine Translation. 5494-5502 - Kokil Jaidka:
Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk. 5503-5510 - Irati Hurtado:
BILinMID: A Spanish-English Corpus of the US Midwest. 5511-5516 - Dheeraj Rajagopal, Xuchao Zhang, Michael Gamon, Sujay Kumar Jauhar, Diyi Yang, Eduard H. Hovy:
One Document, Many Revisions: A Dataset for Classification and Description of Edit Intents. 5517-5524 - Yue Cui, Junhui Zhu, Liner Yang, Xuezhi Fang, Xiaobin Chen, Yujie Wang, Erhong Yang:
CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform. 5525-5538 - Dominik Pfütze, Eva Ritz, Julius Janda, Roman Rietsche:
A Corpus for Suggestion Mining of German Peer Feedback. 5539-5547 - Yi Li, Dong Yu, Pengyuan Liu:
CLGC: A Corpus for Chinese Literary Grace Evaluation. 5548-5556 - Özlem Çetinoglu, Antje Schweitzer:
Anonymising the SAGT Speech Corpus and Treebank. 5557-5564 - Daisuke Suzuki, Yujin Takahashi, Ikumi Yamashita, Taichi Aida, Tosho Hirasawa, Michitaka Nakatsuji, Masato Mita, Mamoru Komachi:
Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction. 5565-5572 - Jui Shah, Dongxu Zhang, Sam Brody, Andrew McCallum:
Enhanced Distant Supervision with State-Change Information for Relation Extraction. 5573-5579 - Chen Gafni, Anat Prior, Shuly Wintner:
The Hebrew Essay Corpus. 5580-5586 - Hanae Koiso, Haruka Amatani, Yasuharu Den, Yuriko Iseki, Yuichi Ishimoto, Wakako Kashino, Yoshiko Kawabata, Ken'ya Nishikawa, Yayoi Tanaka, Yasuyuki Usuda, Yuka Watanabe:
Design and Evaluation of the Corpus of Everyday Japanese Conversation. 5587-5594 - Arda Akdemir, Yeojoo Jeon, Tetsuo Shibuya:
Developing Language Resources and NLP Tools for the North Korean Language. 5595-5600 - Masatoshi Tsuchiya, Yasutaka Yokoi:
Developing a Dataset of Overridden Information in Wikipedia. 5601-5608 - Bernardo Scapini Consoli, Henrique D. P. dos Santos, Ana Helena D. P. S. Ulbrich, Renata Vieira, Rafael H. Bordini:
BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language. 5609-5616 - António Branco, João Ricardo Silva, Luís Gomes, João António Rodrigues:
Universal Grammatical Dependencies for Portuguese with CINTIL Data, LX Processing and CLARIN support. 5617-5626 - Gayatri Venugopal, Dhanya Pramod, Ravi Shekhar:
CWID-hi: A Dataset for Complex Word Identification in Hindi Text. 5627-5636 - Alla Rozovskaya:
Automatic Classification of Russian Learner Errors. 5637-5647 - Elzbieta Hajnicz:
Annotation of metaphorical expressions in the Basic Corpus of Polish Metaphors. 5648-5653 - Yuanhe Tian, Han Qin, Fei Xia, Yan Song:
ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition. 5654-5664 - Sudipta Singha Roy, Robert E. Mercer:
Building a Synthetic Biomedical Research Article Citation Linkage Corpus. 5665-5672 - Keita Kobayashi, Kohei Koyama, Hiromi Narimatsu, Yasuhiro Minami:
Dataset Construction for Scientific-Document Writing Support by Extracting Related Work Section and Citations from PDF Papers. 5673-5682 - Nikita Martynov, Irina Krotova, Varvara Logacheva, Alexander Panchenko, Olga Kozlova, Nikita Semenov:
RuPAWS: A Russian Adversarial Dataset for Paraphrase Identification. 5683-5691 - Andressa Rodrigues Gomide, Conceição Carapinha, Cornelia Plag:
Atril: an XML Visualization System for Corpus Texts. 5692-5695 - Aryaman Arora, Nitin Venkateswaran, Nathan Schneider:
MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi. 5696-5704 - Aryaman Arora:
Universal Dependencies for Punjabi. 5705-5711 - Ashok Urlana, Nirmal Surange, Pawan Baswani, Priyanka Ravva, Manish Shrivastava:
TeSum: Human-Generated Abstractive Summarization Corpus for Telugu. 5712-5722 - John Lee, Haley Fong, Lai Shuen Judy Wong, Chun Chung Mak, Chi Hin Yip, Ching Wah Larry Ng:
A Corpus of Simulated Counselling Sessions with Dialog Act Annotation. 5723-5730 - Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David R. Traum, Maxine Eskénazi:
Interactive Evaluation of Dialog Track at DSTC9. 5731-5738 - Josue Torres-Fonsesca, Casey Kennington:
HADREB: Human Appraisals and (English) Descriptions of Robot Emotional Behaviors. 5739-5748 - Koh Mitsuda, Ryuichiro Higashinaka, Yuhei Oga, Sen Yoshida:
Dialogue Collection for Recording the Process of Building Common Ground in a Collaborative Task. 5749-5758 - Michimasa Inaba, Yuya Chiba, Ryuichiro Higashinaka, Kazunori Komatani, Yusuke Miyao, Takayuki Nagai:
Collection and Analysis of Travel Agency Task Dialogues with Age-Diverse Speakers. 5759-5767 - Deepthi Karkada, Ramesh R. Manuvinakurike, Maike Paetzel-Prüsmann, Kallirroi Georgila:
Strategy-level Entrainment of Dialogue System Users in a Creative Visual Reference Resolution Task. 5768-5777 - Yinhe Zheng, Guanyi Chen, Xin Liu, Jian Sun:
MMChat: Multi-Modal Chat Dataset on Social Media. 5778-5786 - Meihuizi Jia, Ruixue Liu, Peiying Wang, Yang Song, Zexi Xi, Haobin Li, Xin Shen, Meng Chen, Jinhui Pang, Xiaodong He:
E-ConvRec: A Large-Scale Conversational Recommendation Dataset for E-Commerce Customer Service. 5787-5796 - Syed Mostofa Monsur, Sakib Chowdhury, Md Shahrar Fatemi, Shafayat Ahmed:
SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus. 5797-5804 - Toshiki Onishi, Asahi Ogushi, Yohei Tahara, Ryo Ishii, Atsushi Fukayama, Takao Nakamura, Akihiro Miyata:
A Comparison of Praising Skills in Face-to-Face and Remote Dialogues. 5805-5812 - Ada Tur, David R. Traum:
Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis. 5813-5820 - Hanfei Sun, Ziyuan Cao, Diyi Yang:
SPORTSINTERVIEW: A Large-Scale Sports Interview Benchmark for Entity-centric Dialogues. 5821-5828 - Gopendra Vikram Singh, Priyanshu Priya, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya:
EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues. 5829-5837 - Krishnapriya Vishnubhotla, Adam Hammond, Graeme Hirst:
The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts. 5838-5848 - Ines Rehbein, Josef Ruppenhofer:
Who's in, who's out? Predicting the Inclusiveness or Exclusiveness of Personal Pronouns in Parliamentary Debates. 5849-5858 - Callum Booth, Robert Shoemaker, Robert J. Gaizauskas:
A Language Modelling Approach to Quality Assessment of OCR'ed Historical Text. 5859-5864 - Roser Morante, Eleanor L. T. Smith, Lianne Wilhelmus, Alie Lassche, Erika Kuijpers:
Identifying Copied Fragments in a 18th Century Dutch Chronicle. 5865-5878 - Konstantina Liagkou, John Pavlopoulos, Ewa Machotka:
A Study of Distant Viewing of ukiyo-e prints. 5879-5888 - Haining Wang, Allen Riddell:
CCTAA: A Reproducible Corpus for Chinese Authorship Attribution Research. 5889-5893 - Tariq Yousef, Chiara Palladino, Farnoosh Shamsian, Anise d'Orange Ferreira, Michel Ferreira dos Reis:
An automatic model and Gold Standard for translation alignment of Ancient Greek. 5894-5905 - Francielle Alves Vargas, Jonas D'Alessandro, Zohar Rabinovich, Fabrício Benevenuto, Thiago A. S. Pardo:
Rhetorical Structure Approach for Online Deception Detection: A Survey. 5906-5915 - Shoichi Naito, Shintaro Sawada, Chihiro Nakagawa, Naoya Inoue, Kenshi Yamaguchi, Iori Shimizu, Farjana Sultana Mim, Keshav Singh, Kentaro Inui:
TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation. 5916-5928 - John Mendonça, Rui Correia, Mariana Lourenço, João Freitas, Isabel Trancoso:
Towards Speaker Verification for Crowdsourced Speech Collections. 5929-5937 - Liming Xiao, Bin Li, Zhixing Xu, Kairui Huo, Minxuan Feng, Junsheng Zhou, Weiguang Qu:
Align-smatch: A Novel Evaluation Method for Chinese Abstract Meaning Representation Parsing based on Alignment of Concept and Relation. 5938-5945 - Thórhildur Thorleiksdóttir, Cédric Renggli, Nora Hollenstein, Ce Zhang:
Dynamic Human Evaluation for Relative Model Comparisons. 5946-5955 - Yves Bestgen:
Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status. 5956-5962 - Xinran Zhao, Hongming Zhang, Yangqiu Song:
PCR4ALL: A Comprehensive Evaluation Benchmark for Pronoun Coreference Resolution in English. 5963-5973 - Mikhail Lepekhin, Serge Sharoff:
Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task. 5974-5982 - Sowmya Vajjala, Ramya Balasubramaniam:
What do we really know about State of the Art NER? 5983-5993 - Yujin Takahashi, Masahiro Kaneko, Masato Mita, Mamoru Komachi:
ProQE: Proficiency-wise Quality Estimation dataset for Grammatical Error Correction. 5994-6000 - Divya Tadimeti, Kallirroi Georgila, David R. Traum:
Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain. 6001-6008 - Ramya Akula, Ivan Garibay:
Sentence Pair Embeddings Based Evaluation Metric for Abstractive and Extractive Summarization. 6009-6017 - Thierry Poibeau:
On "Human Parity" and "Super Human Performance" in Machine Translation Evaluation. 6018-6023 - Vladimir Araujo, Andrés Carvallo, Souvik Kundu, José Cañete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens, Alvaro Soto:
Evaluation Benchmarks for Spanish Sentence Representations. 6024-6034 - José Antonio García-Díaz, Pedro José Vivancos Vicente, Ángela Almela, Rafael Valencia-García:
UMUTextStats: A linguistic feature extraction tool for Spanish. 6035-6044 - Kevin Heffernan, Simone Teufel:
Problem-solving Recognition in Scientific Text. 6045-6058 - Yuxiang Zhang, Hayato Yamana:
HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method. 6059-6068 - Maulik Parmar, Apurva Narayan:
HyperBox: A Supervised Approach for Hypernym Discovery using Box Embeddings. 6069-6076 - Zhengnan Xie, Alice Saebom Kwak, Enfa George, Laura W. Dozal, Hoang Van, Moriba K. Jah, Roberto Furfaro, Peter A. Jansen:
Extracting Space Situational Awareness Events from News Text. 6077-6082 - Naghme Jamali, Yadollah Yaghoobzadeh, Heshaam Faili:
PerCQA: Persian Community Question Answering Dataset. 6083-6092 - Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni:
GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns. 6093-6103 - Zhaoxin Luo, Michael Zhu:
Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing. 6104-6113 - Changwook Jun, Jooyoung Choi, Myoseop Sim, Hyun Kim, Hansol Jang, Kyungkoo Min:
Korean-Specific Dataset for Table Question Answering. 6114-6120 - Robin Schaefer, Manfred Stede:
GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change. 6121-6130 - Yasutomo Kimura, Hokuto Ototake, Minoru Sasaki:
Budget Argument Mining Dataset Using Japanese Minutes from the National Diet and Local Assemblies. 6131-6138 - Do-Myoung Lee, Yeachan Kim, Chang-gyun Seo:
Context-based Virtual Adversarial Training for Text Classification with Noisy Labels. 6139-6146 - Chenying Li, Wenbo Ye, Yilun Zhao:
FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports. 6147-6152 - Ilya Gusev, Alexey Tikhonov:
HeadlineCause: A Dataset of News Headlines for Detecting Causalities. 6153-6161 - Boyang Liu, Viktor Schlegel, Riza Batista-Navarro, Sophia Ananiadou:
Incorporating Zoning Information into Argument Mining from Biomedical Literature. 6162-6169 - Yash Verma, Anubhav Jangra, Sriparna Saha, Adam Jatowt, Dwaipayan Roy:
MAKED: Multi-lingual Automatic Keyword Extraction Dataset. 6170-6179 - Robert Vacareanu, Marco Antonio Valenzuela-Escárcega, George Caique Gouveia Barbosa, Rebecca Sharp, Gustave Hahn-Powell, Mihai Surdeanu:
From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction. 6180-6189 - Han Qin, Yuanhe Tian, Yan Song:
Enhancing Relation Extraction via Adversarial Multi-task Learning. 6190-6199 - Danushka Bollegala, Tomoya Machide, Ken-ichi Kawarabayashi:
Query Obfuscation by Semantic Decomposition. 6200-6211 - Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke A. Rundensteiner:
TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks. 6212-6222 - Pawel Skórzewski, Mikolaj Pieniowski, Grazyna Demenko:
Named Entity Recognition to Detect Criminal Texts on the Web. 6223-6231 - Zhuoqun Xu, Liubo Ouyang, Yang Liu:
Task-Driven and Experience-Based Question Answering Corpus for In-Home Robot Application in the House3D Virtual Environment. 6232-6239 - Tom Vanallemeersch, Arne Defauw, Sara Szoc, Alina Kramchaninova, Joachim Van den Bogaert, Andrea Lösch:
ELRC Action: Covering Confidentiality, Correctness and Cross-linguality. 6240-6249 - Sarvesh Soni, Meghana Gudala, Atieh Pajouhi, Kirk Roberts:
RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports. 6250-6259 - Ankush Agarwal, Raj Gite, Shreya Laddha, Pushpak Bhattacharyya, Satyanarayan Kar, Asif Ekbal, Prabhjit Thind, Rajesh Zele, Ravi Shankar:
Knowledge Graph - Deep Learning: A Case Study in Question Answering in Aviation Safety Domain. 6260-6270 - Justin Wood, Corey W. Arnold, Wei Wang:
A Bayesian Topic Model for Human-Evaluated Interpretability. 6271-6279 - Stefano Faralli, Andrea Lenzi, Paola Velardi:
A Large Interlinked Knowledge Graph of the Italian Cultural Heritage. 6280-6289 - Kenneth Church, Xingyu Cai, Yuchen Bian:
Training on Lexical Resources. 6290-6299 - Filip Cornell, Chenda Zhang, Jussi Karlgren, Sarunas Girdzijauskas:
Challenging the Assumption of Structure-based embeddings in Few- and Zero-shot Knowledge Graph Completion. 6300-6309 - Andis Lagzdins, Uldis Silins, Toms Bergmanis, Marcis Pinnis, Arturs Vasilevskis, Andrejs Vasiljevs:
Open Terminology Management and Sharing Toolkit for Federation of Terminology Databases. 6310-6316 - Annika Marie Schoene, Nina Dethlefs, Sophia Ananiadou:
RELATE: Generating a linguistically inspired Knowledge Graph for fine-grained emotion classification. 6317-6327 - Nina Markl, Stephen Joseph McNulty:
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR. 6328-6339 - Zaid Alyafeai, Maraim Masoud, Mustafa Ghaleb, Maged Saeed AlShaibani:
Masader: Metadata Sourcing for Arabic Text and Speech Data Resources. 6340-6351 - Cécile Robin, Gautham Vadakkekara Suresh, Víctor Rodríguez-Doncel, John P. McCrae, Paul Buitelaar:
Linghub2: Language Resource Discovery Tool for Language Technologies. 6352-6360 - Yu-Hsiang Tseng, Cing-Fang Shih, Pin-Er Chen, Hsin-Yu Chou, Mao-Chang Ku, Shu-Kai Hsieh:
CxLM: A Construction and Context-aware Language Model. 6361-6369 - Oufan Hai, Matthew Sundberg, Katherine Trice, Rebecca Friedman, Scott Grimm:
The Lexometer: A Shiny Application for Exploratory Analysis and Visualization of Corpus Data. 6370-6376 - Frankie Robertson, Li-Hsin Chang, Sini Söyrinki:
TallVocabL2Fi: A Tall Dataset of 15 Finnish L2 Learners' Vocabulary. 6377-6386 - Muskan Garg, Chandni Saxena, Sriparna Saha, Veena Krishnan, Ruchi Joshi, Vijay Mago:
CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts. 6387-6396 - Xiaohan Zhang, Shaonan Wang, Chengqing Zong:
How Does the Experimental Setting Affect the Conclusions of Neural Encoding Models? 6397-6404 - Elma Kerz, Yu Qiao, Sourabh Zanwar, Daniel Wiechmann:
SPADE: A Big Five-Mturk Dataset of Argumentative Speech Enriched with Socio-Demographics for Personality Detection. 6405-6419 - Vishwa Gupta, Gilles Boulianne:
Progress in Multilingual Speech Recognition for Low Resource Languages Kurmanji Kurdish, Cree and Inuktut. 6420-6428 - Alberto García-Durán, Akhil Arora, Robert West:
Efficient Entity Candidate Generation for Low-Resource Languages. 6429-6438 - Heather C. Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard:
What a Creole Wants, What a Creole Needs. 6439-6449 - Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin, Brian Roark:
Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities. 6450-6460 - Jonathan Dunn, Haipeng Li, Damian Sastre:
Predicting Embedding Reliability in Low-Resource Settings Using Corpus Similarity Measures. 6461-6470 - Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Subhadarshi Panda, Ondrej Bojar, Bashir Shehu Galadanci, Bello Shehu Bello:
Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation. 6471-6479 - Ebelechukwu Nwafor, Anietie Andy:
A Survey of Machine Translation Tasks on Nigerian Languages. 6480-6486 - Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung:
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset. 6487-6494 - Ratchakrit Arreerard, Stephen Mander, Scott Piao:
Survey on Thai NLP Language Resources and Tools. 6495-6505 - Nankai Lin, Yingwen Fu, Chuwei Chen, Ziyu Yang, Shengyi Jiang:
LaoPLM: Pre-trained Language Models for Lao. 6506-6512 - Ghattas Eid, Esther Seyffarth, Ingo Plag:
The Maaloula Aramaic Speech Corpus (MASC): From Printed Material to a Lemmatized and Time-Aligned Corpus. 6513-6520 - Khang Le, Hien Nguyen, Tung Le Thanh, Minh Nguyen:
VIMQA: A Vietnamese Dataset for Advanced Reasoning and Explainable Multi-hop Question Answering. 6521-6529 - Jonathan Dunn, Wikke Nijhof:
Language Identification for Austronesian Languages. 6530-6539 - Andrés Chandía:
A Mapudüngun FST Morphological Analyser and its Web Interface. 6540-6547 - Jan Christian Blaise Cruz, Charibeth Cheng:
Improving Large-scale Language Models and Resources for Filipino. 6548-6555 - Shankar Mahadevan, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Prabakaran Chandran, Ruba Priyadharshini, Sangeetha Sivanesan, Bharathi Raja Chakravarthi:
Thirumurai: A Large Dataset of Tamil Shaivite Poems and Classification of Tamil Pann. 6556-6562 - Sanjib Narzary, Maharaj Brahma, Mwnthai Narzary, Gwmsrang Muchahary, Pranav Kumar Singh, Apurbalal Senapati, Sukumar Nandi, Bidisha Som:
Generating Monolingual Dataset for Low Resource Language Bodo from old books using Google Keep. 6563-6570 - Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah:
AsNER - Annotated Dataset and Baseline for Assamese Named Entity recognition. 6571-6577 - Fitsum Gaim, Wonsuk Yang, Jong C. Park:
GeezSwitch: Language Identification in Typologically Related Low-resourced East African Languages. 6578-6584 - Paraskevi Platanou, John Pavlopoulos, Georgios Papaioannou:
Handwritten Paleographic Greek Text Recognition: A Century-Based Approach. 6585-6589 - Hiroki Chida, Yohei Murakami, Mondheera Pituxcoosuvarn:
Quality Control for Crowdsourced Bilingual Dictionary in Low-Resource Languages. 6590-6596 - Bruce Harold Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg:
An Inflectional Database for Gitksan. 6597-6606 - Jackson L. Lee, Litong Chen, Charles Lam, Chaak Ming Lau, Tsz-Him Tsui:
PyCantonese: Cantonese Linguistics and NLP in Python. 6607-6611 - Teshome Mulugeta Ababu, Michael Melese Woldeyohannis:
Afaan Oromo Hate Speech Detection and Classification on Social Media. 6612-6619 - Ryohei Sasano:
Cross-lingual Linking of Automatically Constructed Frames and FrameNet. 6620-6625 - Ana-Maria Barbu, Verginica Barbu Mititelu, Catalin Mititelu:
Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs. 6626-6634 - Lucelene Lopes, Magali Sanches Duran, Paulo Fernandes, Thiago A. S. Pardo:
PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model. 6635-6643 - Andargachew Mekonnen Gezmu, Andreas Nürnberger, Tesfaye Bayu Bati:
Extended Parallel Corpus for Amharic-English Machine Translation. 6644-6653 - Cheikh M. Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Sileye O. Ba:
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof ↔ French. 6654-6661 - Isin Demirsahin, Cibu Johny, Alexander Gutkin, Brian Roark:
Criteria for Useful Automatic Romanization in South Asian Languages. 6662-6673 - Yuqian Dai, Marc de Kamps, Serge Sharoff:
BERTology for Machine Translation: What BERT Knows about Linguistic Difficulties for Translation. 6674-6690 - Ye Jia, Michelle Tadmor Ramanovich, Quan Wang, Heiga Zen:
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. 6691-6703 - Makoto Morishita, Katsuki Chousa, Jun Suzuki, Masaaki Nagata:
JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus. 6704-6710 - Hwichan Kim, Sangwhan Moon, Naoaki Okazaki, Mamoru Komachi:
Learning How to Translate North Korean through South Korean. 6711-6718 - Wenhao Zhu, Shujian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen:
FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation. 6719-6727 - Sebastian Nehrdich:
SansTib, a Sanskrit - Tibetan Parallel Corpus and Bilingual Sentence Embedding Model. 6728-6734 - Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu, Sadao Kurohashi:
VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation. 6735-6743 - Kazuki Tani, Ryoya Yuasa, Kazuki Takikawa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, Tsuneo Kato:
A Benchmark Dataset for Multi-Level Complexity-Controllable Machine Translation. 6744-6752 - Séamus Lankford, Haithem Afli, Orla Ni Loinsigh, Andy Way:
gaHealth: An English-Irish Bilingual Corpus of Health Data. 6753-6758 - Rebecca Knowles, Patrick Littell:
Translation Memories as Baselines for Low-Resource Machine Translation. 6759-6767 - Zhen Wang, Xu Shan, Xiangxie Zhang, Jie Yang:
N24News: A New Dataset for Multimodal News Classification. 6768-6775 - Josiah Wang, Josiel Figueiredo, Lucia Specia:
MultiSubs: A Large-scale Multimodal and Multilingual Dataset. 6776-6785 - Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung:
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition. 6786-6793 - Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno, Ryo Masumura:
Multimodal Negotiation Corpus with Various Subjective Assessments for Social-Psychological Outcome Prediction from Non-Verbal Cues. 6794-6801 - Shuo Xu, Yuxiang Jia, Changyong Niu, Hongying Zan:
MMDAG: Multimodal Directed Acyclic Graph Network for Emotion Recognition in Conversation. 6802-6807 - Jin Yea Jang, Han-Mu Park, Saim Shin, Suna Shin, Byungcheon Yoon, Gahgene Gweon:
Automatic Gloss-level Data Augmentation for Sign Language Translation. 6808-6813 - Kento Tanaka, Taichi Nishimura, Hiroaki Nanjo, Keisuke Shirai, Hirotaka Kameko, Masatake Dantsuji:
Image Description Dataset for Language Learners. 6814-6821 - Bruno Cardoso, Neil Cohn:
The Multimodal Annotation Software Tool (MAST). 6822-6828 - Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira, Stefan Wermter:
A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning. 6829-6836 - Muskan Garg, Seema Wazarkar, Muskaan Singh, Ondrej Bojar:
Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers. 6837-6847 - Fredrik Carlsson, Philipp Eisen, Faton Rekathati, Magnus Sahlgren:
Cross-lingual and Multilingual CLIP. 6848-6854 - Mohammad Faiyaz Khan, S. M. Sadiq-Ur-Rahman Shifath, Md Saiful Islam:
BAN-Cap: A Multi-Purpose English-Bangla Image Descriptions Dataset. 6855-6865 - Naoki Kimura, Zixiong Su, Takaaki Saeki, Jun Rekimoto:
SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition. 6866-6873 - Yang Zhao, Hiroshi Kanayama, Issei Yoshida, Masayasu Muraoka, Akiko Aizawa:
A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression. 6874-6883 - Han Huang, Tomoyuki Kajiwara, Yuki Arase:
JADE: Corpus for Japanese Definition Modelling. 6884-6888 - Jiashu Pu, Ziyi Huang, Yadong Xi, Guandan Chen, Weijie Chen, Rongsheng Zhang:
Unraveling the Mystery of Artifacts in Machine Generated Text. 6889-6898 - Ernie Chang, Alisa Kovtunova, Stefan Borgwardt, Vera Demberg, Kathryn Chapman, Hui-Syuan Yeh:
Logic-Guided Message Generation from Raw Real-Time Sensor Data. 6899-6908 - Ayush Kumar, Dhyey Jani, Jay Shah, Devanshu Thakar, Varun Jain, Mayank Singh:
The Bull and the Bear: Summarizing Stock Market Discussions. 6909-6913 - Kévin Espasa, Emmanuel Morin, Olivier Hamon:
Combination of Contextualized and Non-Contextualized Layers for Lexical Substitution in French. 6914-6921 - Mohaddeseh Bastan, Nishant Shankar, Mihai Surdeanu, Niranjan Balasubramanian:
SuMe: A Dataset Towards Summarizing Biomedical Mechanisms. 6922-6931 - Zheng Chen, Hongyu Lin:
CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset. 6932-6937 - Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea, Junyi Jessy Li:
Emotion analysis and detection during COVID-19. 6938-6947 - Sabit Hassan, Shaden Shaar, Kareem Darwish:
Cross-lingual Emotion Detection. 6948-6958 - Yuanchi Zhang, Yang Liu:
DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles. 6959-6966 - Maxwell A. Weinzierl, Sanda M. Harabagiu:
VaccineLies: A Natural Language Resource for Learning to Recognize Misinformation about the COVID-19 and HPV Vaccines. 6967-6975 - Christoph Turban, Udo Kruschwitz:
Tackling Irony Detection using Ensemble Classifiers. 6976-6984 - Aye Aye Mar, Kiyoaki Shirai:
Automatic Construction of an Annotated Corpus with Implicit Aspects. 6985-6991 - Anupama Ray, Shubham Mishra, Apoorva Nunna, Pushpak Bhattacharyya:
A Multimodal Corpus for Emotion Recognition in Sarcasm. 6992-7003 - Aniruddha Tammewar, Franziska Braun, Gabriel Roccabruna, Sebastian P. Bayerl, Korbinian Riedhammer, Giuseppe Riccardi:
Annotation of Valence Unfolding in Spoken Personal Narratives. 7004-7013 - Yuki Nakayama, Koji Murakami, Gautam Kumar, Sudha Bhingardive, Ikuko Hardaway:
A Large-Scale Japanese Dataset for Aspect-based Sentiment Analysis. 7014-7021 - Haruya Suzuki, Yuto Miyauchi, Kazuki Akiyama, Tomoyuki Kajiwara, Takashi Ninomiya, Noriko Takemura, Yuta Nakashima, Hajime Nagahara:
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain. 7022-7028 - Han Qin, Yuanhe Tian, Fei Xia, Yan Song:
Complementary Learning of Aspect Terms for Aspect-based Sentiment Analysis. 7029-7039 - Saugata Bose, Guoxin Su:
Deep One-Class Hate Speech Detection Model. 7040-7048 - Valentin Barrière, Slim Essid, Chloé Clavel:
Opinions in Interactions: New Annotations of the SEMAINE Database. 7049-7055 - Taha Shangipour Ataei, Kamyar Darvishi, Soroush Javdan, Behrouz Minaei-Bidgoli, Sauleh Eetemadi:
Pars-ABSA: a Manually Annotated Aspect-based Sentiment Analysis Benchmark on Farsi Product Reviews. 7056-7060 - Mamta, Asif Ekbal, Pushpak Bhattacharyya, Tista Saha, Alka Kumar, Shikha Srivastava:
HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis. 7061-7070 - John Pavlopoulos, Alexandros Xenos, Davide Picca:
Sentiment Analysis of Homeric Text: The 1st Book of Iliad. 7071-7077 - Pegah Safari, Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Alireza Nourian:
The Persian Dependency Treebank Made Universal. 7078-7087 - Jatayu Baxi, Brijesh Bhatt:
GujMORPH - A Dataset for Creating Gujarati Morphological Analyzer. 7088-7095 - Roya Kabiri, Simin Karimi, Mihai Surdeanu:
Informal Persian Universal Dependency Treebank. 7096-7105 - Andrew Zupon, Andrew Carnie, Michael Hammond, Mihai Surdeanu:
Automatic Correction of Syntactic Dependency Annotation Differences. 7106-7112 - Fumikazu Sato, Naoki Yoshinaga, Masaru Kitsuregawa:
Building Large-Scale Japanese Pronunciation-Annotated Corpora for Reading Heteronymous Logograms. 7113-7121 - Won-Ik Cho, Sangwhan Moon, Jong In Kim, Seok Min Kim, Nam Soo Kim:
StyleKQC: A Style-Variant Paraphrase Corpus for Korean Questions and Commands. 7122-7128 - Yuanhe Tian, Han Qin, Fei Xia, Yan Song:
Syntax-driven Approach for Semantic Role Labeling. 7129-7139 - Marcin Wolinski, Bartlomiej Niton, Witold Kieras, Jakub Szymanik:
HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish. 7140-7146 - Hongchang Bao, Bradley Hauer, Grzegorz Kondrak:
Lexical Resource Mapping via Translations. 7147-7154 - Keigo Takahashi, Danushka Bollegala:
Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models. 7155-7163 - Sarthak Khanal, Maria Traskowsky, Doina Caragea:
Identification of Fine-Grained Location Mentions in Crisis Tweets. 7164-7173 - Francielle Alves Vargas, Isabelle Carvalho, Fabiana Rodrigues de Góes, Thiago A. S. Pardo, Fabrício Benevenuto:
HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection. 7174-7183 - Shaoxiong Ji, Tianlin Zhang, Luna Ansari, Jie Fu, Prayag Tiwari, Erik Cambria:
MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare. 7184-7190 - Yu Yun Liao:
Leveraging Hashtag Networks for Multimodal Popularity Prediction of Instagram Posts. 7191-7198 - Hang Jiang, Yining Hua, Doug Beeferman, Deb Roy:
Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis. 7199-7208 - Anietie Andy, Reno Kriz, Sharath Chandra Guntuku, Derry Tanti Wijaya, Chris Callison-Burch:
Did that happen? Predicting Social Media Posts that are Indicative of what happened in a scene: A case study of a TV show. 7209-7214 - Prashant Kodali, Akshala Bhatnagar, Naman Ahuja, Manish Shrivastava, Ponnurangam Kumaraguru:
HashSet - A Dataset For Hashtag Segmentation. 7215-7219 - Oanh Thi Tran, Anh Cong Phung, Ngo Xuan Bach:
Using Convolution Neural Network with BERT for Stance Detection in Vietnamese. 7220-7225 - Taichi Murayama, Shohei Hisada, Makoto Uehara, Shoko Wakamiya, Eiji Aramaki:
Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset. 7226-7234 - Juan Manuel Pérez, Damián Ariel Furman, Laura Alonso Alemany, Franco M. Luque:
RoBERTuito: a pre-trained language model for social media text in Spanish. 7235-7243 - Koichiro Ito, Masaki Murata, Tomohiro Ohno, Shigeki Matsubara:
Construction of Responsive Utterance Corpus for Attentive Listening Response Production. 7244-7252 - Christopher Song, David Harwath, Tuka Alhanai, James R. Glass:
Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings. 7253-7258 - Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Yan Xu, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung:
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation. 7259-7268 - Jalal Al-Tamimi, Florian Schiel, Ghada Khattab, Navdeep Sokhey, Djegdjiga Amazouz, Abdulrahman Dallak, Hajar Moussa:
A Romanization System and WebMAUS Aligner for Arabic Varieties. 7269-7276 - Claytone Sikasote, Antonios Anastasopoulos:
BembaSpeech: A Speech Recognition Corpus for the Bemba Language. 7277-7283 - Viet Dac Lai, Amir Pouran Ben Veyseh, Franck Dernoncourt, Thien Huu Nguyen:
BehanceCC: A ChitChat Detection Dataset For Livestreaming Video Transcripts. 7284-7290 - Sheng Li, Jiyi Li, Qianying Liu, Zhuo Gong:
Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection. 7291-7297 - Maria Forjó, Daniel Neto, Alberto Abad, H. Sofia Pinto, Joaquim Gago:
A new European Portuguese corpus for the study of Psychosis through speech analysis. 7298-7304 - Aghilas Sini, Damien Lolive, Nelly Barbot, Pierre Alain:
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks. 7305-7313 - Thomas Rolland, Alberto Abad, Catia Cucchiarini, Helmer Strik:
Multilingual Transfer Learning for Children Automatic Speech Recognition. 7314-7320 - Amir Pouran Ben Veyseh, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen:
BehanceQA: A New Dataset for Identifying Question-Answer Pairs in Video Transcripts. 7321-7327 - Konstantinos M. Dafnis, Evgenia Chroni, Carol Neidle, Dimitris N. Metaxas:
Bidirectional Skeleton-Based Isolated Sign Recognition using Graph Convolutional Networks. 7328-7338 - Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan:
Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario. 7339-7343 - Tomoki Kitagawa, Chee Siang Leow, Hiromitsu Nishizaki:
Handwritten Character Generation using Y-Autoencoder for Character Recognition Model Training. 7344-7351 - Lis Kanashiro Pereira:
Attention-Focused Adversarial Training for Robust Temporal Reasoning. 7352-7359 - Kornraphop Kawintiranon, Lisa Singh:
PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter. 7360-7367 - Irina Stenger, Philip Georgis, Tania Avgustinova, Bernd Möbius, Dietrich Klakow:
Modeling the Impact of Syntactic Distance and Surprisal on Cross-Slavic Text Comprehension. 7368-7376 - Vinura Dhananjaya, Piyumal Demotte, Surangika Ranathunga, Sanath Jayasena:
BERTifying Sinhala - A Comprehensive Analysis of Pre-trained Language Models for Sinhala Text Classification. 7377-7385 - Jón Friðrik Daðason, Hrafn Loftsson:
Pre-training and Evaluating Transformer-based Language Models for Icelandic. 7386-7391