Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System

Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Lee, Chris Biemann


Abstract
We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. In addition to existing academic resources, such as the Corpus of Contemporary American English (COCA) Academic Word List, the New Academic Word List, and the Academic Collocation List, we explore how to build such resources dynamically so that they can be used to automatically identify informal or non-academic words or phrases. The resources are compiled using different generic approaches that can be extended to other domains and languages. We evaluate the resources with a system implementation. The system consists of three components: informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking. To generate candidates and rank them in context, we use the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) “All-Words” lexical substitution dataset for both the informal word identification and the paraphrase generation experiments. Our informal word identification component achieves an F1 score of 82%, significantly outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology to build targeted resources for writing aids.
Anthology ID:
2020.lrec-1.722
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5896–5904
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.722
Cite (ACL):
Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Lee, and Chris Biemann. 2020. Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5896–5904, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System (Yimam et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.722.pdf
Code:
uhh-lt/par4Acad