Abstract
Intrinsic plagiarism detection is the task of finding plagiarized sections of a text document without using a reference corpus. This paper describes a novel approach to this task that processes and analyzes the grammar of a suspicious document. The main idea is to split the text into single sentences and to compute a grammar tree for each sentence. To find suspicious sentences, these grammar trees are compared pairwise using the pq-gram distance, an alternative to the tree edit distance, and the results are stored in a distance matrix. Finally, sentences whose grammar differs significantly from the rest of the document, as judged against a Gaussian normal distribution, are marked as suspicious.
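The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: grammar trees are hand-written nested tuples of the form (label, [children]) rather than parser output, the pq-gram parameters default to p=2 and q=3, and the decision rule (flag a sentence whose mean distance to all other sentences exceeds the mean plus k standard deviations) is only one plausible reading of "with respect to the Gaussian normal distribution". The function names and the parameter k are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Assumption: a grammar tree is a tuple (label, [children]).
# Toy trees stand in for the output of a real parser.


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag of pq-grams of a tree as a Counter of label tuples."""
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)          # shift this node into the stem
        base = ("*",) * q                   # base starts as q dummy labels
        if not children:
            profile[stem + base] += 1       # a leaf yields one pq-gram
            return
        for child in children:
            base = base[1:] + (child[0],)   # shift the child label into the base
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):              # pad with dummies after the last child
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(tree, ("*",) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """1 - 2 * |bag intersection| / |bag union| of the two profiles."""
    p1, p2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())
    total = sum(p1.values()) + sum(p2.values())
    return 1.0 - 2.0 * shared / total


def suspicious_sentences(trees, k=2.0):
    """Indices of trees whose mean distance to the others is an outlier.

    Assumption: the threshold mean + k * standard deviation is illustrative.
    """
    n = len(trees)
    if n < 2:
        return []
    matrix = [[pq_gram_distance(a, b) for b in trees] for a in trees]
    avg = [sum(row) / (n - 1) for row in matrix]   # self-distance is 0
    mu, sigma = mean(avg), pstdev(avg)
    return [i for i, a in enumerate(avg) if sigma > 0 and a > mu + k * sigma]


usual = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VBZ", [])])])
odd = ("S", [("VP", [("VB", []), ("NP", [("NN", [])])])])
flagged = suspicious_sentences([usual] * 9 + [odd])
```

In this toy input, nine sentences share one grammar tree and the tenth has a different structure, so only the tenth has a large mean distance to the rest. In practice the choice of p, q, and the threshold would have to be tuned on real data.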
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tschuggnall, M., Specht, G. (2012). Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_35
DOI: https://doi.org/10.1007/978-3-642-31178-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)