One pass preprocessing for token-based source code clone detection


Abstract:

Token-based source code clone detection provides a promising way to detect source code duplication and redundancy. Preprocessing plays an important role in KDD because it shapes all further processing; as the old saying goes, well begun is half done. However, processing the unstructured source code files of large software systems is challenging and costly in both time and space. This paper introduces a novel way to clean, tokenize, and transform source code into a form appropriate for mining. A tool called OPP (One Pass Preprocessor) has been developed to preprocess source code files efficiently and flexibly. Experiments on three large open source projects written in different host languages (Wildfly1.02, Linux core-3.6, and VTK) showed that the tool preprocesses source code files powerfully and flexibly and produces high-quality output.
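
The abstract does not describe how OPP works internally, so the following is only a minimal sketch of what cleaning, tokenizing, and transforming source code in a single scan might look like for a C-like language. Everything in it is an assumption made for illustration: the `preprocess` function, the `KEYWORDS` set, the token classes, and the `ID`/`NUM`/`STR` placeholders are hypothetical and are not taken from the paper or the tool.

```python
import re

# Hypothetical keyword set for a C-like language (an assumption, not from the paper).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

# One combined pattern, so the source text is scanned once from left to right.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)   # line and block comments
  | (?P<string>"(?:\\.|[^"\\])*")     # double-quoted string literals
  | (?P<number>\d+(?:\.\d+)?)         # integer or decimal literals
  | (?P<ident>[A-Za-z_]\w*)           # identifiers and keywords
  | (?P<space>\s+)                    # whitespace
  | (?P<punct>.)                      # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def preprocess(source):
    """Clean, tokenize, and normalize `source` in a single scan."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue                  # cleaning: drop comments and whitespace
        if kind == "ident" and text not in KEYWORDS:
            tokens.append("ID")       # transforming: abstract identifier names
        elif kind == "number":
            tokens.append("NUM")      # transforming: abstract numeric literals
        elif kind == "string":
            tokens.append("STR")      # transforming: abstract string literals
        else:
            tokens.append(text)       # keywords and punctuation kept verbatim
    return tokens


if __name__ == "__main__":
    print(preprocess("int x = 42; // answer\n"))
```

In this sketch, fragments that differ only in identifier names or literal values map to the same token sequence, which is the property a token-based clone detector typically relies on when comparing code.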
Date of Conference: 29-31 October 2014
Date Added to IEEE Xplore: 11 December 2014
Electronic ISBN: 978-1-4799-7373-6

Conference Location: Paris, France
