Abstract:
Document clustering is a popular research topic that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally do not consider the weights of words with respect to clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. As a successful supervised centroid-based classifier, the Class-Feature-Centroid (CFC) algorithm takes relationships among words into account. ACC attempts to employ this discriminative CFC vector to drive the clustering procedure. Since clustering is unsupervised, ACC begins with hundreds of small clusters to obtain acceptable CFC vectors, and then iteratively regroups clusters of documents until convergence. As ACC is self-organized, it can determine the number of clusters adaptively. The experimental results validate that ACC achieves performance competitive with state-of-the-art clustering approaches.
Published in: 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming
Date of Conference: 13-15 July 2014
Date Added to IEEE Xplore: 07 October 2014