2012 Volume E95.D Issue 4 Pages 982-988
In Digital Library (DL) applications, digital book clustering is an important and urgent research task. However, it is difficult to conduct effectively because of the great length of digital books. To do the correct clustering for digital books, a novel method based on probabilistic topic model is proposed. Firstly, we build a topic model named LDAC. The main goal of LDAC topic modeling is to effectively extract topics from digital books. Subsequently, Gibbs sampling is applied for parameter inference. Once the model parameters are learned, each book is assigned to the cluster which maximizes the posterior probability. Experimental results demonstrate that our approach based on LDAC is able to achieve significant improvement as compared to the related methods.