summarization. Our proposed method, including SLSS and the SNMF algorithm, is described in Section 3. The experimental setup and results are presented in Section 4. Section 5 concludes.

2.   RELATED WORK

   Multiple document summarization has been widely studied recently. In general, there are two types of summarization: extractive summarization and abstractive summarization [16, 15]. Extractive summarization usually ranks the sentences in the documents according to scores calculated from a set of predefined features, such as term frequency-inverse sentence frequency (TF-ISF) [26, 20], sentence or term position [20, 33], and number of keywords [33]. Abstractive summarization involves information fusion, sentence compression, and reformulation [16, 15]. In this paper, we study sentence-based extractive summarization.
   Gong et al. [12] propose a method using latent semantic analysis (LSA) to select highly ranked sentences for summarization. [11] proposes a maximal marginal relevance (MMR) method that summarizes documents based on the cosine similarity between a query and a sentence, and between a sentence and the previously selected sentences. The MMR method tends to remove redundancy; however, the redundancy is controlled by a parameterized model which could instead be learned automatically. Other methods include NMF-based topic-specific summarization [24], CRF-based summarization [29], and a hidden Markov model (HMM) based method [5]. Some DUC2005 and DUC2006 participants achieve good performance, such as Language Computer Corporation (LCC) [1], which proposes a system combining question answering and summarization and uses k-nearest-neighbor clustering based on cosine similarity for sentence selection. In addition, some graph-ranking based methods have been proposed [22, 9]. Most of these methods ignore dependency syntax at the sentence level and focus only on keyword co-occurrence; the hidden relationships between sentences thus remain to be discovered. The method proposed in [13] groups sentences based on semantic role analysis, but it does not make full use of clustering algorithms.
   In our work, we propose a new framework based on sentence-level semantic analysis (SLSS) and symmetric non-negative matrix factorization (SNMF). SLSS can better capture the relationships between sentences in a semantic manner, and SNMF can factorize the similarity matrix to obtain meaningful groups of sentences. Experimental results demonstrate the effectiveness of the proposed framework.
3.   THE PROPOSED METHOD

3.1 Overview
   Figure 1 shows the framework of our proposed approach. Given a set of documents to be summarized, we first clean the documents by removing formatting characters. In the similarity matrix construction phase, we decompose the set of documents into sentences and parse each sentence into frame(s) using a semantic role parser. Pairwise sentence semantic similarity is calculated based on both the semantic role analysis [23] and word relation discovery using WordNet [10]. Section 3.2 describes this phase in detail. Once we have the pairwise sentence similarity matrix, we perform symmetric matrix factorization to group the sentences into clusters in the second phase; the proposed SNMF algorithm is explained fully in Section 3.3. Finally, in each cluster, we identify the most semantically important sentence using a measure that combines internal information (e.g., the computed similarity between sentences) and external information (e.g., the given topic information); Section 3.4 discusses this sentence selection phase in detail. The selected sentences form the final summary.

   [Figure 1: Overview of our proposed method]
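The overall procedure can be sketched as follows (an illustrative outline, not a reference implementation; the helper functions are injected as parameters and correspond to the components developed in Sections 3.2-3.4):

```python
# Illustrative sketch of the three-phase pipeline in Figure 1.
# `split` segments a document into sentences, `sim` is the pairwise semantic
# similarity of Section 3.2, `snmf` is the factorization of Section 3.3, and
# `select` is the within-cluster ranking of Section 3.4.
import numpy as np

def summarize(documents, topic, k, split, sim, snmf, select):
    sentences = [s for doc in documents for s in split(doc)]
    n = len(sentences)
    # Phase 1: build the symmetric semantic similarity matrix W.
    W = np.array([[sim(sentences[i], sentences[j]) for j in range(n)]
                  for i in range(n)])
    # Phase 2: factorize W ~ H H^T and assign each sentence to a cluster.
    H = snmf(W, k)
    labels = H.argmax(axis=1)
    # Phase 3: pick the highest-scoring sentence from each cluster.
    clusters = [[s for s, c in zip(sentences, labels) if c == j]
                for j in range(k)]
    return [select(c, topic, sim) for c in clusters if c]
```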
3.2 Semantic Similarity Matrix Construction
   After stop-word removal and stemming, we chunk the documents in the same topic into sentences. Simple word-matching types of similarity such as cosine cannot faithfully capture the content similarity, and the sparseness of shared words between similar concepts makes such a similarity metric uneven. Thus, in order to understand the semantic meaning of the sentences, we perform semantic role analysis on them and propose a method to calculate the semantic similarity between any pair of sentences.

3.2.1 Semantic Role Parsing
   A semantic role is “a description of the relationship that a constituent plays with respect to the verb in the sentence” [3]. Semantic role analysis plays a very important role in semantic understanding. The semantic role labeler we use in this work is based on PropBank semantic annotation [23]. The basic idea is that each verb in a sentence is labeled with its propositional arguments, and the labeling for each particular verb is called a “frame”; therefore, for each sentence, the number of frames generated by the parser equals the number of verbs in the sentence. A set of abstract arguments indicates the semantic role of each term in a frame. For example, Arg0 is typically the actor, and Arg1 is the thing acted upon. The full representation of the abstract arguments [23] and an illustrative example are shown in Table 1.
  rel: the verb                      Arg0: causer of motion
  Arg1: thing in motion              Arg2: distance moved
  Arg3: start point                  Arg4: end point
  Arg5: direction                    ArgM-LOC: location
  ArgM-EXT: extent                   ArgM-TMP: time
  ArgM-DIS: discourse connectives    ArgM-PNC: purpose
  ArgM-ADV: general-purpose          ArgM-MNR: manner
  ArgM-NEG: negation marker          ArgM-DIR: direction
  ArgM-MOD: modal verb               ArgM-CAU: cause

  Example:
  Sentence: A number of marine plants are harvested commercially in Nova Scotia.
  Label: A|Arg1 number|Arg1 of|Arg1 marine|Arg1 plants|Arg1 are|- harvested|rel
  commercially|ArgM-MNR in|ArgM-LOC Nova|ArgM-LOC Scotia|ArgM-LOC .|-

Table 1: Representation of the abstract arguments and an illustrative example.

3.2.2 Pairwise Semantic Similarity Calculation
   Given sentences S_i and S_j, we now calculate the similarity between them. Suppose S_i and S_j are parsed into frames as described above. For each pair of frames f_m ∈ S_i and f_n ∈ S_j, we discover the semantic relations of terms in the same semantic role using WordNet [10]. If two words in the same semantic role are identical, or stand in a semantic relation such as synonym, hypernym, hyponym, meronym, or holonym, the words are considered “related”.
   Let R_m and R_n be the semantic roles in f_m and f_n, respectively, and assume R_m ≤ R_n. Let {r_1, r_2, ..., r_K} be the set of K common semantic roles between f_m and f_n, T_m(r_i) be the term set of f_m in role r_i, and T_n(r_i) be the term set of f_n in role r_i. Let |T_m(r_i)| ≤ |T_n(r_i)|. We then compute the similarity between T_m(r_i) and T_n(r_i) as

        rsim(T_m(r_i), T_n(r_i)) = ( Σ_j tsim(t_j^m, r_i) ) / |T_n(r_i)|,        (1)

where tsim(t_j^m, r_i) = 1 if the term t_j^m ∈ T_m(r_i) is related to some term of T_n(r_i), and tsim(t_j^m, r_i) = 0 otherwise. Aggregating these role similarities over all frame pairs, we compute the pairwise sentence similarity for the given document collection and construct the symmetric semantic similarity matrix for further analysis.
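As an illustration (a minimal sketch rather than our exact implementation), the relatedness test and the role similarity of Eq. (1) can be realized with NLTK's WordNet interface; the sketch assumes single-word terms and performs no word-sense disambiguation:

```python
# Term relatedness and the role similarity rsim of Eq. (1), sketched with
# NLTK's WordNet (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def related(t1: str, t2: str) -> bool:
    """True if t1 and t2 are identical or linked by a WordNet relation
    (synonym, hypernym, hyponym, meronym, holonym)."""
    if t1 == t2:
        return True
    for s1 in wn.synsets(t1):
        for s2 in wn.synsets(t2):
            if s1 == s2:                      # synonyms share a synset
                return True
            neighbors = (s1.hypernyms() + s1.hyponyms() +
                         s1.part_meronyms() + s1.member_meronyms() +
                         s1.part_holonyms() + s1.member_holonyms())
            if s2 in neighbors:
                return True
    return False

def rsim(T_m: set, T_n: set) -> float:
    """Eq. (1): number of terms of the smaller set T_m related to some term
    of T_n, normalized by the size of the larger set T_n."""
    if not T_n:
        return 0.0
    matched = sum(1 for t in T_m if any(related(t, u) for u in T_n))
    return matched / len(T_n)
```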
3.3 Symmetric Non-negative Matrix Factorization (SNMF)
   Most document clustering algorithms deal with a rectangular data matrix (e.g., a term-by-sentence matrix) and are not directly applicable to a symmetric matrix of pairwise sentence similarities. We therefore cluster the sentences by factorizing the similarity matrix itself, as follows.

3.3.1 Problem Formulation and Algorithm Procedure
   Given a pairwise similarity matrix W, we want to find H such that

        min_{H≥0} J = ||W − HH^T||^2,                                    (4)

where the matrix norm ||X||^2 = Σ_{ij} X_{ij}^2 is the Frobenius norm. To derive the updating rule for Eq. (4) with the non-negativity constraints H_{ij} ≥ 0, we introduce the Lagrangian multipliers λ_{ij} and let L = J + Σ_{ij} λ_{ij} H_{ij}. The first-order KKT condition for a local minimum is

        ∂L/∂H_{ij} = ∂J/∂H_{ij} + λ_{ij} = 0,  and  λ_{ij} H_{ij} = 0,  ∀i, j.

Note that ∂J/∂H = −4WH + 4HH^T H. Hence the KKT condition leads to the fixed-point relation

        (−4WH + 4HH^T H)_{ij} H_{ij} = 0.                                (5)

Using the gradient descent method, we have

        H_{ij} ← H_{ij} − ε_{ij} ∂J/∂H_{ij}.                             (6)

Setting ε_{ij} = H_{ij} / (8HH^T H)_{ij}, we obtain the NMF-style multiplicative updating rule for SNMF:

        H_{ij} ← (1/2) H_{ij} [1 + (WH)_{ij} / (HH^T H)_{ij}].           (7)

Hence, the algorithm procedure for solving SNMF is: given an initial guess of H, iteratively update H using Eq. (7) until convergence. This gradient descent method converges to a local minimum of the problem.
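For concreteness, a minimal NumPy sketch of this procedure (with simplified initialization and a fixed iteration budget rather than a convergence test):

```python
# SNMF via the multiplicative update of Eq. (7):
# H_ij <- (1/2) H_ij (1 + (WH)_ij / (HH^T H)_ij).
import numpy as np

def snmf(W, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a symmetric nonnegative matrix W ~ H H^T, H >= 0 (n x k)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))            # nonnegative initial guess
    for _ in range(n_iter):
        WH = W @ H
        HHtH = H @ (H.T @ H)
        H *= 0.5 * (1.0 + WH / (HHtH + eps))   # Eq. (7); eps guards 0/0
    return H

# Each sentence i is then assigned to cluster argmax_j H[i, j].
```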
                                                                                                                                 
   An important property of SNMF is that the columns of H are nearly orthogonal. Note that Σ_{ij} w_{ij} = Σ_{ij} (HH^T)_{ij} = Σ_{s=1}^{k} |h_s|^2, where |h_s| is the L1 norm of the column vector h_s. Therefore, we have

        h_s^T h_t ≈ 0  if s ≠ t,    and    h_s^T h_t = ||h_s||^2  if s = t.

   Another important property is that this simple SNMF is equivalent to the sophisticated normalized-cut spectral clustering. Spectral clustering is a principled and effective approach for solving Normalized Cuts [30], an NP-hard optimization problem. Given the adjacency matrix W of a graph, it can easily be seen that the following SNMF,

        min_{H^T H=I, H≥0} ||W̃ − HH^T||^2,                               (8)

where W̃ = D^{−1/2} W D^{−1/2}, D = diag(d_1, ..., d_n), and d_i = Σ_j w_{ij}, is equivalent to Normalized Cut spectral clustering.
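The normalization in Eq. (8) is straightforward to compute; a small illustrative sketch (the guard against zero-degree rows is our addition):

```python
# W~ = D^{-1/2} W D^{-1/2}, with D = diag(d_1, ..., d_n) and d_i = sum_j w_ij.
import numpy as np

def normalize_similarity(W):
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated rows
    return W * np.outer(d_inv_sqrt, d_inv_sqrt)
```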
   K-means clustering: Let H = (h_1, ..., h_κ) be the cluster indicator matrix, where the i-th entry of h_k equals n_k^{−1/2} if data point x_i belongs to the k-th cluster and 0 otherwise. For example, the nonzero entries of h_1 indicate the data points belonging to the first cluster. The objective function of K-means clustering is

        J = Σ_{k=1}^{κ} Σ_{i∈C_k} ||x_i − f_k||^2,                       (10)

where f_k is the cluster centroid of the k-th cluster C_k of n_k points, i.e., f_k = Σ_{i∈C_k} x_i / n_k. More generally, the objective function of Kernel K-means with the mapping x_i → φ(x_i) is

        J_φ = Σ_{k=1}^{κ} Σ_{i∈C_k} ||φ(x_i) − φ̄_k||^2,                 (11)

where φ̄_k is the centroid in the feature space. Using cluster indicators, for both K-means and Kernel K-means the clustering problem can be solved via the optimization problem

        max_{H^T H=I, H≥0} Tr(H^T W H),                                  (12)

where H is the cluster indicator and W_{ij} = φ(x_i)^T φ(x_j) is the kernel. For K-means, φ(x_i) = x_i and W_{ij} = x_i^T x_j.
   Note that if we impose the orthogonality constraint on H, then

        J1 = arg min_{H^T H=I, H≥0} ||W − HH^T||^2
           = arg min_{H^T H=I, H≥0} [ ||W||^2 − 2 Tr(H^T W H) + ||H^T H||^2 ]
           = arg max_{H^T H=I, H≥0} Tr(H^T W H),

since ||W||^2 is fixed and ||H^T H||^2 = ||I||^2 is constant under the constraint. In other words, SNMF of W = HH^T is equivalent to Kernel K-means clustering under the orthogonality constraint on H.
   Nonnegative Matrix Factorization (NMF): SNMF can also be viewed as a special case of 3-factor nonnegative matrix factorization, which was proposed to simultaneously cluster the rows and the columns of an input data matrix X [8]:

        X ≈ F S G^T.                                                     (13)

Note that S provides additional degrees of freedom such that the low-rank matrix representation remains accurate, while F gives row clusters and G gives column clusters. This form provides a good framework for simultaneously clustering the rows and columns of X [6, 18]. An important special case is that in which the input X is a matrix of pairwise similarities: X = X^T = W. In this case, F = G = H and S = I, and the factorization reduces to the SNMF

        min_{H≥0} ||X − HH^T||^2,  s.t. H^T H = I.

3.4 Within-Cluster Sentence Selection
   In each cluster, we rank the sentences and select the most important one for inclusion in the summary. The score of sentence S_i in cluster C_k combines internal and external information:

        Score(S_i) = λ F1(S_i) + (1 − λ) F2(S_i),

where

        F1(S_i) = (1/(N − 1)) Σ_{S_j ∈ C_k, j ≠ i} Sim(S_i, S_j),
        F2(S_i) = Sim(S_i, T).                                           (17)

Here F1(S_i) measures the average similarity score between sentence S_i and all the other sentences in the cluster C_k, and N is the number of sentences in C_k. F2(S_i) represents the similarity between sentence S_i and the given topic T. λ is the weight parameter.
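A compact sketch of this selection step (illustrative; `sim` stands for the pairwise semantic similarity of Section 3.2, and λ defaults to 0.7 as set empirically in Section 4.2):

```python
# Within-cluster ranking: Score(S_i) = lam * F1(S_i) + (1 - lam) * F2(S_i).
from typing import Callable, Sequence

def select_sentence(cluster: Sequence[str], topic: str,
                    sim: Callable[[str, str], float], lam: float = 0.7) -> str:
    """Return the highest-scoring sentence of one cluster."""
    n = len(cluster)
    best, best_score = cluster[0], float("-inf")
    for i, s in enumerate(cluster):
        # F1: average similarity to the other sentences in the cluster.
        f1 = (sum(sim(s, t) for j, t in enumerate(cluster) if j != i) / (n - 1)
              if n > 1 else 0.0)
        f2 = sim(s, topic)                     # F2: similarity to the topic
        score = lam * f1 + (1.0 - lam) * f2
        if score > best_score:
            best, best_score = s, score
    return best
```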
4.   EXPERIMENTS

4.1 Data Set
   We use the DUC2005 and DUC2006 data sets to evaluate our proposed method empirically; both are open benchmark data sets from the Document Understanding Conference (DUC) for automatic summarization evaluation. Each data set consists of 50 topics, and Table 2 gives a brief description of the two data sets. The task is to create a summary of no more than 250 words for each topic to answer the information expressed in the topic statement.

                                   DUC2005           DUC2006
     Number of topics                 50                50
     Number of documents           25 ∼ 50              25
     relevant to each topic
     Data source                    TREC          AQUAINT corpus
     Summary length               250 words         250 words

                Table 2: Description of the data sets

4.2 Implemented Summarization Systems
   To compare our methods, we first implement four of the most widely used document summarization baseline systems:

   • LeadBase: returns the leading sentences of all the documents for each topic.
   • Random: selects sentences randomly for each topic.

   • LSA: conducts latent semantic analysis on the terms-by-sentences matrix, as proposed in [12].

   • NMFBase: performs NMF on the terms-by-sentences matrix and ranks the sentences by their weighted scores [17].

To better evaluate our proposed method, we also implement alternative solutions for each phase of the summarization procedure, as listed in Table 3.

       Phase                  Proposed method         Alternative 1     Alternative 2
   Similarity                 Semantic similarity     Keyword-based     N/A
   measurement                (SLSS)                  similarity
   Clustering algorithm       SNMF                    K-means (KM)      NMF
   Within-cluster             Mp = λF1(Si)            M1 = F1(Si)       M2 = F2(Si)
   sentence ranking           + (1 − λ)F2(Si)

Table 3: Different methods implemented in each phase. Remark: Si is the i-th sentence in the cluster, and the calculation of F1(Si) and F2(Si) is the same as described in Section 3.4.

   In Table 3, the keyword-based similarity between any pair of sentences is calculated as the cosine similarity. The parameter λ in Mp is set to 0.7 empirically, and the influence of λ will be discussed in Section 4.4.4. Note that in our experiments, the similarity matrix generation phase and the sentence extraction phase use the same type of similarity measurement. Thus, we have 22 implemented summarization systems: 18 obtained by varying the method in each phase, and 4 baselines. In Section 4.4, we compare our proposed method with all the other systems.
4.3 Evaluation Metric
   We use the ROUGE toolkit (version 1.5.5) [19], which is widely applied by DUC for performance evaluation, to measure our proposed method. It measures the quality of a summary by counting the unit overlaps between the candidate summary and a set of reference summaries. Several automatic evaluation methods are implemented in ROUGE; for example,

        ROUGE-N = ( Σ_{S∈Ref} Σ_{gram_n∈S} Count_match(gram_n) ) / ( Σ_{S∈Ref} Σ_{gram_n∈S} Count(gram_n) ),   (18)

where Ref is the set of reference summaries, n is the length of the n-gram, Count(gram_n) is the number of n-grams in the reference summaries, and Count_match(gram_n) is the maximum number of n-grams co-occurring in the candidate summary and the set of reference summaries. In this paper, we only report the average F-measure scores generated by ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-W, and ROUGE-SU to compare our proposed method with the other implemented systems.
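As a toy illustration of Eq. (18) for a single reference summary (our sketch, not the ROUGE toolkit itself), using clipped n-gram counts:

```python
# Clipped n-gram recall, the single-reference case of Eq. (18).
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate: List[str], reference: List[str], n: int) -> float:
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    match = sum(min(cnt, cand[g]) for g, cnt in ref.items())  # Count_match
    total = sum(ref.values())                                  # Count
    return match / total if total else 0.0

# rouge_n_recall("the cat sat on the mat".split(),
#                "the cat was on the mat".split(), 1)  # -> 5/6
```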
4.4 Experimental Results

4.4.1 Overall Performance Comparison
   First of all, we compare the overall performance of our proposed method (using SLSS and SNMF) with all the other implemented systems. Table 4 and Table 5 show the ROUGE evaluation results on the DUC2006 and DUC2005 data sets, respectively. We clearly observe that our proposed method achieves the highest ROUGE scores and outperforms all the other systems. In Sections 4.4.2, 4.4.3, and 4.4.4, we evaluate each phase of our proposed method and analyze the factors from which it benefits.

4.4.2 Evaluation of Methods in Similarity Matrix Construction
   Instead of using a similarity matrix, many summarization methods operate directly on the terms-by-sentences matrix, such as LSA and NMFBase, which are implemented as baseline systems in our experiments. In fact, LSA and NMF give continuous solutions to the same K-means clustering problem [7]; their difference lies in the constraints: LSA relaxes the non-negativity of H, while NMF relaxes the orthogonality of H. In NMFBase and LSA, we treat sentences as vectors and cluster them using the cosine similarity metric (since each document vector is normalized to 1, |d1 − d2|^2 = 2 − 2cos(d1, d2)). From Tables 4 and 5, we can see that the results of LSA and NMFBase are similar, and none of these methods is satisfactory. This indicates that simple word-matching types of similarity such as cosine cannot faithfully capture the content similarity.
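The identity used above follows from expanding the squared norm for unit-length vectors:

```latex
\|d_1 - d_2\|^2 = \|d_1\|^2 - 2\, d_1^{T} d_2 + \|d_2\|^2
                = 2 - 2\cos(d_1, d_2)
\quad \text{when } \|d_1\| = \|d_2\| = 1 .
```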
   Therefore, we further analyze the sentence-level text and generate pairwise sentence similarity. In the experiments, we compare the proposed sentence-level semantic similarity with the traditional keyword-based similarity calculation. To better illustrate the results, we use Figures 2 and 3 to visualize the comparison; due to space limits, we only show ROUGE-1 results in these figures.

   [Figure 2: ROUGE-1 scores of keyword-based similarity vs. SLSS]
                     Systems         ROUGE-1    ROUGE-2    ROUGE-L    ROUGE-W     ROUGE-SU
                 Average-Human        0.45767    0.11249    0.42340    0.15919     0.17060
                DUC2006 Average       0.37959    0.07543    0.34756    0.13001     0.13206
                    LeadBase          0.32082    0.05267    0.29726    0.10993     0.10408
                    Random            0.31749    0.04892    0.29384    0.10779     0.10083
                    NMFBase           0.32374    0.05498    0.30062    0.11341     0.10606
                      LSA             0.33078    0.05022    0.30507    0.11220     0.10226
              KM + M1 (keyword)       0.33605    0.05481    0.31204    0.12450     0.11125
              KM + M2 (keyword)       0.33039    0.04689    0.30394    0.11240     0.10087
              KM + Mp (keyword)       0.33558    0.05920    0.32112    0.12614     0.11229
                KM + M1 (SLSS)        0.35908    0.06087    0.34074    0.12861     0.12328
                KM + M2 (SLSS)        0.35049    0.05931    0.33202    0.12801     0.11763
                KM + Mp (SLSS)        0.36371    0.06182    0.34114    0.12966     0.12503
               NMF + M1 (keyword)       0.33850    0.05851    0.31274    0.12637     0.11348
               NMF + M2 (keyword)       0.33833    0.05087    0.31260    0.11570     0.10662
               NMF + Mp (keyword)       0.33869    0.05891    0.31286    0.12719     0.11403
                NMF + M1 (SLSS)         0.34112    0.05902    0.32016    0.12951     0.11623
                NMF + M2 (SLSS)         0.33882    0.05897    0.31374    0.11650     0.10938
                NMF + Mp (SLSS)         0.34372    0.05941    0.32386    0.12973     0.11706
              SNMF + M1 (keyword)       0.37141    0.08147    0.35946    0.13214     0.13032
              SNMF + M2 (keyword)       0.36934    0.07527    0.34192    0.13011     0.12962
              SNMF + Mp (keyword)       0.38801    0.08304    0.36103    0.13361     0.13187
               SNMF + M1 (SLSS)         0.39012    0.08352    0.36218    0.13802     0.13713
               SNMF + M2 (SLSS)         0.38734    0.08295    0.36052    0.13416     0.13664
               SNMF + Mp (SLSS)         0.39551    0.08549    0.36803    0.13943     0.13981
Table 4: Overall performance comparison on DUC2006 using ROUGE evaluation methods. Remark: “Average-Human” gives the average results of summaries constructed by human summarizers, and “DUC2006 Average” lists the average results of the 34 participating teams. The system names are combinations of the methods used in each phase; for example, “KM + M2 (keyword)” means that keyword-based similarity, K-means clustering, and the M2 ranking measurement are used. Candidate methods for each phase are listed in Table 3.
   [Figure 3: ROUGE-1 scores of keyword-based similarity vs. SLSS for each method combination (NMF+M1, NMF+M2, NMF+Mp, KM+M1, KM+M2, KM+Mp, SNMF+M1, SNMF+M2, SNMF+Mp)]

4.4.3 Evaluation of Clustering Methods

   [Figures 4 and 5: ROUGE-1 scores of the clustering algorithms (NMF, KM, and SNMF) under each ranking method and similarity measure (M1, M2, Mp; keyword and SLSS)]

   ...algorithm achieves the best results. K-means and NMF meth...

4.4.4 Study of the Weight Parameter λ

   [Figure 6: Study of weight parameter λ using ROUGE-1 on DUC2006 data set]
   We gradually adjust the value of λ, and the results show that combining both internal and external information leads to better performance.
5.   CONCLUSIONS
   In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis (SLSS) and symmetric non-negative matrix factorization (SNMF). SLSS is able to capture the semantic relationships between sentences, and SNMF can divide the sentences into groups for extraction. We conduct experiments on the DUC2005 and DUC2006 data sets using ROUGE evaluation methods, and the results show the effectiveness of our proposed method. The good performance of the proposed framework benefits from the sentence-level semantic understanding, the clustering of the symmetric similarity matrix by the proposed SNMF algorithm, and the within-cluster sentence selection using both internal and external information.

Acknowledgements
The project is partially supported by a research grant from NEC Lab, NSF CAREER Award IIS-0546280, and IBM Faculty Research Awards.
6.   REFERENCES
 [1] http://www-nlpir.nist.gov/projects/duc/pubs/.
 [2] M. Amini and P. Gallinari. The use of unlabeled data to improve supervised learning for text summarization. In Proceedings of SIGIR 2002.
 [3] D. Arnold, L. Balkan, S. Meijer, R. Humphreys, and L. Sadler. Machine Translation: an Introductory Guide. Blackwells-NCC, 1994.
 [4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
 [5] J. Conroy and D. O'Leary. Text summarization via hidden Markov models. In Proceedings of SIGIR 2001.
 [6] I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of KDD 2001.
 [7] C. Ding and X. He. K-means clustering and principal component analysis. In Proceedings of ICML 2004.
 [8] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of KDD 2006.
 [9] G. Erkan and D. Radev. LexPageRank: Prestige in multi-document text summarization. In Proceedings of EMNLP 2004.
[10] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
[11] J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of SIGIR 1999.
[12] Y. Gong and X. Liu. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of SIGIR 2001.
[13] S. Harabagiu and F. Lacatusu. Topic themes for multi-document summarization. In Proceedings of SIGIR 2005.
[14] T. Hirao, Y. Sasaki, and H. Isozaki. An extrinsic evaluation for question-biased text summarization on QA tasks. In Proceedings of the NAACL 2001 Workshop on Automatic Summarization.
[15] H. Jing and K. McKeown. Cut and paste based text summarization. In Proceedings of NAACL 2000.
[16] K. Knight and D. Marcu. Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artificial Intelligence, pages 91–107, 2002.
[17] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS 2001.
[18] T. Li. A general model for clustering binary data. In Proceedings of SIGKDD 2005, pages 188–197.
[19] C.-Y. Lin and E. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL 2003.
[20] C.-Y. Lin and E. Hovy. From single to multi-document summarization: A prototype system and its evaluation. In Proceedings of ACL 2002.
[21] I. Mani. Automatic Summarization. John Benjamins Publishing Company, 2001.
[22] R. Mihalcea and P. Tarau. A language independent algorithm for single and multiple document summarization. In Proceedings of IJCNLP 2005.
[23] M. Palmer, P. Kingsbury, and D. Gildea. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, pages 71–106, 2005.
[24] S. Park, J.-H. Lee, D.-H. Kim, and C.-M. Ahn. Multi-document summarization based on cluster using non-negative matrix factorization. In Proceedings of SOFSEM 2007.
[25] D. Radev, E. Hovy, and K. McKeown. Introduction to the special issue on summarization. Computational Linguistics, pages 399–408, 2002.
[26] D. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization of multiple documents. Information Processing and Management, pages 919–938, 2004.
[27] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
[28] G. Sampath and M. Martinovic. A Multilevel Text Processing Model of Newsgroup Dynamics. 2002.
[29] D. Shen, J.-T. Sun, H. Li, Q. Yang, and Z. Chen. Document summarization using conditional random fields. In Proceedings of IJCAI 2007.
[30] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22:888–905, 2000.
[31] A. Turpin, Y. Tsegay, D. Hawking, and H. Williams. Fast generation of result snippets in web search. In Proceedings of SIGIR 2007.
[32] X. Wan, J. Yang, and J. Xiao. Manifold-ranking based topic-focused multi-document summarization. In Proceedings of IJCAI 2007.
[33] W.-T. Yih, J. Goodman, L. Vanderwende, and H. Suzuki. Multi-document summarization by maximizing informative content-words. In Proceedings of IJCAI 2007.
[34] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In Proceedings of SIGIR 2002.