
Multi-Document Summarization via Sentence-Level

Semantic Analysis and Symmetric Matrix Factorization

Dingding Wang, Tao Li
School of Computer Science, Florida International University, Miami, FL 33199
dwang003,taoli@cs.fiu.edu

Shenghuo Zhu
NEC Labs America, Inc., 10080 N. Wolfe Rd. SW3-350, Cupertino, CA 95014
zsh@sv.nec-labs.com

Chris Ding
Department of CSE, University of Texas at Arlington, Arlington, TX 76019
chqding@uta.edu

ABSTRACT

Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each group to form the summary. Experimental results on the DUC2005 and DUC2006 data sets demonstrate the improvement of our proposed framework over the existing summarization systems we implemented. The factors that contribute to its high performance are also studied.

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Clustering; I.2.7 [Artificial Intelligence]: Natural Language Processing—Text clustering

General Terms

Algorithms, Experimentation, Performance

Keywords

Multi-document summarization, sentence-level semantic analysis, symmetric non-negative matrix factorization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGIR'08, July 20–24, 2008, Singapore.
Copyright 2008 ACM 978-1-60558-164-4/08/07 ...$5.00.

1. INTRODUCTION

Multi-document summarization is the process of generating a generic or topic-focused summary by reducing documents in size while retaining the main characteristics of the original documents [21, 27]. Since much of the data-overload problem is caused by many documents sharing the same or similar topics, automatic multi-document summarization has attracted much attention in recent years. With the explosive increase of documents on the Internet, there are various summarization applications. For example, informative snippet generation in web search can assist users in further exploration [31], and in a question/answer system, a question-based summary is often required to provide the information asked for in the question [14]. Another example is short summaries for news groups in news services, which help users better understand the news articles in a group [28].

The major issues for multi-document summarization are as follows [32]. First of all, the information contained in different documents often overlaps, so it is necessary to find an effective way to merge the documents while recognizing and removing redundancy. In English, to avoid repetition, we tend to use different words to describe the same person or the same topic as a story goes on; simple word-matching types of similarity such as cosine therefore cannot faithfully capture the content similarity, and the sparseness of shared words between similar concepts makes the similarity metric uneven. Another issue is identifying important differences between documents and covering the informative content as much as possible [25]. Current document summarization methods usually involve natural language processing and machine learning techniques [29, 2, 34], such as classification, clustering, and conditional random fields (CRF) [4]; Section 2 discusses these existing methods explicitly.

In this paper, to address the above two issues, we propose a new framework based on sentence-level semantic analysis (SLSS) and symmetric non-negative matrix factorization (SNMF). Since SLSS can better capture the relationships between sentences in a semantic manner, we use it to construct the sentence similarity matrix. Based on the similarity matrix, we perform the proposed SNMF algorithm to cluster the sentences; the standard non-negative matrix factorization (NMF) deals with a rectangular matrix and is thus not appropriate here. Finally, we select the most informative sentences in each cluster considering both internal and external information. We conduct experiments on the DUC2005 and DUC2006 data sets, and the results show the effectiveness of our proposed method. The factors that contribute to its high performance are studied further.

The rest of the paper is organized as follows. Section 2 discusses the related work of current methods in multi-document
summarization. Our proposed method, including the SLSS and SNMF algorithms, is described in Section 3. The experimental setup and results are presented in Section 4. Section 5 concludes.

2. RELATED WORK

Multi-document summarization has been widely studied in recent years. In general, there are two types of summarization: extractive summarization and abstractive summarization [16, 15]. Extractive summarization usually ranks the sentences in the documents according to scores calculated from a set of predefined features, such as term frequency-inverse sentence frequency (TF-ISF) [26, 20], sentence or term position [20, 33], and number of keywords [33]. Abstractive summarization involves information fusion, sentence compression and reformulation [16, 15]. In this paper, we study sentence-based extractive summarization.

Gong et al. [12] propose a method using latent semantic analysis (LSA) to select highly ranked sentences for summarization. [11] proposes a maximal marginal relevance (MMR) method that summarizes documents based on the cosine similarity between a query and a sentence, and between each sentence and the previously selected sentences. The MMR method tends to reduce redundancy; however, the redundancy is controlled by a parameterized model that could instead be learned automatically. Other methods include NMF-based topic-specific summarization [24], CRF-based summarization [29], and a hidden Markov model (HMM) based method [5]. Some DUC2005 and DUC2006 participants achieve good performance, such as Language Computer Corporation (LCC) [1], which proposes a system combining question answering and summarization and uses k-nearest-neighbor clustering based on cosine similarity for sentence selection. In addition, some graph-ranking based methods have been proposed [22, 9]. Most of these methods ignore dependency syntax at the sentence level and focus only on keyword co-occurrence; the hidden relationships between sentences thus remain to be discovered. The method proposed in [13] groups sentences based on semantic role analysis, but it does not make full use of clustering algorithms.

In our work, we propose a new framework based on sentence-level semantic analysis (SLSS) and symmetric non-negative matrix factorization (SNMF). SLSS can better capture the relationships between sentences in a semantic manner, and SNMF can factorize the similarity matrix to obtain meaningful groups of sentences. Experimental results demonstrate the effectiveness of our proposed framework.

3. THE PROPOSED METHOD

3.1 Overview

Figure 1 demonstrates the framework of our proposed approach. Given a set of documents to be summarized, we first clean the documents by removing formatting characters. In the similarity matrix construction phase, we decompose the set of documents into sentences and parse each sentence into frame(s) using a semantic role parser. Pairwise sentence semantic similarity is calculated based on both the semantic role analysis [23] and word relation discovery using WordNet [10]; Section 3.2 describes this phase in detail. Once we have the pairwise sentence similarity matrix, in the second phase we perform symmetric matrix factorization to group the sentences into clusters; full explanations of the proposed SNMF algorithm are given in Section 3.3. Finally, in each cluster we identify the most semantically important sentence using a measure that combines internal information (e.g., the computed similarity between sentences) and external information (e.g., the given topic information); Section 3.4 discusses this sentence selection phase in detail. The selected sentences form the summary.

Figure 1: Overview of our proposed method

3.2 Semantic Similarity Matrix Construction

After stop-word removal and stemming, we chunk the documents on the same topic into sentences. Simple word-matching types of similarity such as cosine cannot faithfully capture the content similarity, and the sparseness of shared words between similar concepts makes the similarity metric uneven. Thus, in order to understand the semantic meanings of the sentences, we perform semantic role analysis on them and propose a method to calculate the semantic similarity between any pair of sentences.

3.2.1 Semantic Role Parsing

A semantic role is "a description of the relationship that a constituent plays with respect to the verb in the sentence" [3]. Semantic role analysis plays a very important role in semantic understanding. The semantic role labeler we use in this work is based on the PropBank semantic annotation [23]. The basic idea is that each verb in a sentence is labeled with its propositional arguments, and the labeling for each particular verb is called a "frame"; therefore, for each sentence, the number of frames generated by the parser equals the number of verbs in the sentence. A set of abstract arguments indicates the semantic role of each term in a frame: for example, Arg0 is typically the actor, and Arg1 is the thing acted upon. The full representation of the abstract arguments [23] and an illustrative example are shown in Table 1.

3.2.2 Pairwise Semantic Similarity Calculation

Given sentences Si and Sj, we now calculate the similarity between them. Suppose Si and Sj are parsed into frames. For each pair of frames fm ∈ Si and fn ∈ Sj, we discover the semantic relations of terms in the same semantic role using WordNet [10]. If two words in the same semantic role are identical or linked by a semantic relation such
as synonym, hypernym, hyponym, meronym or holonym, the words are considered "related".

    rel: the verb                     Arg0: causer of motion
    Arg1: thing in motion             Arg2: distance moved
    Arg3: start point                 Arg4: end point
    Arg5: direction                   ArgM-LOC: location
    ArgM-EXT: extent                  ArgM-TMP: time
    ArgM-DIS: discourse connectives   ArgM-PNC: purpose
    ArgM-ADV: general-purpose         ArgM-MNR: manner
    ArgM-NEG: negation marker         ArgM-DIR: direction
    ArgM-MOD: modal verb              ArgM-CAU: cause

    Example:
    Sentence: A number of marine plants are harvested commercially in Nova Scotia.
    Label: A|Arg1 number|Arg1 of|Arg1 marine|Arg1 plants|Arg1 are|- harvested|rel commercially|ArgM-MNR in|ArgM-LOC Nova|ArgM-LOC Scotia|ArgM-LOC .|-

Table 1: Representation of arguments and an illustrative example.

Let Rm and Rn be the sets of semantic roles in fm and fn, respectively, and assume |Rm| ≤ |Rn|. Let {r1, r2, ..., rK} be the set of K common semantic roles between fm and fn, Tm(ri) be the term set of fm in role ri, and Tn(ri) be the term set of fn in role ri. Let |Tm(ri)| ≤ |Tn(ri)|; then we compute the similarity between Tm(ri) and Tn(ri) as

    rsim(Tm(ri), Tn(ri)) = ( Σ_j tsim(t_ij^m, ri) ) / |Tn(ri)|    (1)

where

    tsim(t_ij^m, ri) = 1, if t_ij^m ∈ Tm(ri) and there exists t_ik^n ∈ Tn(ri) such that t_ij^m and t_ik^n are related;
    tsim(t_ij^m, ri) = 0, otherwise.

Then the similarity between fm and fn is

    fsim(fm, fn) = ( Σ_{i=1..K} rsim(Tm(ri), Tn(ri)) ) / |Rn|    (2)

Therefore, the semantic similarity between Si and Sj can be calculated as

    Sim(Si, Sj) = max_{fm ∈ Si, fn ∈ Sj} fsim(fm, fn)    (3)

Each similarity score is between 0 and 1. We compute the pairwise sentence similarities for the given document collection and construct the symmetric semantic similarity matrix for further analysis.

3.3 Symmetric Non-negative Matrix Factorization (SNMF)

Most document clustering algorithms deal with a rectangular data matrix (e.g., a document-term matrix or sentence-term matrix) and are not suitable for clustering a pairwise similarity matrix. In our work, we propose the SNMF algorithm to conduct the clustering in the second phase. It can be shown that this simple symmetric non-negative matrix factorization approach is equivalent to normalized spectral clustering.

3.3.1 Problem Formulation and Algorithm Procedure

Given a pairwise similarity matrix W, we want to find H such that

    min_{H≥0} J = ||W − HHᵀ||²    (4)

where the matrix norm ||X||² = Σ_ij X_ij² is the Frobenius norm. To derive the updating rule for Eq. (4) under the non-negativity constraints H_ij ≥ 0, we introduce Lagrangian multipliers λ_ij and let L = J + Σ_ij λ_ij H_ij. The first-order KKT condition for a local minimum is

    ∂L/∂H_ij = ∂J/∂H_ij + λ_ij = 0, and λ_ij H_ij = 0, ∀i, j.

Note that ∂J/∂H = −4WH + 4HHᵀH. Hence the KKT condition leads to the fixed-point relation

    (−4WH + 4HHᵀH)_ij H_ij = 0    (5)

Using the gradient descent method, we have

    H_ij ← H_ij − ε_ij ∂J/∂H_ij    (6)

Setting ε_ij = H_ij / (8 (HHᵀH)_ij), we obtain the NMF-style multiplicative updating rule for SNMF:

    H_ij ← (1/2) H_ij ( 1 + (WH)_ij / (HHᵀH)_ij )    (7)

Hence, the procedure for solving SNMF is: given an initial guess of H, iteratively update H using Eq. (7) until convergence. This gradient descent method converges to a local minimum of the problem.

3.3.2 Properties of SNMF

SNMF has several nice properties that make it a powerful tool for clustering. First, the SNMF algorithm inherently maintains the near-orthogonality of H. Note that

    ||HᵀH||² = Σ_st (HᵀH)_st² = Σ_{s≠t} (hsᵀht)² + Σ_t (htᵀht)²

Minimizing the first term is equivalent to enforcing orthogonality among the columns hs: hsᵀht ≈ 0 for s ≠ t. On the other hand, since W ≈ HHᵀ,

    Σ_ij w_ij = Σ_ij (HHᵀ)_ij = Σ_{s=1..K} |hs|²

where |h| is the L1 norm of vector h. Hence ||hs|| ≥ 0. Therefore, we have

    hsᵀht ≈ 0 if s ≠ t, and hsᵀht ≈ ||hs||² if s = t.

The near-orthogonality of the columns of H is important for data clustering. Exact orthogonality implies that each row of H can have only one non-zero element, which leads to a hard clustering of the data objects (i.e., each data object belongs to exactly one cluster). On the other hand, a non-orthogonal H has no cluster interpretation. The near-orthogonality conditions of SNMF allow for "soft clustering", i.e., each object can belong fractionally to multiple clusters. This usually leads to clustering performance improvement [7].
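The frame- and sentence-level similarity of Eqs. (1)-(3) in Section 3.2.2 is straightforward to prototype. The sketch below is our own minimal illustration, not the authors' code: frames are assumed to be dictionaries mapping semantic roles to term sets, and the `related` predicate (backed by WordNet relations in the paper) is passed in as a plain function.

```python
def frame_similarity(fm, fn, related):
    """Eq. (2): similarity of two frames, summed over common roles.

    fm, fn: dicts mapping semantic role -> set of terms.
    related: predicate returning True when two terms are identical or
             linked by a WordNet relation (synonym, hypernym, ...).
    """
    # Ensure fm is the frame with fewer roles, so |Rm| <= |Rn|.
    if len(fm) > len(fn):
        fm, fn = fn, fm
    if not fn:
        return 0.0
    total = 0.0
    for role in set(fm) & set(fn):          # common roles r1..rK
        tm, tn = fm[role], fn[role]
        if len(tm) > len(tn):               # ensure |Tm| <= |Tn|
            tm, tn = tn, tm
        if not tn:
            continue
        # Eq. (1): count terms of the smaller set with a related
        # counterpart, normalized by the size of the larger set.
        hits = sum(1 for t in tm if any(related(t, u) for u in tn))
        total += hits / len(tn)
    return total / len(fn)                  # normalize by |Rn|


def sentence_similarity(frames_i, frames_j, related):
    """Eq. (3): max frame-pair similarity between two sentences."""
    if not frames_i or not frames_j:
        return 0.0
    return max(frame_similarity(fm, fn, related)
               for fm in frames_i for fn in frames_j)


# Toy check with term equality standing in for the WordNet predicate.
s1 = [{"Arg0": {"cat"}, "rel": {"sat"}}]
s2 = [{"Arg0": {"dog"}, "rel": {"sat"}}]
same = sentence_similarity(s1, s1, lambda a, b: a == b)
half = sentence_similarity(s1, s2, lambda a, b: a == b)
```

Because each role contributes at most 1 and the sum is divided by the larger role count, the score stays in [0, 1], matching the paper's remark that each similarity score lies between 0 and 1.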

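The SNMF procedure of Section 3.3.1 can be sketched with NumPy as follows. This is an illustrative implementation of the multiplicative rule in Eq. (7), not the authors' code; the small epsilon in the denominator is our own numerical-safety addition.

```python
import numpy as np

def snmf(W, k, n_iter=200, eps=1e-9, seed=0):
    """Approximately factor a symmetric nonnegative matrix W ~ H H^T.

    Iterates the multiplicative rule of Eq. (7):
        H_ij <- (1/2) H_ij (1 + (W H)_ij / (H H^T H)_ij)
    starting from a random nonnegative H of shape (n, k).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(n_iter):
        WH = W @ H
        HHtH = H @ (H.T @ H)
        # eps guards against division by zero; not part of Eq. (7).
        H = 0.5 * H * (1.0 + WH / (HHtH + eps))
    return H

# Toy similarity matrix with two obvious sentence groups.
W = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
H = snmf(W, k=2)
labels = H.argmax(axis=1)   # cluster assignment per sentence
```

Taking the argmax over each row of the near-orthogonal H gives a hard cluster label per sentence, which is how the factorization is used for sentence grouping in the second phase.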
Another important property is that the simple SNMF is equivalent to the sophisticated normalized-cut spectral clustering. Spectral clustering is a principled and effective approach for solving Normalized Cuts [30], an NP-hard optimization problem. Given the adjacency matrix W of a graph, it can easily be seen that the SNMF

    min_{HᵀH=I, H≥0} ||W̃ − HHᵀ||²    (8)

where W̃ = D^(−1/2) W D^(−1/2), D = diag(d1, ..., dn), di = Σ_j w_ij, is equivalent to Normalized Cut spectral clustering.

3.3.3 Discussions and Relations

It can also be shown that SNMF is equivalent to kernel K-means clustering and is a special case of 3-factor non-negative matrix factorization. These results validate the clustering ability of SNMF.

Kernel K-means clustering: For clustering and classification problems, the solution is represented by K non-negative cluster membership indicator vectors H = (h1, ..., hK), where

    hk = (0, ..., 0, 1, ..., 1, 0, ..., 0)ᵀ / nk^(1/2)    (9)

with nk ones marking the members of the k-th cluster. For example, the nonzero entries of h1 indicate the data points belonging to the first cluster. The objective function of K-means clustering is

    J = Σ_{k=1..K} Σ_{i∈Ck} ||xi − fk||²    (10)

where fk is the centroid of the k-th cluster Ck of nk points, i.e., fk = Σ_{i∈Ck} xi / nk. More generally, the objective function of kernel K-means with mapping xi → φ(xi) is

    Jφ = Σ_{k=1..K} Σ_{i∈Ck} ||φ(xi) − φ̄k||²    (11)

where φ̄k is the centroid in the feature space. Using cluster indicators, for K-means and kernel K-means the clustering problem can be solved via the optimization problem

    max_{HᵀH=I, H≥0} Tr(HᵀWH)    (12)

where H is the cluster indicator and W_ij = φ(xi)ᵀφ(xj) is the kernel; for K-means, φ(xi) = xi and W_ij = xiᵀxj. Note that if we impose the orthogonality constraint on H, then

    J1 = arg min_{HᵀH=I, H≥0} ||W − HHᵀ||²
       = arg min_{HᵀH=I, H≥0} ||W||² − 2 Tr(HᵀWH) + ||HᵀH||²
       = arg max_{HᵀH=I, H≥0} Tr(HᵀWH)

In other words, SNMF of W ≈ HHᵀ is equivalent to kernel K-means clustering under the orthogonality constraint on H.

Nonnegative matrix factorization (NMF): SNMF can also be viewed as a special case of 3-factor non-negative matrix factorization, which is proposed to simultaneously cluster the rows and the columns of the input data matrix X [8]:

    X ≈ FSGᵀ    (13)

Note that S provides additional degrees of freedom such that the low-rank matrix representation remains accurate, while F gives row clusters and G gives column clusters. This form gives a good framework for simultaneously clustering the rows and columns of X [6, 18]. An important special case is when the input X is a matrix of pairwise similarities: X = Xᵀ = W. In this case F = G = H and S = I, and the factorization reduces to the SNMF

    min_{H≥0} ||X − HHᵀ||², s.t. HᵀH = I.    (14)

3.4 Within-Cluster Sentence Selection

After grouping the sentences into clusters by the SNMF algorithm, in each cluster we rank the sentences based on the sentence scores defined in Eqs. (15)-(17). The score of a sentence measures how important the sentence is for inclusion in the summary:

    Score(Si) = λ F1(Si) + (1 − λ) F2(Si)    (15)

    F1(Si) = (1 / (N − 1)) Σ_{Sj ∈ Ck, Sj ≠ Si} Sim(Si, Sj)    (16)

    F2(Si) = Sim(Si, T)    (17)

where F1(Si) measures the average similarity between sentence Si and all the other sentences in cluster Ck, N is the number of sentences in Ck, F2(Si) represents the similarity between sentence Si and the given topic T, and λ is a weight parameter.

4. EXPERIMENTS

4.1 Data Set

We use the DUC2005 and DUC2006 data sets to test our proposed method empirically; both are open benchmark data sets from the Document Understanding Conference (DUC) for automatic summarization evaluation. Each data set consists of 50 topics, and Table 2 gives a brief description of the two data sets. The task is to create a summary of no more than 250 words for each topic, answering the information need expressed in the topic statement.

                                        DUC2005         DUC2006
    Number of topics                    50              50
    Number of documents per topic       25 ~ 50         25
    Data source                         TREC            AQUAINT corpus
    Summary length                      250 words       250 words

Table 2: Description of the data sets

4.2 Implemented Summarization Systems

In order to compare our method, we first implement four of the most widely used document summarization baseline systems:

• LeadBase: returns the leading sentences of all the documents for each topic.
• Random: selects sentences randomly for each topic.

• LSA: conducts latent semantic analysis on the terms-by-sentences matrix, as proposed in [12].

• NMFBase: performs NMF on the terms-by-sentences matrix and ranks the sentences by their weighted scores [17].

To better evaluate our proposed method, we also implement alternative solutions for each phase of the summarization procedure, as listed in Table 3.

    Phase                      Proposed method               Alternative 1               Alternative 2
    Similarity measurement     Semantic similarity (SLSS)    Keyword-based similarity    N/A
    Clustering algorithm       SNMF                          K-means (KM)                NMF
    Within-cluster ranking     Mp = λF1(Si) + (1−λ)F2(Si)    M1 = F1(Si)                 M2 = F2(Si)

Table 3: Different methods implemented in each phase. Remark: Si is the i-th sentence in the cluster, and F1(Si) and F2(Si) are calculated as described in Section 3.4.

In Table 3, the keyword-based similarity between any pair of sentences is calculated as the cosine similarity. The parameter λ in Mp is set to 0.7 empirically; the influence of λ is discussed in Section 4.4.4. Note that in our experiments, both the similarity matrix generation phase and the sentence extraction phase use the same type of similarity measurement. Thus we have 22 implemented summarization systems: 18 obtained by varying the method used in each phase, plus the 4 baselines. In Section 4.4, we compare our proposed method with all the other systems.

4.3 Evaluation Metric

We use the ROUGE toolkit [19] (version 1.5.5), which is widely applied by DUC for performance evaluation, to measure our proposed method. It measures the quality of a summary by counting the unit overlaps between the candidate summary and a set of reference summaries. Several automatic evaluation methods are implemented in ROUGE, such as ROUGE-N, ROUGE-L, ROUGE-W and ROUGE-SU. ROUGE-N is an n-gram recall computed as

    ROUGE-N = ( Σ_{S∈{ref}} Σ_{gramₙ∈S} Count_match(gramₙ) ) / ( Σ_{S∈{ref}} Σ_{gramₙ∈S} Count(gramₙ) )    (18)

where n is the length of the n-gram and ref stands for the reference summaries. Count_match(gramₙ) is the maximum number of n-grams co-occurring in a candidate summary and the reference summaries, and Count(gramₙ) is the number of n-grams in the reference summaries. ROUGE-L uses longest common subsequence (LCS) statistics, ROUGE-W is based on weighted LCS, and ROUGE-SU is based on skip-bigrams plus unigrams. Each of these evaluation methods in ROUGE can generate three scores (recall, precision and F-measure). As we reach similar conclusions with any of the three scores, for simplicity we report in this paper only the average F-measure scores generated by ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-W and ROUGE-SU to compare our proposed method with the other implemented systems.

4.4 Experimental Results

4.4.1 Overall Performance Comparison

First of all, we compare the overall performance of our proposed method (using SLSS and SNMF) with all the other implemented systems. Table 4 and Table 5 show the ROUGE evaluation results on the DUC2006 and DUC2005 data sets, respectively. We clearly observe that our proposed method achieves the highest ROUGE scores and outperforms all the other systems. In Sections 4.4.2, 4.4.3 and 4.4.4, we evaluate each phase of our proposed method and analyze the factors from which it benefits.

4.4.2 Evaluation on Methods in Similarity Matrix Construction

Instead of using a similarity matrix, many summarization methods operate directly on the terms-by-sentences matrix, such as the LSA and NMFBase baselines implemented in our experiments. In fact, LSA and NMF give continuous solutions to the same K-means clustering problem [7]; their difference lies in the constraints: LSA relaxes the non-negativity of H, while NMF relaxes the orthogonality of H. In NMFBase and LSA, we treat the sentences as vectors and cluster them using the cosine similarity metric (since each vector is normalized to 1, ||d1 − d2||² = 2 − 2 cos(d1, d2)). From Tables 4 and 5, we see that the results of LSA and NMFBase are similar and that none of these methods is satisfactory. This indicates that simple word-matching types of similarity such as cosine cannot faithfully capture the content similarity.

Therefore, we further analyze the sentence-level text and generate pairwise sentence similarities. In the experiments, we compare the proposed sentence-level semantic similarity with the traditional keyword-based similarity calculation. To better illustrate the results, Figures 2 and 3 visualize the comparison; due to the space limit, we show only ROUGE-1 results in these figures.

Figure 2: Methods comparison in similarity matrix construction phase using ROUGE-1 on DUC2006 data set
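The ROUGE-N recall of Eq. (18) can be sketched as follows. This is a simplified illustration with whitespace tokenization, not the official ROUGE 1.5.5 toolkit; `Count_match` is implemented as the usual clipped n-gram count.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n):
    """Eq. (18): n-gram recall of a candidate summary.

    candidate: token list; references: list of token lists.
    An n-gram is matched at most as many times as it occurs in the
    candidate (the clipped Count_match of Eq. (18)).
    """
    cand = ngram_counts(candidate, n)
    match = total = 0
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            total += count
            match += min(count, cand[gram])
    return match / total if total else 0.0

# "the" and "cat" of the three reference unigrams are matched: 2/3.
r1 = rouge_n("the cat sat".split(), ["the cat ran".split()], 1)
```

The same function with n = 2 gives ROUGE-2; the full toolkit additionally reports precision and an F-measure over the same counts.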
Systems ROUGE-1 ROUGE-2 ROUGE-L ROUGE-W ROUGE-SU
Average-Human 0.45767 0.11249 0.42340 0.15919 0.17060
DUC2006 Average 0.37959 0.07543 0.34756 0.13001 0.13206
LeadBase 0.32082 0.05267 0.29726 0.10993 0.10408
Random 0.31749 0.04892 0.29384 0.10779 0.10083
NMFBase 0.32374 0.05498 0.30062 0.11341 0.10606
LSA 0.33078 0.05022 0.30507 0.11220 0.10226
KM + M1 (keyword) 0.33605 0.05481 0.31204 0.12450 0.11125
KM + M2 (keyword) 0.33039 0.04689 0.30394 0.11240 0.10087
KM + Mp (keyword) 0.33558 0.05920 0.32112 0.12614 0.11229
KM + M1 (SLSS) 0.35908 0.06087 0.34074 0.12861 0.12328
KM + M2 (SLSS) 0.35049 0.05931 0.33202 0.12801 0.11763
KM + Mp (SLSS) 0.36371 0.06182 0.34114 0.12966 0.12503
NMF + M1 (keyword) 0.33850 0.05851 0.31274 0.12637 0.11348
NMF + M2 (keyword) 0.33833 0.05087 0.31260 0.11570 0.10662
NMF + Mp (keyword) 0.33869 0.05891 0.31286 0.12719 0.11403
NMF + M1 (SLSS) 0.34112 0.05902 0.32016 0.12951 0.11623
NMF + M2 (SLSS) 0.33882 0.05897 0.31374 0.11650 0.10938
NMF + Mp (SLSS) 0.34372 0.05941 0.32386 0.12973 0.11706
SNMF + M1 (keyword) 0.37141 0.08147 0.35946 0.13214 0.13032
SNMF + M2 (keyword) 0.36934 0.07527 0.34192 0.13011 0.12962
SNMF + Mp (keyword) 0.38801 0.08304 0.36103 0.13361 0.13187
SNMF + M1 (SLSS) 0.39012 0.08352 0.36218 0.13802 0.13713
SNMF + M2 (SLSS) 0.38734 0.08295 0.36052 0.13416 0.13664
SNMF + Mp (SLSS) 0.39551 0.08549 0.36803 0.13943 0.13981

Table 4: Overall performance comparison on DUC2006 using ROUGE evaluation methods. Remark:
“Average-Human” is the average results of summaries constructed by human summarizers and “DUC2006
Average” lists the average results of the 34 participating teams. The system names are the combinations of
the methods used in each phase. For example, “KM + M2 (keyword)” represents that keyword-based similar-
ity, K-means clustering and M2 ranking measurement are used. Candidate methods for each phase are listed
in Table 3.

Systems ROUGE-1 ROUGE-2 ROUGE-L ROUGE-W ROUGE-SU


Average-Human 0.44170 0.10236 0.40632 0.15227 0.16221
DUC2005 Average 0.34347 0.06024 0.31296 0.11675 0.11488
LeadBase 0.29243 0.04320 0.27089 0.10046 0.09303
Random 0.29012 0.04143 0.26395 0.09802 0.09066
NMFBase 0.31107 0.04932 0.28716 0.10785 0.10094
LSA 0.30461 0.04079 0.26476 0.10883 0.09352
KM + M1 (keyword) 0.31762 0.04938 0.29107 0.10806 0.10329
KM + M2 (keyword) 0.30891 0.04902 0.28975 0.10373 0.09531
KM + Mp (keyword) 0.31917 0.04964 0.29763 0.10856 0.10538
KM + M1 (SLSS) 0.31966 0.05129 0.29882 0.10870 0.10623
KM + M2 (SLSS) 0.31857 0.04971 0.29186 0.10737 0.10594
KM + Mp (SLSS) 0.32149 0.05083 0.29910 0.10886 0.10741
NMF + M1 (keyword) 0.32026 0.05105 0.30012 0.11278 0.10921
NMF + M2 (keyword) 0.32003 0.05086 0.30117 0.11063 0.10205
NMF + Mp (keyword) 0.32218 0.05146 0.30213 0.11605 0.10628
NMF + M1 (SLSS) 0.32943 0.05177 0.30529 0.11743 0.10772
NMF + M2 (SLSS) 0.32231 0.05014 0.30357 0.11918 0.10753
NMF + Mp (SLSS) 0.32949 0.05241 0.30835 0.11807 0.10992
SNMF + M1 (keyword) 0.33118 0.05615 0.31529 0.11903 0.11141
SNMF + M2 (keyword) 0.33025 0.05573 0.31247 0.11839 0.11023
SNMF + Mp (keyword) 0.33402 0.05712 0.31773 0.11918 0.11453
SNMF + M1 (SLSS) 0.34856 0.05909 0.32574 0.11987 0.11641
SNMF + M2 (SLSS) 0.34309 0.05960 0.32381 0.11816 0.11427
SNMF + Mp (SLSS) 0.35006 0.06043 0.32956 0.12266 0.12298

Table 5: Overall performance comparison on DUC2005 using ROUGE evaluation methods.
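The within-cluster sentence scoring of Eqs. (15)-(17) in Section 3.4 can be sketched as follows; `sim` stands for the pairwise semantic similarity of Section 3.2 and is passed in as a function, and the toy `word_overlap` below is only an illustrative stand-in. The default λ = 0.7 is the empirical setting reported in Section 4.2.

```python
def score_sentences(cluster, topic, sim, lam=0.7):
    """Eq. (15): Score(Si) = lam * F1(Si) + (1 - lam) * F2(Si).

    F1 (Eq. (16)) is Si's average similarity to the other sentences
    in the cluster; F2 (Eq. (17)) is Si's similarity to the topic T.
    """
    n = len(cluster)
    scores = []
    for i, si in enumerate(cluster):
        f1 = (sum(sim(si, sj) for j, sj in enumerate(cluster) if j != i)
              / (n - 1)) if n > 1 else 0.0
        f2 = sim(si, topic)
        scores.append(lam * f1 + (1 - lam) * f2)
    return scores

# Toy stand-in for the semantic similarity of Section 3.2.
def word_overlap(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

scores = score_sentences(["a b", "a b", "c d"], "a b", word_overlap)
```

The highest-scoring sentence of each cluster is then extracted into the summary, so a sentence must be both central to its cluster and relevant to the topic to be selected.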

Figure 3: Methods comparison in similarity matrix construction phase using ROUGE-1 on DUC2005 data set

The results clearly show that, no matter which methods are used in the other phases, SLSS outperforms keyword-based similarity. This is due to the fact that SLSS better captures the semantic relationships between sentences.

4.4.3 Evaluation on Different Clustering Algorithms

We now compare the different clustering algorithms in Figures 4 and 5. We observe that our proposed SNMF algorithm achieves the best results. The K-means and NMF methods are generally designed to deal with a rectangular data matrix and are not suitable for clustering a similarity matrix. Our SNMF method, which has been shown to be equivalent to normalized spectral clustering, generates more meaningful clustering results from the input similarity matrix.

Figure 4: Different clustering algorithms using ROUGE-1 on DUC2006 data set

Figure 5: Different clustering algorithms using ROUGE-1 on DUC2005 data set

4.4.4 Discussion on Parameter λ

Figure 6: Study of weight parameter λ using ROUGE-1 on DUC2006 data set

Figure 7: Study of weight parameter λ using ROUGE-1 on DUC2005 data set

Figures 6 and 7 demonstrate the influence of the weight parameter λ in the within-cluster sentence selection phase. When λ = 1 (which is actually method M1), only internal information counts, i.e., the similarity between sentences; λ = 0 represents that only the similarity between the sentence and the given topic is considered (method M2). We
We gradually adjust the value of λ, and the results show that combining both internal and external information leads to better performance.

5. CONCLUSIONS
In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis (SLSS) and symmetric non-negative matrix factorization (SNMF). SLSS is able to capture the semantic relationships between sentences, and SNMF can divide the sentences into groups for extraction. We conduct experiments on the DUC2005 and DUC2006 data sets using ROUGE evaluation methods, and the results show the effectiveness of our proposed method. The good performance of the proposed framework benefits from the sentence-level semantic understanding, the clustering over the symmetric similarity matrix by the proposed SNMF algorithm, and the within-cluster sentence selection using both internal and external information.

Acknowledgements
The project is partially supported by a research grant from NEC Lab, NSF CAREER Award IIS-0546280, and IBM Faculty Research Awards.

6. REFERENCES
[1] http://www-nlpir.nist.gov/projects/duc/pubs/.
[2] M. Amini and P. Gallinari. The use of unlabeled data to improve supervised learning for text summarization. In Proceedings of SIGIR 2002.
[3] D. Arnold, L. Balkan, S. Meijer, R. Humphreys, and L. Sadler. Machine Translation: an Introductory Guide. Blackwells-NCC, 1994.
[4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[5] J. Conroy and D. O'Leary. Text summarization via hidden Markov models. In Proceedings of SIGIR 2001.
[6] I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of KDD 2001.
[7] C. Ding and X. He. K-means clustering and principal component analysis. In Proceedings of ICML 2004.
[8] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of KDD 2006.
[9] G. Erkan and D. Radev. LexPageRank: Prestige in multi-document text summarization. In Proceedings of EMNLP 2004.
[10] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
[11] J. Goldstein, M. Kantrowitz, V. Mittal, and J. Carbonell. Summarizing text documents: Sentence selection and evaluation metrics. In Proceedings of SIGIR 1999.
[12] Y. Gong and X. Liu. Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of SIGIR 2001.
[13] S. Harabagiu and F. Lacatusu. Topic themes for multi-document summarization. In Proceedings of SIGIR 2005.
[14] T. Hirao, Y. Sasaki, and H. Isozaki. An extrinsic evaluation for question-biased text summarization on QA tasks. In Proceedings of the NAACL 2001 Workshop on Automatic Summarization.
[15] H. Jing and K. McKeown. Cut and paste based text summarization. In Proceedings of NAACL 2000.
[16] K. Knight and D. Marcu. Summarization beyond sentence extraction: a probabilistic approach to sentence compression. Artificial Intelligence, pages 91–107, 2002.
[17] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS 2001.
[18] T. Li. A general model for clustering binary data. In Proceedings of SIGKDD 2005, pages 188–197.
[19] C.-Y. Lin and E. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL 2003.
[20] C.-Y. Lin and E. Hovy. From single to multi-document summarization: A prototype system and its evaluation. In Proceedings of ACL 2002.
[21] I. Mani. Automatic summarization. John Benjamins Publishing Company, 2001.
[22] R. Mihalcea and P. Tarau. A language independent algorithm for single and multiple document summarization. In Proceedings of IJCNLP 2005.
[23] M. Palmer, P. Kingsbury, and D. Gildea. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, pages 71–106, 2005.
[24] S. Park, J.-H. Lee, D.-H. Kim, and C.-M. Ahn. Multi-document summarization based on cluster using non-negative matrix factorization. In Proceedings of SOFSEM 2007.
[25] D. Radev, E. Hovy, and K. McKeown. Introduction to the special issue on summarization. Computational Linguistics, pages 399–408, 2002.
[26] D. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization of multiple documents. Information Processing and Management, pages 919–938, 2004.
[27] R. Baeza-Yates and B. Ribeiro-Neto. Modern information retrieval. ACM Press, 1999.
[28] G. Sampath and M. Martinovic. A Multilevel Text Processing Model of Newsgroup Dynamics. 2002.
[29] D. Shen, J.-T. Sun, H. Li, Q. Yang, and Z. Chen. Document summarization using conditional random fields. In Proceedings of IJCAI 2007.
[30] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22:888–905, 2000.
[31] A. Turpin, Y. Tsegay, D. Hawking, and H. Williams. Fast generation of result snippets in web search. In Proceedings of SIGIR 2007.
[32] X. Wan, J. Yang, and J. Xiao. Manifold-ranking based topic-focused multi-document summarization. In Proceedings of IJCAI 2007.
[33] W.-T. Yih, J. Goodman, L. Vanderwende, and H. Suzuki. Multi-document summarization by maximizing informative content-words. In Proceedings of IJCAI 2007.
[34] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In Proceedings of SIGIR 2002.
