
Extracting Protein Interactions from Text with the Unified AkaneRE Event Extraction System

Authors:

Kazuhiro Yoshida,

Takuya Matsuzaki,

Yoshinobu Kano,

Jun'ichi Tsujii

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 7, Issue 3

Pages 442 - 453

https://doi.org/10.1109/TCBB.2010.46

Published: 01 July 2010

Abstract

Currently, relation extraction (RE) and event extraction (EE) are the two main streams of biological information extraction. In 2009, the majority of these RE and EE research efforts were centered around the BioCreative II.5 Protein-Protein Interaction (PPI) challenge and the “BioNLP event extraction shared task.” Although these challenges took somewhat different approaches, they share the same ultimate goal of extracting bio-knowledge from the literature. This paper compares the two challenge task definitions and presents a unified system that was successfully applied in both of these and several other PPI extraction task settings. The AkaneRE system has three parts: a core engine for RE, a pool of modules for specific solutions, and a configuration language to adapt the system to different tasks. The core engine is based on machine learning, using either Support Vector Machines or Statistical Classifiers with features extracted from the given training data. The specific modules solve tasks such as sentence boundary detection, tokenization, stemming, part-of-speech tagging, parsing, named entity recognition, generation of potential relations, generation of machine learning features for each relation, and, finally, assignment of confidence scores and ranking of candidate relations. With these components, the AkaneRE system produces state-of-the-art results, and the system is freely available for academic purposes at http://www-tsujii.is.s.u-tokyo.ac.jp/~satre/akane/.
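The last three module steps named above (generating potential relations, generating features for each relation, then scoring and ranking the candidates) can be illustrated with a minimal Python sketch. It is not AkaneRE's code or API. All names (`Mention`, `candidate_pairs`, `pair_features`, `confidence`, `rank_relations`, `TOY_WEIGHTS`) are hypothetical. The sketch assumes the sentence is already tokenized and that protein mentions are already recognized, uses only the words between two proteins as features, and replaces the trained classifier with hand-set toy weights; the real system learns its model from training data with SVMs or statistical classifiers and uses richer features.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    """A recognized protein mention covering tokens [start, end) of a sentence."""
    text: str
    start: int
    end: int


def candidate_pairs(mentions: list[Mention]) -> list[tuple[Mention, Mention]]:
    """Potential relations: every unordered pair of protein mentions in the sentence."""
    return list(combinations(mentions, 2))


def pair_features(tokens: list[str], a: Mention, b: Mention) -> set[str]:
    """Toy features: lowercased words between the two mentions plus a capped gap length."""
    left, right = sorted((a, b), key=lambda m: m.start)
    between = tokens[left.end:right.start]
    feats = {f"between={w.lower()}" for w in between}
    feats.add(f"n_between={min(len(between), 5)}")
    return feats


# Hypothetical hand-set weights standing in for a trained classifier.
TOY_WEIGHTS = {
    "between=binds": 2.0,
    "between=interacts": 2.0,
    "between=not": -3.0,
    "n_between=5": -1.0,
}
TOY_BIAS = -1.0


def confidence(feats: set[str], weights=TOY_WEIGHTS, bias=TOY_BIAS) -> float:
    """Linear score squashed by a logistic function into a (0, 1) confidence."""
    score = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def rank_relations(tokens: list[str], mentions: list[Mention]) -> list[tuple[float, str, str]]:
    """Score every candidate pair and return them sorted by descending confidence."""
    scored = [
        (confidence(pair_features(tokens, a, b)), a.text, b.text)
        for a, b in candidate_pairs(mentions)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


if __name__ == "__main__":
    tokens = "RAD51 binds BRCA2 but not TP53 .".split()
    mentions = [Mention("RAD51", 0, 1), Mention("BRCA2", 2, 3), Mention("TP53", 5, 6)]
    for p, a, b in rank_relations(tokens, mentions):
        print(f"{a}\t{b}\t{p:.3f}")
```

On the toy sentence, the RAD51/BRCA2 pair ranks first (confidence about 0.73), while the two pairs whose intervening words include "not" fall below 0.12. Keeping candidate generation, feature extraction, and scoring as separate functions mirrors, in spirit, the abstract's split between a learning core and interchangeable task-specific modules.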


Cited By

J. Hsiao, C. Wei, and H. Kao, "Gene name disambiguation using multi-scope species detection," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 11, no. 1, pp. 55-62, 2014 (online publication date: 1 Jan. 2014), https://doi.org/10.1109/TCBB.2013.139.

L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535-543, 2012 (online publication date: 1 Jun. 2012), https://doi.org/10.1016/j.jbi.2012.02.004.

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 7, Issue 3

July 2010

192 pages

ISSN: 1545-5963

Publisher

IEEE Computer Society Press

Washington, DC, United States

Article Metrics

Total citations: 2
Total downloads: 279
Downloads (last 12 months): 3
Downloads (last 6 weeks): 1

Download counts reflect data up to 24 Dec 2024.
