Predicting protein relationships to human pathways through a relational learning approach based on simple sequence features

Published: 01 July 2014

Abstract

Biological pathways are important elements of systems biology, and in the past decade an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of these data remains functionally uncharacterized. It is therefore important to be able to predict the mapping of poorly annotated proteins to the original pathway models.

Results: We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that relies mainly on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins with 28 pre-defined human Reactome pathways, and evaluation gave relative confidence in these associations. In specific cases, manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance, and Integrin cell surface interactions pathways are discussed in detail.

Availability: All predicted human proteins for Reactome releases 30 (2009) and 40 (2012) are available at http://rle.bioinfo.cnio.es.

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 11, Issue 4
July/August 2014
160 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 July 2014
Accepted: 03 April 2014
Revised: 02 April 2014
Received: 21 December 2012
Published in TCBB Volume 11, Issue 4

Author Tags

  1. function prediction
  2. human reactome pathways
  3. knowledge relational representation
  4. machine learning
  5. pathway relationship prediction
  6. sequence-based prediction

Qualifiers

  • Article
