skip to main content
article

Partitioning biological networks into highly connected clusters with maximum edge coverage

Published: 01 May 2014 Publication History

Abstract

A popular clustering algorithm for biological networks which was proposed by Hartuv and Shamir [5] identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem HIGHLY CONNECTED DELETION, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that HIGHLY CONNECTED DELETION is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75 percent of the edges that are deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges within five hours. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.

References

[1]
R. Albert, "Scale-Free Networks in Cell Biology," J. Cell Science, vol. 118, no. 21, pp. 4947-4957, 2007.
[2]
R. Sharan, I. Ulitsky, and R. Shamir, "Network-Based Prediction of Protein Function," Molecular Systems Biology, vol. 3, article 88, 2007.
[3]
V. Spirin and L.A. Mirny, "Protein Complexes and Functional Modules in Molecular Networks," Proc. Nat'l Academy of Sciences USA, vol. 100, no. 21, pp. 12123-12128, 2003.
[4]
M.E.J. Newman, Networks: An Introduction. Oxford Univ. Press, 2010.
[5]
E. Hartuv and R. Shamir, "A Clustering Algorithm Based on Graph Connectivity," Information Processing Letters, vol. 76, no. 4-6, pp. 175-181, 2000.
[6]
E. Hartuv, A.O. Schmitt, J. Lange, S. Meier-Ewert, H. Lehrach, and R. Shamir, "An Algorithm for Clustering cDNA Fingerprints," Genomics, vol. 66, no. 3, pp. 249-256, 2000.
[7]
N. Pržulj, D.A. Wigle and I. Jurisica, "Functional Topology in a Network of Protein Interactions," Bioinformatics, vol. 20, no. 3, pp. 340-348, 2004.
[8]
W. Hayes, K. Sun, and N. Pržulj, "Graphlet-Based Measures are Suitable for Biological Network Comparison," Bioinformatics, vol. 29, no. 4, pp. 483-491, 2013.
[9]
A. Krause, J. Stoye, and M. Vingron, "Large Scale Hierarchical Clustering of Protein Sequences," BMC Bioinformatics, vol. 6, article 15, 2005.
[10]
B.J. Parker, I. Moltke, A. Roth, S. Washietl, J. Wen, M. Kellis, R. Breaker, and J.S. Pedersen, "New Families of Human Regulatory RNA Structures Identified by Comparative Analysis of Vertebrate Genomes," Genome Research, vol. 21, no. 11, pp. 1929-1943, 2011.
[11]
G. Chartrand, "A Graph-Theoretic Approach to a Communications Problem," SIAM J. Applied Math., vol. 14, no. 4, pp. 778-781, 1966.
[12]
H. Matsuda, T. Ishihara, and A. Hashimoto, "Classifying Molecular Sequences Using a Linkage Graph with Their Pairwise Similarities," Theoretical Computer Science, vol. 210, no. 2, pp. 305-325, 1999.
[13]
D. Jiang and J. Pei, "Mining Frequent Cross-Graph Quasi-Cliques," ACM Trans. Knowledge Discovery from Data, vol. 2, no. 4, pp. 16:1-16:42, 2009.
[14]
I. Gat-Viks, R. Sharan, and R. Shamir, "Scoring Clustering Solutions by Their Biological Relevance," Bioinformatics, vol. 19, no. 18, pp. 2381-2389, 2003.
[15]
M. Koyutürk, W. Szpankowski, and A. Grama, "Assessing Significance of Connectivity and Conservation in Protein Interaction Networks," J. Computational Biology, vol. 14, no. 6, pp. 747-764, 2007.
[16]
R. Shamir, R. Sharan, and D. Tsur, "Cluster Graph Modification Problems," Discrete Applied Math., vol. 144, no. 1-2, pp. 173-182, 2004.
[17]
H. Liu, P. Zhang, and D. Zhu, "On Editing Graphs into 2-Club Clusters," Proc. Sixth Int'l Frontiers in Algorithmics, and Eighth Int'l Conf. Algorithmic Aspects in Information and Management (FAWAAIM'12) , pp. 235-246, 2012.
[18]
G.-R. Cai and Y.-G. Sun, "The Minimum Augmentation of any Graph to a k-Edge-Connected Graph," Networks, vol. 19, no. 1, pp. 151-172, 1989.
[19]
D. Aloise, S. Cafieri, G. Caporossi, P. Hansen, S. Perron, and L. Liberti, "Column Generation Algorithms for Exact Modularity Maximization in Networks," Physical Rev. E, vol. 82, article 046112, 2010.
[20]
R. Impagliazzo, R. Paturi, and F. Zane, "Which Problems Have Strongly Exponential Complexity?" J. Computer and System Sciences, vol. 63, no. 4, pp. 512-530, 2001.
[21]
N. Atias and R. Sharan, "Comparative Analysis of Protein Networks: Hard Problems, Practical Solutions," Comm. ACM, vol. 55, no. 5, pp. 88-97, 2012.
[22]
S. Böcker, S. Briesemeister and G.W. Klau, "Exact Algorithms for Cluster Editing: Evaluation and Experiments," Algorithmica, vol. 60, no. 2, pp. 316-334, 2011.
[23]
R.G. Downey and M.R. Fellows, Parameterized Complexity. Springer, 1999.
[24]
J. Flum and M. Grohe, Parameterized Complexity Theory. Springer, 2006.
[25]
R. Niedermeier, Invitation to Fixed-Parameter Algorithms. Oxford Univ. Press, 2006.
[26]
J. Guo and R. Niedermeier, "Invitation to Data Reduction and Problem Kernelization," ACM SIGACT News, vol. 38, no. 1, pp. 31- 45, 2007.
[27]
D. Lokshtanov, D. Marx, and S. Saurabh, "Lower Bounds Based on the Exponential Time Hypothesis," Bull. EATCS, vol. 105, pp. 41-71, 2011.
[28]
J.M.M. van Rooij, M.E. van Kooten Niekerk, and H.L. Bodlaender, "Partition into Triangles on Bounded Degree Graphs," Theory of Computing Systems, vol. 52, no. 4, pp. 687-718, 2013.
[29]
C. Komusiewicz and J. Uhlmann, "Cluster Editing with Locally Bounded Modifications," Discrete Applied Math., vol. 160, no. 15, pp. 2259-2270, 2012.
[30]
R.E. Gomory and T.C. Hu, "Multi-Terminal Network Flows," J. Soc. for Industrial and Applied Math., vol. 9, no. 4, pp. 551-570, 1961.
[31]
V. King, S. Rao, and R.E. Tarjan, "A Faster Deterministic Maximum Flow Algorithm," J. Algorithms, vol. 17, no. 3, pp. 447-474, 1994.
[32]
S. Böcker and P. Damaschke, "Even Faster Parameterized Cluster Deletion and Cluster Editing," Information Processing Letters, vol. 111, no. 14, pp. 717-721, 2011.
[33]
W.-C. Chang, S. Vakati, R. Krause, and O. Eulenstein, "Exploring Biological Interaction Networks with Tailored Weighted Quasi-Bicliques," BMC Bioinformatics, vol. 13, no. S-10, article 516, 2012.
[34]
M. Grötschel, Y. Wakabayashi, "A Cutting Plane Algorithm for a Clustering Problem," Math. Programming, vol. 45, no. 1-3, pp. 59- 96, 1989.
[35]
A. Mehrotra and M.A. Trick, "Cliques and Clustering: A Combinatorial Approach," Operations Research Letters, vol. 22, no. 1, pp. 1-12, 1998.
[36]
X. Ji and J.E. Mitchell, "Branch-and-Price-and-Cut on the Clique Partitioning Problem with Minimum Clique Size Requirement," Discrete Optimization, vol. 4, no. 1, pp. 87-102, 2007.
[37]
C. Chekuri, A.V. Goldberg, D.R. Karger, M.S. Levine, and C. Stein, "Experimental Study of Minimum Cut Algorithms," Proc. Eighth Symp. Discrete Algorithms, pp. 324-333, 1997.
[38]
C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, "BioGRID: A General Repository for Interaction Datasets," Nucleic Acids Research, vol. 34, suppl. 1, pp. D535-D539, 2006.
[39]
S. Falcon and R. Gentleman, "Using GOstats to Test Gene Lists for GO Term Association," Bioinformatics, vol. 23, no. 2, suppl. 1, pp. 257-258, 2007.
[40]
T.Z. Berardini et al., "Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies," Plant Physiology, vol. 135, no. 2, pp. 1-11, 2004.
[41]
S. Brohée, J. van Helden, "Evaluation of Clustering Algorithms for Protein-Protein Interaction Networks," BMC Bioinformatics, vol. 7, no. 1, article 488, 2006.
[42]
S. van Dongen, "Graph Clustering by Flow Simulation," PhD dissertation, Univ. of Utrecht, 2000.
[43]
J.Z. Wang, Z. Du, R. Payattakool, P.S. Yu, and C.-F. Chen, "A New Method to Measure the Semantic Similarity of GO Terms," Bioinformatics, vol. 23, no. 10, pp. 1274-1281, 2007.
[44]
P. Ronhovde and Z. Nussinov, "Local Resolution-Limit-Free Potts Model for Community Detection," Physical Rev. E, vol. 81, no. 4, article 046114, 2010.

Cited By

View all
  • (2016)On the Computational Complexity of Software (Re)Modularization: Elaborations and OpportunitiesProceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)10.4108/eai.3-12-2015.2262449(418-424)Online publication date: 24-May-2016
  • (2015)Parameterized Algorithmics for Graph Modification ProblemsRevised Papers of the 41st International Workshop on Graph-Theoretic Concepts in Computer Science - Volume 922410.1007/978-3-662-53174-7_1(3-15)Online publication date: 17-Jun-2015

Recommendations

Comments

Information & Contributors

Information

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 11, Issue 3
May/June 2014
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 May 2014
Accepted: 26 November 2013
Revised: 04 October 2013
Received: 15 July 2013
Published in TCBB Volume 11, Issue 3

Author Tags

  1. PPI networks
  2. cluster analysis
  3. data reduction
  4. fixed-parameter tractability
  5. heuristics
  6. integer linear programming

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)1
Reflects downloads up to 11 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2016)On the Computational Complexity of Software (Re)Modularization: Elaborations and OpportunitiesProceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS)10.4108/eai.3-12-2015.2262449(418-424)Online publication date: 24-May-2016
  • (2015)Parameterized Algorithmics for Graph Modification ProblemsRevised Papers of the 41st International Workshop on Graph-Theoretic Concepts in Computer Science - Volume 922410.1007/978-3-662-53174-7_1(3-15)Online publication date: 17-Jun-2015

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media