Bad communities with high modularity
A Kehagias, L Pitsoulis - The European Physical Journal B, 2013 - Springer
The European Physical Journal B, 2013•Springer
In this paper we discuss some problematic aspects of Newman and Girvan's modularity
function QN. Given a graph G, the modularity of G can be written as QN= Q f− Q 0, where Q f
is the intracluster edge fraction of G and Q 0 is the expected intracluster edge fraction of the
null model, ie, a randomly connected graph with same expected degree distribution as G. It
follows that the maximization of QN must accomodate two factors pulling in opposite
directions: Q f favors a small number of clusters and Q 0 favors many balanced (ie, with …
function QN. Given a graph G, the modularity of G can be written as QN= Q f− Q 0, where Q f
is the intracluster edge fraction of G and Q 0 is the expected intracluster edge fraction of the
null model, ie, a randomly connected graph with same expected degree distribution as G. It
follows that the maximization of QN must accomodate two factors pulling in opposite
directions: Q f favors a small number of clusters and Q 0 favors many balanced (ie, with …
Abstract
In this paper we discuss some problematic aspects of Newman and Girvan’s modularity function Q N . Given a graph G, the modularity of G can be written as Q N = Q f − Q 0, where Q f is the intracluster edge fraction of G and Q 0 is the expected intracluster edge fraction of the null model, i.e., a randomly connected graph with same expected degree distribution as G. It follows that the maximization of Q N must accomodate two factors pulling in opposite directions:Q f favors a small number of clusters and Q 0 favors many balanced (i.e., with approximately equal degrees) clusters. In certain cases the Q 0 term can cause overestimation of the true cluster number; this is the opposite of the well-known underestimation effect caused by the “resolution limit” of modularity. We illustrate the overestimation effect by constructing families of graphs with a “natural” community structure which, however, does not maximize modularity. In fact, we show there exist graphs G with a “natural clustering” V of G and another, balanced clustering U of G such that (i) the pair (G, U) has higher modularity than (G, V) and (ii) V and U are arbitrarily different.
Springer