Four-set hypergraphlets for characterization of directed hypergraphs

H Moon, H Kim, S Kim, K Shin - arXiv preprint arXiv:2311.14289, 2023 - arxiv.org
A directed hypergraph, which consists of nodes and hyperarcs, is a higher-order data structure that naturally models directional group interactions (e.g., chemical reactions of molecules). Although local structures of (directed) graphs in the real world have been studied extensively, those of directed hypergraphs remain unexplored. In this work, we focus on measurements, findings, and applications related to local structures of directed hypergraphs, which together contribute to a systematic understanding of various real-world systems interconnected by directed group interactions. Our first contribution is to define 91 directed hypergraphlets (DHGs), which disjointly categorize directed connections and overlaps among the four node sets that compose two incident hyperarcs. Our second contribution is to develop exact and approximate algorithms for counting the occurrences of each DHG. Our last contribution is to characterize 11 real-world directed hypergraphs, and individual hyperarcs in them, using the occurrences of DHGs, which reveals clear domain-based local structural patterns. Our experiments demonstrate that our DHG-based characterization performs up to 12% better on hypergraph clustering and up to 33% better on hyperarc prediction than baseline characterization methods. Moreover, we show that CODA-A, our proposed approximate algorithm, is up to 32× faster than its competitors with similar characterization quality.
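To make the "four node sets that compose two incident hyperarcs" concrete, the following is a minimal Python sketch. It is not the paper's definition of the 91 DHGs and not its counting algorithm. It assumes a hyperarc is encoded as a (tail, head) pair of node sets, and it records only whether each cross-arc pair of sets overlaps, which is a much coarser signature than the categorization described above. The function names `overlap_signature` and `incident_pairs` are hypothetical.

```python
from itertools import combinations

# Assumed encoding: a hyperarc is a (tail, head) pair of frozensets of nodes.
def overlap_signature(arc_a, arc_b):
    """Report which cross-arc pairs of node sets share at least one node.

    This is a coarse illustration only, not the paper's 91-way categorization.
    """
    (tail_a, head_a), (tail_b, head_b) = arc_a, arc_b
    return {
        "tail-tail": bool(tail_a & tail_b),
        "tail-head": bool(tail_a & head_b),
        "head-tail": bool(head_a & tail_b),
        "head-head": bool(head_a & head_b),
    }

def incident_pairs(hyperarcs):
    """Yield index pairs (i, j), i < j, of hyperarcs sharing at least one node."""
    for (i, a), (j, b) in combinations(enumerate(hyperarcs), 2):
        if (a[0] | a[1]) & (b[0] | b[1]):
            yield i, j

# Toy example in the spirit of chemical reactions (reactants -> products).
arcs = [
    (frozenset({"H2", "O2"}), frozenset({"H2O"})),
    (frozenset({"H2O", "CO2"}), frozenset({"H2CO3"})),
    (frozenset({"N2"}), frozenset({"NH3"})),
]

pairs = list(incident_pairs(arcs))
signature = overlap_signature(arcs[0], arcs[1])
```

In this toy input only the first two hyperarcs are incident, and their single overlap is between the head of the first and the tail of the second (the product of one reaction is a reactant of the other). Counting how often each such pattern occurs across all incident pairs is the kind of statistic a DHG-based characterization builds on, under the finer categorization defined in the paper.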