Subgraph Enumeration in Massive Graphs

Silvestri, Francesco

We consider the problem of enumerating all instances of a given pattern graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the number of edges in the data graph, k=O(1) be the number of vertices in the pattern graph, B be the block length, and M be the main memory size. The main results of the paper are two algorithms that enumerate all instances of the pattern graph. The first one is a deterministic algorithm that exploits a suitable independent set of the pattern graph of size 1≤s≤k/2 and requires O(Ek−s/(BMk−s−1)) I/Os. The second algorithm is a randomized algorithm that enumerates all instances in O(Ek/2/(BMk/2−1)) expected I/Os; the same bound also applies with high probability under some assumptions. A lower bound shows that the deterministic algorithm is optimal for some pattern graphs with s=k/2 (e.g., paths and cycles of even length, meshes of even side), while the randomized algorithm is optimal for a wide class of pattern graphs, called Alon class (e.g., cliques, cycles and every graph with a perfect matching).