We consider the problem of enumerating all instances of a given pattern graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the number of edges in the data graph, k=O(1) be the number of vertices in the pattern graph, B be the block length, and M be the main memory size. The main results of the paper are two algorithms that enumerate all instances of the pattern graph. The first one is a deterministic algorithm that exploits a suitable independent set of the pattern graph of size 1≤s≤k/2 and requires O(Ek−s/(BMk−s−1)) I/Os. The second algorithm is a randomized algorithm that enumerates all instances in O(Ek/2/(BMk/2−1)) expected I/Os; the same bound also applies with high probability under some assumptions. A lower bound shows that the deterministic algorithm is optimal for some pattern graphs with s=k/2 (e.g., paths and cycles of even length, meshes of even side), while the randomized algorithm is optimal for a wide class of pattern graphs, called Alon class (e.g., cliques, cycles and every graph with a perfect matching).

Subgraph Enumeration in Massive Graphs

SILVESTRI, FRANCESCO
2014

Abstract

We consider the problem of enumerating all instances of a given pattern graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the number of edges in the data graph, k=O(1) be the number of vertices in the pattern graph, B be the block length, and M be the main memory size. The main results of the paper are two algorithms that enumerate all instances of the pattern graph. The first one is a deterministic algorithm that exploits a suitable independent set of the pattern graph of size 1≤s≤k/2 and requires O(Ek−s/(BMk−s−1)) I/Os. The second algorithm is a randomized algorithm that enumerates all instances in O(Ek/2/(BMk/2−1)) expected I/Os; the same bound also applies with high probability under some assumptions. A lower bound shows that the deterministic algorithm is optimal for some pattern graphs with s=k/2 (e.g., paths and cycles of even length, meshes of even side), while the randomized algorithm is optimal for a wide class of pattern graphs, called Alon class (e.g., cliques, cycles and every graph with a perfect matching).
2014
Proc. of the 6th Workshop on Massive Data Algorithmics
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3233836
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact