Mining Frequent Patterns in Evolving Graphs

Friday, October 26, 2018

11.00 a.m.

ISI seminar room 1st floor

Dr. Anis Nasir King

Given a labeled graph, the frequent-subgraph mining (FSM) problem asks to find all the k-vertex subgraphs that appear with frequency greater than a given threshold. FSM has numerous applications ranging from biology to network science, as it provides a compact summary of the characteristics of the graph. However, the task is challenging, even more so for evolving graphs due to the streaming nature of the input and the exponential time complexity of the problem. We initiate the study of the approximate FSM problem in both incremental and fully-dynamic streaming settings, where arbitrary edges can be added or removed from the graph. For each streaming setting, we propose algorithms that can extract a high-quality approximation of the frequent k-vertex subgraphs for a given threshold, at any given time instance, with high probability. In contrast to the existing state-of-the-art solutions that require iterating over the entire set of subgraphs for any update, our algorithms operate by maintaining a uniform sample of k-vertex subgraphs with optimized neighborhood-exploration procedures local to the updates. We provide theoretical analysis of the proposed algorithms and empirically demonstrate that the proposed algorithms generate high-quality results compared to baselines.

Anis is a research engineer, working at streaming platform team at King. He finished his Phd in may 2018 from School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology. During the PhD, he completed three internships at: Yahoo Labs Barcelona, Aalto University and IBM Research Tokyo. Prior to the PhD studies, he finished European Master in Distributed Computing from KTH Royal Institute of Technology and UPC, Polytechnic University of Catalunya. His research interests revolve around designing simple and efficient algorithms (i.e., deterministic, approximation or randomized) for computationally challenging problems related to: graph processing, data streams and sliding windows, large scale distributed systems.