HOPS: Probabilistic Subtree Mining for Small and Large Graphs

Pascal Welke, Florian Seiffarth, Michael Kamp, Stefan Wrobel: HOPS: Probabilistic Subtree Mining for Small and Large Graphs. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1275–1284, Association for Computing Machinery, Virtual Event, CA, USA, 2020, ISBN: 9781450379984.

Abstract

Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since besides summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-sized graphs or a single large graph. By restricting to tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches, both in the task of approximately counting embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium-sized graphs.
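
The abstract only sketches the approach at a high level. As a rough illustration of the kind of sampling idea it refers to, the Python sketch below estimates the number of rooted, label-preserving embeddings of a tree pattern by matching pattern vertices greedily at random and multiplying the number of candidates at each step (a Knuth-style product-of-choices estimator from combinatorial search). This is not the authors' HOPS implementation: the graph representation, the function names sample_embedding_estimate and estimate_embeddings, and all parameters are assumptions made purely for this example.

import random

def sample_embedding_estimate(pattern_parent, pattern_labels, graph_adj, graph_labels, root=0):
    # One random rooted matching attempt. pattern_parent maps each non-root
    # pattern vertex to its parent and must list children so that every parent
    # is matched before its children (e.g., BFS order of the pattern tree).
    order = [root] + list(pattern_parent)
    image, used, estimate = {}, set(), 1
    for v in order:
        if v == root:
            candidates = [u for u in graph_adj if graph_labels[u] == pattern_labels[v]]
        else:
            parent_img = image[pattern_parent[v]]
            candidates = [u for u in graph_adj[parent_img]
                          if graph_labels[u] == pattern_labels[v] and u not in used]
        if not candidates:
            return 0                      # dead end: this trial contributes 0
        estimate *= len(candidates)       # product of branching factors (Knuth-style weight)
        choice = random.choice(candidates)
        image[v] = choice
        used.add(choice)
    return estimate

def estimate_embeddings(pattern_parent, pattern_labels, graph_adj, graph_labels, trials=1000):
    # The average of independent trials is an unbiased estimate of the number of
    # rooted, label-preserving embeddings of the pattern tree. Deciding presence
    # this way has one-sided error: a nonzero estimate proves an embedding exists,
    # while an all-zero run may simply have missed one.
    samples = [sample_embedding_estimate(pattern_parent, pattern_labels,
                                         graph_adj, graph_labels)
               for _ in range(trials)]
    return sum(samples) / trials

# Toy usage: a two-vertex pattern a-b matched into a small labelled triangle graph.
pattern_parent = {1: 0}
pattern_labels = {0: 'a', 1: 'b'}
graph_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
graph_labels = {0: 'a', 1: 'b', 2: 'b'}
print(estimate_embeddings(pattern_parent, pattern_labels, graph_adj, graph_labels))  # about 2.0

Each failed trial costs at most one pass over the pattern, which is what makes this style of estimator cheap compared to exact subtree isomorphism checks; the paper's contribution lies in making such estimates accurate and efficient for arbitrary transaction graphs, which this toy sketch does not attempt.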

BibTeX (Download)

@inproceedings{10.1145/3394486.3403180,
title = {HOPS: Probabilistic Subtree Mining for Small and Large Graphs},
author = {Pascal Welke and Florian Seiffarth and Michael Kamp and Stefan Wrobel},
url = {https://doi.org/10.1145/3394486.3403180},
doi = {10.1145/3394486.3403180},
isbn = {9781450379984},
year  = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages = {1275--1284},
publisher = {Association for Computing Machinery},
address = {Virtual Event, CA, USA},
series = {KDD '20},
abstract = {Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since besides summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-sized graphs or a single large graph. By restricting to tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches, both in the task of approximately counting embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium-sized graphs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
