A Review of Tree-Based Approaches for Anomaly Detection
Barbariol T.; Marcato D.; Susto G. A.
2021
Abstract
Data-driven Anomaly Detection approaches have received increasing attention in many application areas in the past few years as a tool to monitor complex systems, complementing classical univariate control charts. Tree-based approaches have proven particularly effective for high-dimensional Anomaly Detection problems and for data whose underlying distribution is non-Gaussian. The best-known approach in this family is the Isolation Forest, currently one of the most popular choices among scientists and practitioners for Anomaly Detection tasks. The Isolation Forest is a seminal algorithm on which many extensions have been built in recent years, aiming either to improve the original method or to handle specific application scenarios. In this work, we review some of the most popular and powerful Tree-based approaches to Anomaly Detection (extensions of the Isolation Forest and other approaches), considering both batch and streaming data scenarios. We also review several relevant aspects of the methods, such as computational cost and interpretability. To help practitioners, we report the relevant available libraries and open implementations, together with a review of real-world industrial applications of the considered approaches.