Limits...
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

Goldstein M, Uchida S - PLoS ONE (2016)

Bottom Line: Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets.Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time.As a conclusion, we give an advise on algorithm selection for typical real-world tasks.

View Article: PubMed Central - PubMed

Affiliation: Center for Co-Evolutional Social System Innovation, Kyushu University, Fukuoka, Japan.

ABSTRACT
Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-funded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior is outlined. As a conclusion, we give an advise on algorithm selection for typical real-world tasks.

Show MeSH

Related in: MedlinePlus

A simple two-dimensional example.It illustrates global anomalies (x1, x2), a local anomaly x3 and a micro-cluster c3.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4836738&req=5

pone.0152173.g002: A simple two-dimensional example.It illustrates global anomalies (x1, x2), a local anomaly x3 and a micro-cluster c3.

Mentions: The main idea of unsupervised anomaly detection algorithms is to detect data instances in a dataset, which deviate from the norm. However, there are a variety of cases in practice where this basic assumption is ambiguous. Fig 2 illustrates some of these cases using a simple two-dimensional dataset. Two anomalies can be easily identified by eye: x1 and x2 are very different from the dense areas with respect to their attributes and are therefore called global anomalies. When looking at the dataset globally, x3 can be seen as a normal record since it is not too far away from the cluster c2. However, when we focus only on the cluster c2 and compare it with x3 while neglecting all the other instances, it can be seen as an anomaly. Therefore, x3 is called a local anomaly, since it is only anomalous when compared with its close-by neighborhood. It depends on the application, whether local anomalies are of interest or not. Another interesting question is whether the instances of the cluster c3 should be seen as three anomalies or as a (small) regular cluster. These phenomena is called micro cluster and anomaly detection algorithms should assign scores to its members larger than the normal instances, but smaller values than the obvious anomalies. This simple example already illustrates that anomalies are not always obvious and a score is much more useful than a binary label assignment.


A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

Goldstein M, Uchida S - PLoS ONE (2016)

A simple two-dimensional example.It illustrates global anomalies (x1, x2), a local anomaly x3 and a micro-cluster c3.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4836738&req=5

pone.0152173.g002: A simple two-dimensional example.It illustrates global anomalies (x1, x2), a local anomaly x3 and a micro-cluster c3.
Mentions: The main idea of unsupervised anomaly detection algorithms is to detect data instances in a dataset, which deviate from the norm. However, there are a variety of cases in practice where this basic assumption is ambiguous. Fig 2 illustrates some of these cases using a simple two-dimensional dataset. Two anomalies can be easily identified by eye: x1 and x2 are very different from the dense areas with respect to their attributes and are therefore called global anomalies. When looking at the dataset globally, x3 can be seen as a normal record since it is not too far away from the cluster c2. However, when we focus only on the cluster c2 and compare it with x3 while neglecting all the other instances, it can be seen as an anomaly. Therefore, x3 is called a local anomaly, since it is only anomalous when compared with its close-by neighborhood. It depends on the application, whether local anomalies are of interest or not. Another interesting question is whether the instances of the cluster c3 should be seen as three anomalies or as a (small) regular cluster. These phenomena is called micro cluster and anomaly detection algorithms should assign scores to its members larger than the normal instances, but smaller values than the obvious anomalies. This simple example already illustrates that anomalies are not always obvious and a score is much more useful than a binary label assignment.

Bottom Line: Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets.Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time.As a conclusion, we give an advise on algorithm selection for typical real-world tasks.

View Article: PubMed Central - PubMed

Affiliation: Center for Co-Evolutional Social System Innovation, Kyushu University, Fukuoka, Japan.

ABSTRACT
Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-funded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior is outlined. As a conclusion, we give an advise on algorithm selection for typical real-world tasks.

Show MeSH
Related in: MedlinePlus