Improving clustering by imposing network information.

Gerber S, Horenko I - Sci Adv (2015)

Bottom Line: Cluster analysis is one of the most popular data analysis tools in a wide range of applied disciplines. The introduced approach is illustrated on the problem of a noninvasive unsupervised brain signal classification. This task is faced with several challenging difficulties such as nonstationary noisy signals and a small sample size, combined with a high-dimensional feature space and huge noise-to-signal ratios.

Affiliation: Università della Svizzera Italiana, Via Giuseppe Buffi 13, 6900 Lugano, Switzerland.

ABSTRACT
Cluster analysis is one of the most popular data analysis tools in a wide range of applied disciplines. We propose and justify a computationally efficient and straightforward-to-implement way of imposing the available information from networks/graphs (a priori available in many application areas) on a broad family of clustering methods. The introduced approach is illustrated on the problem of a noninvasive unsupervised brain signal classification. This task is faced with several challenging difficulties such as nonstationary noisy signals and a small sample size, combined with a high-dimensional feature space and huge noise-to-signal ratios. Applying this approach results in an exact unsupervised classification of very short signals, opening new possibilities for clustering methods in the area of a noninvasive brain-computer interface.
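
As a rough illustration of what imposing graph information on a clustering cost can look like, the sketch below adds a pairwise penalty to a k-means-style objective. The function name `regularized_cost`, the squared Euclidean distance, the chain-shaped weight matrix `W`, and the value of `eps2` are all assumptions chosen for illustration. They are not taken from the paper, whose exact functional (Eq. 4) is not reproduced on this page.

```python
import numpy as np

def regularized_cost(X, centers, gamma, W, eps2):
    """k-means-style fit term plus a graph penalty on the affiliations.

    X: (T, d) data, centers: (K, d) cluster centers,
    gamma: (T, K) cluster affiliations (each row sums to 1),
    W: (T, T) symmetric nonnegative graph weights,
    eps2: strength of the graph penalty (hypothetical parameter name).
    """
    # Squared Euclidean distance of every point to every center, shape (T, K).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    fit = (gamma * d2).sum()
    # Penalize affiliation differences between points connected in the graph.
    diff = gamma[:, None, :] - gamma[None, :, :]  # shape (T, T, K)
    penalty = (W[:, :, None] * diff ** 2).sum()
    return fit + eps2 * penalty

# Toy data: four 1-D points and two centers.
X = np.array([[0.0], [0.0], [1.0], [1.0]])
centers = np.array([[0.0], [1.0]])

# Chain graph: each point is linked to its neighbor in time (assumed structure).
T = len(X)
W = np.zeros((T, T))
idx = np.arange(T - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0

persistent = np.eye(2)[[0, 0, 1, 1]]   # one switch between clusters
alternating = np.eye(2)[[0, 1, 0, 1]]  # a switch at every step

cost_persistent = regularized_cost(X, centers, persistent, W, eps2=0.5)
cost_alternating = regularized_cost(X, centers, alternating, W, eps2=0.5)
print(cost_persistent, cost_alternating)
```

In this toy setting the labeling that respects the graph receives the lower cost, because every pair of linked points placed in different clusters adds to the penalty.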

Figure 2: Visualization of the two identified manifolds.

Mentions: We tested the methodology for this particular choice of distance measure g and weight matrix W on the first 20 subjects of the original data set. We achieved an exact, unsupervised classification of open-eyes and closed-eyes measurements in all cases, with measurement fragments as short as 7 seconds. We show exemplary results for subject no. 1. Figure 1B shows the mAIC curves for models with K = 1, 2, 3 clusters as a function of the regularization constant ϵ2. The overall minimum is attained at K = 2, ϵ2 ≥ 105. As Fig. 1B shows, from the viewpoint of information theory, the optimal solution obtained with standard unregularized clustering methods (such as k-means and hierarchical clustering algorithms) is attained for K = 1. This solution allows no distinction between the two states (that is, between open and closed eyes) and is inferior in terms of information content to the solution of the regularized problem (Eq. 4) for the given data. With regularization, the overall minimum is attained with two clusters (K = 2), where one cluster corresponds only to open eyes and the other only to closed eyes. Both experiments are correctly assigned to their respective manifolds, and the assignment can be visualized by plotting the cluster affiliation function γ for the optimal result (see fig. S1). The two identified attractive manifolds, each of which is characteristic of one experiment, can be visualized by plotting the first three dimensions (out of the 300 most significant dimensions) detected during the data preprocessing via PCA. Figure 2 gives an impression of the dynamics of the two systems in phase space. Both dynamical systems are essentially nonlinear oscillators. Although they behave similarly, the planes in which the oscillations take place are oriented differently.
These principal attractor manifolds are approximated and distinguished via the linear projectors Θi used in our PCA-based regularized clustering procedure. However, because the two dynamics are closely intertwined, standard algorithms are incapable of solving this clustering problem correctly. This result demonstrates that manifold-based clustering, combined with an a priori persistency assumption for the underlying dynamics, allows us to use the structure of each manifold as a classifier to determine whether a short unlabeled measurement belongs to a subject with open or closed eyes. New data can be projected onto the manifolds and assigned to one of them by proximity. Standard PCA clustering (that is, without the graph-induced regularization) was, in contrast, not able to detect the two manifolds and proposed a common basis manifold for the two situations. The graph-induced regularization introduced in this paper therefore appears to be essential for the correct unsupervised classification of these data sequences.
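
The assignment of new data by proximity to a manifold can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: each manifold is approximated by a PCA subspace, proximity is measured by the mean squared residual after projection, and the two subspaces are fitted on separately generated synthetic oscillations rather than found by the regularized clustering. The helper names `fit_subspace`, `residual`, and `assign` are hypothetical, as is the synthetic data (two noisy circular oscillations in differently oriented planes).

```python
import numpy as np

def fit_subspace(X, n_components):
    """Return (mean, basis) of a PCA subspace; basis has shape (d, n_components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def residual(X, mean, basis):
    """Mean squared distance of the rows of X to the affine subspace."""
    Xc = X - mean
    projected = Xc @ basis @ basis.T
    return float(((Xc - projected) ** 2).sum(axis=1).mean())

def assign(X_new, models):
    """Index of the subspace closest to X_new (smallest residual)."""
    return int(np.argmin([residual(X_new, m, B) for m, B in models]))

# Synthetic stand-in data: the same oscillation embedded in two different planes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0 * np.pi, 400)
circle = np.column_stack([np.cos(t), np.sin(t)])

def oscillation(plane, noise=0.05):
    """Noisy circular oscillation in the plane spanned by the rows of `plane`."""
    return circle @ plane + noise * rng.standard_normal((len(t), 3))

plane_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # xy-plane
plane_b = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # xz-plane

models = [fit_subspace(oscillation(plane_a), 2),
          fit_subspace(oscillation(plane_b), 2)]

# Short unlabeled fragments drawn from each system.
label_a = assign(oscillation(plane_a)[:50], models)
label_b = assign(oscillation(plane_b)[:50], models)
print(label_a, label_b)
```

Because the two synthetic oscillations differ only in the orientation of their planes, the residual to the wrong subspace is much larger than the noise level, so even a short fragment is assigned to the subspace it came from.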

