Self-organising maps and correlation analysis as a tool to explore patterns in excitation-emission matrix data sets and to discriminate dissolved organic matter fluorescence components.

Ejarque-Gonzalez E, Butturini A - PLoS ONE (2014)

Bottom Line: SOM is a pattern recognition method that clusters input EEMs and reduces their dimensionality without relying on any assumption about the data structure. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.

Available from: PubMed Central (PMC4048288) and PubMed.

Affiliation: Departament d'Ecologia, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalunya, Spain.

ABSTRACT
Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However, the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) have been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method that clusters input EEMs and reduces their dimensionality without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples and to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. In contrast, river samples collected under flash-flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A notable strength of this methodology was that outlier samples were naturally integrated into the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
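The abstract describes a two-step pipeline: a SOM trained on the EEMs, followed by a correlation analysis of its component planes. A minimal NumPy sketch of that pipeline follows. It assumes that each EEM is unfolded into one vector and normalised by its total fluorescence, a 6 x 6 rectangular map, and a Gaussian neighbourhood with a linearly decaying learning rate and width. These settings, the random toy data, and the helper names `train_som`, `best_matching_units` and `component_plane_correlation` are illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with sequential (online) updates.

    data: array of shape (n_samples, n_vars), one unfolded EEM per row.
    Returns the codebook (rows*cols, n_vars) and the neuron grid coordinates.
    """
    rng = np.random.default_rng(seed)
    n_samples, _ = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # (row, col) position of every neuron on the map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors with randomly drawn samples (fancy indexing copies)
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood width
        x = data[rng.integers(0, n_samples)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # squared map distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    return weights, grid


def best_matching_units(data, weights):
    """Index of the closest neuron for every sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def component_plane_correlation(weights):
    """Pearson correlation between component planes.

    Column k of `weights` is component plane k: the value of one
    excitation/emission pair across all neurons of the map.
    """
    return np.corrcoef(weights, rowvar=False)


# Hypothetical usage: 50 random "EEMs" with 5 excitation x 8 emission wavelengths
rng = np.random.default_rng(1)
eems = rng.random((50, 5, 8))
X = eems.reshape(len(eems), -1)       # unfold each EEM into a vector
X = X / X.sum(axis=1, keepdims=True)  # assumed normalisation to total fluorescence
W, grid = train_som(X)
bmus = best_matching_units(X, W)
corr = component_plane_correlation(W)  # shape (40, 40)
```

Groups of strongly correlated component planes point to excitation/emission regions that vary together across the map, which is the basis for proposing candidate fluorescence components. The exact grouping procedure the authors used is not described in this excerpt.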

Figure 4 (pone-0099618-g004). Outlier sensitivity test. A) Quantization stability: variation of the average SSIntra among the 270 leave-one-out (LOO) subsets; the black dot indicates the mean. Note the absence of outlier values of CV(SSIntra) and the similarity between the mean and the median. B) Stability of neighbourhood relations: histograms of the stabilities over all pairs of observations. Red: LOO subsets in which the left-out sample had been assigned to a single-neuron cluster. Green: the remaining LOO subsets. Black: the whole data set. There is hardly any difference among them. Grey: theoretical histogram for a randomly organised map, following a binomial distribution as defined by de Bodt et al. [44]. The contrast with this reference shows that the SOM results are far from randomly organised.
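Panel B compares pairwise neighbourhood stabilities with a binomial reference for a random map. The sketch below assumes the simplest neighbourhood, radius 0, so two observations count as neighbours only when they are mapped to the same neuron. Under uniform random assignment to U neurons, this happens with probability 1/U in each run, so the number of co-assignments over B runs follows Binomial(B, 1/U). The radius, the input layout (one row of neuron indices per LOO run, with -1 for the left-out sample) and the function names are assumptions; de Bodt et al. [44] describe the general procedure.

```python
import numpy as np
from math import comb


def pair_stability(bmus, grid, radius=0.0):
    """Fraction of runs in which each pair of observations are map neighbours.

    bmus: (n_runs, n_obs) neuron index of every observation in every run,
          -1 if the observation was left out of that run (hypothetical layout).
    grid: (n_units, 2) map coordinates of the neurons.
    Only runs in which both observations are present are counted.
    """
    n_runs, n_obs = bmus.shape
    hits = np.zeros((n_obs, n_obs))
    counts = np.zeros((n_obs, n_obs))
    for b in range(n_runs):
        present = bmus[b] >= 0
        pos = grid[np.where(present, bmus[b], 0)]  # index 0 is a placeholder for absent obs
        dist = np.sqrt(((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2))
        both = present[:, None] & present[None, :]
        hits += both & (dist <= radius)
        counts += both
    iu = np.triu_indices(n_obs, k=1)  # each unordered pair once
    return hits[iu] / counts[iu]


def random_map_reference(n_runs, n_units):
    """Binomial pmf of co-assignment counts for radius 0 on a random map."""
    p = 1.0 / n_units
    return np.array([comb(n_runs, k) * p**k * (1 - p) ** (n_runs - k)
                     for k in range(n_runs + 1)])


# Toy usage: a 2 x 2 map, 3 observations, 3 runs (observation 0 left out of run 3)
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
bmus = np.array([[0, 0, 3],
                 [1, 1, 2],
                 [-1, 2, 0]])
stab = pair_stability(bmus, grid)      # pairs ordered (0,1), (0,2), (1,2)
ref = random_map_reference(3, 4)
```

Dividing k by the number of runs turns the binomial reference into the expected histogram of stabilities for a map with no structure, which is what the grey curve in panel B represents.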

Mentions: The outlier sensitivity test showed that the few samples with very distinctive and infrequent spectral shapes (especially those assigned to single-neuron clusters) did not meaningfully affect the SOM outcome. The SSIntra values computed for the 270 leave-one-out (LOO) subsets followed a Gaussian distribution with no outliers (Figure 4A). Moreover, the mean was almost identical to the median (92.27 and 92.17, respectively), further indicating that none of the LOO subsets had a statistically distinct quantization structure.
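The quantization check above summarises one SSIntra value per LOO subset by its mean, median and spread. The following is a minimal sketch of that summary. It assumes that SSIntra is a within-cluster sum of squared distances to cluster centroids, and it uses Tukey's 1.5 x IQR fences as the outlier rule, since the excerpt does not say which rule the authors applied. Retraining the SOM and reclustering for each subset are omitted. `loo_subsets`, `ss_intra` and `summarise_loo` are hypothetical helpers.

```python
import numpy as np


def loo_subsets(X):
    """Yield (left_out_index, subset) for every leave-one-out subset of X."""
    for i in range(len(X)):
        yield i, np.delete(X, i, axis=0)


def ss_intra(X, labels):
    """Assumed SSIntra: squared distances of each row of X to the centroid
    of its cluster, summed over all clusters."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def summarise_loo(values):
    """Mean, median, coefficient of variation and Tukey-fence outliers."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    outliers = v[(v < q1 - 1.5 * iqr) | (v > q3 + 1.5 * iqr)]
    return {
        "mean": v.mean(),
        "median": np.median(v),
        "cv": v.std(ddof=1) / v.mean(),
        "outliers": outliers,
    }
```

Given one SSIntra value per retrained subset, a mean close to the median and an empty `outliers` array would correspond to the pattern reported for Figure 4A.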