Self-organising maps and correlation analysis as a tool to explore patterns in excitation-emission matrix data sets and to discriminate dissolved organic matter fluorescence components.

Ejarque-Gonzalez E, Butturini A - PLoS ONE (2014)

Bottom Line: SOM is a pattern recognition method which clusters input EEMs and reduces their dimensionality without relying on any assumption about the data structure. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.

Affiliation: Departament d'Ecologia, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalunya, Spain.

ABSTRACT
Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However, the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) have been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method which clusters input EEMs and reduces their dimensionality without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples and to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under the influence of different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. On the other hand, river samples collected under flash flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A remarkable strength of this methodology was that outlier samples appeared naturally integrated in the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
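
The general idea of training a SOM and then correlating its component planes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (train_som, component_plane_correlation), the map size, the learning schedule and the synthetic data are all hypothetical. Each EEM is assumed to be unfolded into one row vector, so that each column is one excitation-emission wavelength pair and each column of the trained codebook is one component plane.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM (hypothetical settings).

    Returns the codebook, shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Initialise map units from randomly chosen input samples.
    codebook = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    # Grid position of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5     # neighbourhood radius shrinks
        x = data[rng.integers(0, n_samples)]    # one random training sample
        # Best-matching unit: the map unit closest to the sample.
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood around the BMU on the map grid.
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def component_plane_correlation(codebook):
    """Pearson correlation between component planes (codebook columns)."""
    return np.corrcoef(codebook, rowvar=False)

# Synthetic stand-in for unfolded EEMs: two made-up "fluorophores" (a, b),
# each driving three wavelength pairs. Real EEM data would replace this.
rng = np.random.default_rng(1)
a = rng.uniform(0.0, 1.0, 80)
b = rng.uniform(0.0, 1.0, 80)
eems = np.column_stack([a, 2.0 * a, 0.5 * a, b, 3.0 * b, b])

codebook = train_som(eems)
corr = component_plane_correlation(codebook)
```

In this toy setting, wavelength pairs driven by the same synthetic fluorophore have highly correlated component planes, which is the kind of grouping that a correlation analysis of component planes is meant to reveal.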

Figure 1. Experimental setting of the data set. A) Study site within the catchment from which the samples were collected. The river was operationally divided into three reaches: the “headwaters”, the “middle reaches” and the “lowland”. The divisions between reaches correspond to the two big bends of Sant Celoni and Fogars de la Selva. B) Hydrograph contextualising the 15 sampling dates. Discharge data were recorded at the gauging station at Fogars de la Selva. Sampling dates were operationally divided into “flood” (Q>4 m³·s⁻¹), “baseflow” (4>Q>1 m³·s⁻¹) and “drought” (Q<1 m³·s⁻¹) categories. As continuous monitoring was interrupted, the discharge on the last sampling date (2013/06/03) was measured individually on that date. All discharge data were provided by the Catalan Water Authority (Agència Catalana de l'Aigua, [24]).
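
The operational discharge categories in the caption amount to a simple threshold rule, sketched below. The function name classify_discharge is hypothetical, and the treatment of discharges exactly equal to 1 or 4 m³·s⁻¹ is an assumption, since the caption uses strict inequalities and does not say where those boundary values belong.

```python
def classify_discharge(q):
    """Map a discharge q (in cubic metres per second) to a hydrological category.

    Thresholds follow the figure caption; values exactly at 1 or 4 are
    assigned to "baseflow" here, which is an assumption.
    """
    if q > 4.0:
        return "flood"
    if q < 1.0:
        return "drought"
    return "baseflow"

# Hypothetical discharge values, for illustration only.
categories = [classify_discharge(q) for q in (5.0, 2.5, 0.3)]
```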

Our EEM data set included 270 samples from a Mediterranean river catchment called La Tordera (865 km²), situated to the north-west of Barcelona, Catalunya. The sampling strategy was designed to assess the influence of space and hydrology on the EEM spectral shapes. Accordingly, to characterise the longitudinal dimension, water samples were collected at 20 sites along the main stem (60 km long). The sites were operationally categorised into three main reaches, referred to as “headwaters”, “middle reaches” and “lowland”, divided by the bends of Sant Celoni and Fogars de la Selva (Figure 1A). Each of these three river reaches has distinctive properties. The “headwaters” section corresponds to a forested catchment area with accentuated slopes and incipient human pressure; the “middle reaches” are characterised by intensive anthropogenic activity, receiving both diffuse inputs from urban activities and point source effluents from waste water treatment plants (WWTPs) and industries; and finally the “lowland” corresponds to a shallow and meandering geomorphology with a lower density of direct anthropogenic effluents. Eleven influent waters were also sampled upstream from their confluence with the main stem. Some of them correspond to natural tributaries with varying degrees of anthropogenic impact, whereas others correspond to WWTPs or effluents from chemical industries.

