Some considerations for analyzing biodiversity using integrative metagenomics and gene networks.

Bittner L, Halary S, Payri C, Cruaud C, de Reviers B, Lopez P, Bapteste E - Biol. Direct (2010)

Affiliation: UMR CNRS 7138 Systématique, Adaptation, Evolution, Université Pierre et Marie Curie, Paris, France.

ABSTRACT

Background: Improving knowledge of biodiversity will benefit conservation biology, enhance bioremediation studies, and could lead to new medical treatments. However, there is no standard approach for estimating and comparing the diversity of different environments, or for studying its past and, possibly, future evolution.

Presentation of the hypothesis: We argue that there are two conditions for significant progress in the identification and quantification of biodiversity. First, integrative metagenomic studies, which aim at the simultaneous examination (or, better still, the integration) of observations about the elements, functions and evolutionary processes captured by the massive sequencing of multiple markers, should be preferred over DNA barcoding projects and over metagenomic projects based on a single marker. Second, such metagenomic data should be studied with novel, inclusive network-based approaches designed to draw inferences about both the many units and the many processes present in the environments.

Testing the hypothesis: We reached these conclusions through a comparison of the theoretical foundations of two molecular approaches seeking to assess biodiversity: metagenomics (mostly used on prokaryotes and protists) and DNA barcoding (mostly used on multicellular eukaryotes), and by pragmatic considerations of the issues caused by the 'species problem' in biodiversity studies.

Implications of the hypothesis: Evolutionary gene networks reduce the risk of producing biodiversity estimates that have limited explanatory power, either because they are biased by unequal rates of lateral gene transfer (LGT) or because they are difficult to interpret owing to (practical) problems caused by type I and type II grey zones. Moreover, these networks would easily accommodate additional (meta)transcriptomic and (meta)proteomic data.
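To make the idea of a gene network concrete, here is a minimal sketch, not the authors' pipeline. It assumes that pairwise identities between genes have already been computed (for example from similarity searches); the gene identifiers, identity values and the 0.3 cut-off are hypothetical. Genes become nodes, pairs at or above the cut-off become edges, and connected components are reported as candidate gene families.

```python
def build_gene_network(identities, min_identity):
    """Return an undirected adjacency map: gene -> set of neighbouring genes.

    identities: dict mapping (gene_a, gene_b) -> fractional identity in [0, 1].
    Only pairs with identity >= min_identity become edges; every gene is kept
    as a node even if it ends up with no edge.
    """
    adjacency = {}
    for (a, b), identity in identities.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if identity >= min_identity:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group genes into connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical identities between five environmental genes.
identities = {
    ("g1", "g2"): 0.85,
    ("g2", "g3"): 0.40,
    ("g3", "g4"): 0.10,
    ("g4", "g5"): 0.92,
}
network = build_gene_network(identities, min_identity=0.3)
families = connected_components(network)
print(families)  # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Connected components are only the simplest reading of such a network; because nodes are just sequences, transcripts or proteins could be added as further nodes without changing the data structure.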

Figure 2: Histograms of the frequency of p-distances for CO1 and psbA in a Corallinales dataset. A. Results for the CO1 dataset: the horizontal axis shows pairwise sequence divergence (p-distance), grouped into distance classes; the vertical axis shows the number of specimen pairs in each class. 'n' indicates the number of specimens sampled at a given locality. Barcode gaps are marked with a star. Inferred interspecific distances are shown in green and inferred intraspecific distances in red. B. Results for the psbA dataset (same legend). For the global sample, no barcode gap can be defined; several discontinuities exist in the distribution, shown by the grey area. When more data are included (data not shown), the barcode gap disappears.
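The histograms in Figure 2 rest on two simple computations: pairwise p-distances and a search for empty distance classes (a barcode gap). The sketch below illustrates both under stated assumptions. It is a simplified stand-in, not the BCG method cited in the text. Sites with gaps or ambiguous bases are skipped (one common convention), the 0.1 class width is arbitrary, and the four-specimen alignment is invented.

```python
from fractions import Fraction
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    A Fraction is returned so that distance classes are computed exactly.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return Fraction(differences, compared)


def pairwise_distances(alignment):
    """All pairwise p-distances for a dict of specimen -> aligned sequence."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


def histogram(distances, bin_width):
    """Count pairs per distance class; class k covers [k * w, (k + 1) * w)."""
    counts = {}
    for d in distances:
        k = int(d // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def widest_gap(counts):
    """Widest run of empty classes between occupied ones, as (first, last)
    class indices, or None if the occupied classes are contiguous."""
    occupied = sorted(counts)
    best = None
    for lo, hi in zip(occupied, occupied[1:]):
        width = hi - lo - 1
        if width > 0 and (best is None or width > best[1] - best[0] + 1):
            best = (lo + 1, hi - 1)
    return best


# Hypothetical 10-site alignment: two specimens of each of two putative species.
alignment = {
    "A1": "AAAAAAAAAA",
    "A2": "AAAAAAAAAC",
    "B1": "CCCCCCAAAA",
    "B2": "CCCCCCAAAC",
}
distances = pairwise_distances(alignment)
counts = histogram(distances.values(), Fraction(1, 10))
print(sorted(counts.items()))  # [(1, 2), (6, 2), (7, 2)]
print(widest_gap(counts))      # (2, 5): classes from 0.2 up to 0.6 are empty
```

Running `widest_gap` on each locality's sub-sample and then on the pooled data reproduces, in miniature, the comparison made in Figure 2: a gap can exist in every subset yet vanish once the subsets are combined.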

Mentions: For the 206 specimens sharing these two markers, the BCG[78] and MYC[53] methods produced inconsistent, method-, locality- and gene-dependent estimates of the number of Corallinales species present in the dataset. Methodological biases and artefacts (e.g. the use of an incorrect ultrametric tree in the MYC approach, or of a wrong model of evolution) can certainly explain some of the disagreement between methods (inter-approach pluralism). Yet even within a given method, the two markers generally returned incompatible estimates (Table 1). Even the closest CO1 and psbA assessments had, on average, 45% of groups with different specimen contents. This intra-approach pluralism is problematic because it was impossible to determine whether, and which, of these incompatible groups corresponded to a unified 'species'. Each group had lower genetic diversity than the values reported as bona fide intraspecific distances in previous studies[87-89]. All groups showed comparable coherence in terms of monophyly and morphology, and a similar lack of geographical coherence (data not shown). Partitioning the dataset by sampling site also had a dramatic effect on the biodiversity analyses (Table 2). For both markers, histograms of p-distances for the entire dataset showed no clear gap, whereas every site-specific subsample presented a gap, seemingly defining an unambiguous limit between intra- and interspecific genetic diversity (Figure 2). However, the genetic distances inferred at each site to delimit a species were highly variable. Problematically, across localities, some interspecific distances overlapped with intraspecific distances (type I grey zone), and sometimes conflicted (type II grey zone) (Table 2). No standard threshold for delimiting Corallinales species with CO1 or psbA could be proposed.
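The two comparisons in this paragraph can be sketched in a few lines. This is a hypothetical illustration, not the analysis behind Tables 1 and 2. The specimen IDs, groupings and per-site distance ranges are invented. "Different specimen contents" is read here as "no group with exactly the same specimens in the other marker's partition", averaged over both directions, which is an assumption. Only type I grey zones are checked, because the passage does not define type II conflicts precisely.

```python
from itertools import permutations


def unmatched_fraction(partition, other):
    """Fraction of groups in `partition` with no identical group in `other`.

    A partition is a list of groups; each group is a set of specimen IDs.
    """
    other_groups = {frozenset(g) for g in other}
    unmatched = sum(1 for g in partition if frozenset(g) not in other_groups)
    return unmatched / len(partition)


def mean_disagreement(partition_a, partition_b):
    """Average of the unmatched fractions in both directions (assumed summary)."""
    return (unmatched_fraction(partition_a, partition_b)
            + unmatched_fraction(partition_b, partition_a)) / 2


def type_one_overlaps(site_ranges):
    """Ordered site pairs where the first site's interspecific distance range
    overlaps the second site's intraspecific range (a type I grey zone).

    site_ranges: site -> {"intra": (min, max), "inter": (min, max)}.
    """
    overlaps = []
    for site_a, site_b in permutations(sorted(site_ranges), 2):
        inter_lo, inter_hi = site_ranges[site_a]["inter"]
        intra_lo, intra_hi = site_ranges[site_b]["intra"]
        # Closed intervals overlap unless one lies entirely beyond the other.
        if inter_lo <= intra_hi and intra_lo <= inter_hi:
            overlaps.append((site_a, site_b))
    return overlaps


# Hypothetical species groupings from the two markers.
co1_groups = [{"s1", "s2"}, {"s3"}, {"s4", "s5"}, {"s6"}]
psba_groups = [{"s1", "s2"}, {"s3"}, {"s4"}, {"s5", "s6"}]
print(mean_disagreement(co1_groups, psba_groups))  # 0.5

# Hypothetical p-distance ranges; each site on its own shows a barcode gap.
site_ranges = {
    "site1": {"intra": (0.0, 0.02), "inter": (0.05, 0.12)},
    "site2": {"intra": (0.0, 0.06), "inter": (0.10, 0.15)},
}
print(type_one_overlaps(site_ranges))  # [('site1', 'site2')]
```

In this toy case each site has a clean gap (0.02 below 0.05, and 0.06 below 0.10), yet site1's interspecific distances fall inside site2's intraspecific range, so no single threshold separates the two across localities.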

