Are 100 enough? Inferring acanthomorph teleost phylogeny using Anchored Hybrid Enrichment.

Eytan RI, Evans BR, Dornburg A, Lemmon AR, Lemmon EM, Wainwright PC, Near TJ - BMC Evol. Biol. (2015)

Bottom Line: However, many nodes in the phylogeny associated with the early diversification of Ovalentaria are poorly resolved in several analyses. Through the use of rarefaction curves we show that limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny does not appear to be due to a deficiency of data, as average global node support ceases to increase when only 1/3rd of the sampled loci are used in analyses. Although it does not appear that the limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny is due to a deficiency of data, it may be that both stochastic and systematic error resulting from model misspecification play a role in the poor resolution at the base of the Ovalentaria tree as a Bayesian approach was able to resolve some of the deeper nodes, where the other methods failed.

Affiliation: Department of Ecology & Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, 06520, CT, USA. eytanr@tamug.edu.

ABSTRACT

Background: The past decade has witnessed remarkable progress towards resolution of the Tree of Life. However, despite the increased use of genomic scale datasets, some phylogenetic relationships remain difficult to resolve. Here we employ anchored phylogenomics to capture 107 nuclear loci in 29 species of acanthomorph teleost fishes, with 25 of these species sampled from the recently delimited clade Ovalentaria. Previous studies employing multilocus nuclear exon datasets have not been able to resolve the nodes at the base of the Ovalentaria tree with confidence. Here we test whether a phylogenomic approach will provide better support for these nodes, and if not, why this may be.

Results: After using a novel method to account for paralogous loci, we estimated phylogenies with maximum likelihood and species tree methods using DNA sequence alignments of over 80,000 base pairs. Several key relationships within Ovalentaria are well resolved, including 1) the sister taxon relationship between Cichlidae and Pholidichthys, 2) a clade containing blennies, grammas, clingfishes, and jawfishes, and 3) monophyly of Atherinomorpha (topminnows, flyingfishes, and silversides). However, many nodes in the phylogeny associated with the early diversification of Ovalentaria are poorly resolved in several analyses. Through the use of rarefaction curves we show that limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny does not appear to be due to a deficiency of data, as average global node support ceases to increase when only 1/3rd of the sampled loci are used in analyses. Instead this lack of resolution may be driven by model misspecification as a Bayesian mixed model analysis of the amino acid dataset provided good support for parts of the base of the Ovalentaria tree.

Conclusions: Although it does not appear that the limited phylogenetic resolution among the earliest nodes in the Ovalentaria phylogeny is due to a deficiency of data, it may be that both stochastic and systematic error resulting from model misspecification play a role in the poor resolution at the base of the Ovalentaria tree as a Bayesian approach was able to resolve some of the deeper nodes, where the other methods failed.
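
The rarefaction idea described in the abstract (tracking average node support as more loci are added) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not give the subset sizes, the number of replicates, or how support was summarized, so `rarefaction_curve`, `subset_sizes`, `replicates`, and the `mean_support` callback are all hypothetical. In a real analysis `mean_support` would run a tree inference on the chosen loci and return the mean support across nodes; here it is left to the caller.

```python
import random
from statistics import mean


def rarefaction_curve(loci, subset_sizes, mean_support, replicates=10, seed=0):
    """Return {k: average support over random subsets of k loci}.

    loci         -- list of locus identifiers (hypothetical input format)
    subset_sizes -- iterable of subset sizes k to evaluate
    mean_support -- caller-supplied function: list of loci -> mean node support
    replicates   -- number of random subsets drawn per size (assumed value)
    """
    rng = random.Random(seed)  # fixed seed so the subsampling is repeatable
    curve = {}
    for k in subset_sizes:
        if not 1 <= k <= len(loci):
            raise ValueError(f"subset size {k} is outside 1..{len(loci)}")
        # Draw k loci without replacement, score each replicate, then average.
        scores = [mean_support(rng.sample(loci, k)) for _ in range(replicates)]
        curve[k] = mean(scores)
    return curve
```

A curve that flattens well before all loci are included is the pattern the abstract reports: support stops increasing at roughly one third of the sampled loci.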

Fig. 2: Concatenated maximum likelihood phylogeny inferred using RAxML, from the full 29 species, 107 locus dataset, partitioned by codon position. Shapes and colored circles represent bootstrap support for a given node. Higher-level named clades are noted. Percent GC of third codon positions is listed for each species. Note that Pseudochromidae is not a clade.

Mentions: After removal of all paralogous copies there were 107 loci, totaling 82,782 bp of DNA sequence data (Table 2). In nine cases we used both copies of a particular locus. The full matrix contained 43 % variable sites, and third codon positions comprised 67 % of the variable sites (Table 2). There was a clear bias away from adenine residues at all codon positions. GC%, without accounting for ambiguities, is 47.3 %; when accounting for ambiguities, GC% is 52.7 %. G-C skew is −0.051. There was no clear pattern of GC bias in third codon positions (Fig. 2). The compositional homogeneity test implemented in PhyloBayes did not indicate compositional heterogeneity (p = 0.11). The principal component analysis (PCA) of the amino acid frequencies did not point to compositional artifacts (not shown). We removed Pholidichthys from the PCA because of its large amount of missing data. The full data matrix is available on Dryad (accession pending).
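
The composition statistics quoted above can be illustrated with a short sketch. The excerpt does not say how ambiguity codes were handled or which software produced the figures, so the weighting below (each IUPAC code contributes the fraction of its possible bases that are G or C) is an assumption, and the function names are hypothetical. G-C skew uses the standard definition (G − C) / (G + C). The sketch is not expected to reproduce the reported values.

```python
from collections import Counter

# Assumed weighting: fraction of the bases represented by each IUPAC code
# that are G or C. Gaps and missing-data symbols are simply not counted.
GC_WEIGHT = {
    "A": 0.0, "T": 0.0, "G": 1.0, "C": 1.0,
    "S": 1.0,              # G or C
    "W": 0.0,              # A or T
    "R": 0.5, "Y": 0.5,    # A/G, C/T
    "K": 0.5, "M": 0.5,    # G/T, A/C
    "B": 2 / 3, "V": 2 / 3,  # C/G/T, A/C/G
    "D": 1 / 3, "H": 1 / 3,  # A/G/T, A/C/T
    "N": 0.5,
}


def gc_percent(seq, include_ambiguous=False):
    """Percent GC, optionally counting ambiguity codes with fractional weights."""
    counts = Counter(seq.upper())
    symbols = GC_WEIGHT if include_ambiguous else "ACGT"
    total = sum(counts[s] for s in symbols)
    if total == 0:
        raise ValueError("sequence contains no countable bases")
    gc = sum(counts[s] * GC_WEIGHT[s] for s in symbols)
    return 100.0 * gc / total


def gc_skew(seq):
    """Standard G-C skew, (G - C) / (G + C), from unambiguous G and C only."""
    counts = Counter(seq.upper())
    g, c = counts["G"], counts["C"]
    if g + c == 0:
        raise ValueError("sequence contains no G or C")
    return (g - c) / (g + c)


def third_positions(seq):
    """Third codon positions, assuming the sequence starts in frame."""
    return seq[2::3]
```

For a per-species value like those listed in Fig. 2, one would apply `gc_percent(third_positions(seq))` to each species' in-frame coding sequence.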

