Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae).

Parks M, Cronn R, Liston A - BMC Evol. Biol. (2012)

Bottom Line: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating 'noisy' data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear.


Affiliation: Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331-2902, USA. parksma@science.oregonstate.edu

ABSTRACT

Background: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating 'noisy' data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution.

Results: We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained conflicted with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set.

Conclusions: These results suggest that removal of misleading signal in phylogenomic datasets can result not only in increased resolution for poorly supported nodes, but may serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses.



Figure 4: Distribution of bootstrap support values for phylogenetic position of three clades in genus Pinus. a) Bootstrap support values for placement of subsection Krempfianae. Circles correspond to placement of P. krempfii as sister to subsection Gerardianae. b) Bootstrap support values for placement of Pinus merkusii / P. latteri. Circles correspond to placement of P. merkusii / P. latteri as sister to subsection Pinaster and triangles as sister to subsection Pinus. c) Bootstrap support values for placement of subsection Contortae. Circles correspond to placement of subsection Contortae as sister to subsection Australes and triangles as basal to both subsections Australes and Contortae. For all charts, filled data points correspond to A_n partition sizes falling between the final decrease of branch score metric values and the start of the decrease in overall bootstrap support values for A_n partitions, as shown in Figure 3. Squares represent variable phylogenetic placements not including those represented by circles or triangles. Arrows in b) and c) indicate the partition size at which bootstrap support for monophyly of the clade falls below 100%.

Mentions: Bootstrap support for the phylogenetic position of P. krempfii was moderate (59–84%) until removal of the most variable 5.7 kbp (ca. A_135665), at which point bootstrap values steadily increased until peaking at 97–100% after removal of the most variable 6.3–7.8 kbp (ca. A_135065 to A_133665) (Figure 4). A_n partitions greater than 129.4 kbp in size recovered section Quinquefoliae as subsection Strobus + (P. krempfii + subsection Gerardianae); at A_n partition sizes smaller than this, the phylogenetic position was variable.
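
The variable site removal described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: this excerpt does not state how site variability was scored, so the sketch assumes a simple stand-in (the fraction of mismatching pairs among unambiguous bases in a column). The function names, taxon labels, and toy alignment are hypothetical. The idea shown is only the general one: rank alignment columns from most to least variable, drop the top-ranked ones, and keep the reduced partition for reanalysis.

```python
from itertools import combinations


def site_variability(column):
    """Fraction of mismatching pairs among unambiguous bases in one column.

    Assumed stand-in score; the source excerpt does not name the criterion used.
    """
    bases = [b for b in column.upper() if b in "ACGT"]  # ignore gaps and ambiguity codes
    pairs = list(combinations(bases, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def remove_most_variable(alignment, n_remove):
    """Return a copy of the alignment with the n_remove most variable columns dropped."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    columns = ["".join(alignment[name][i] for name in names) for i in range(length)]
    scores = [site_variability(col) for col in columns]
    # Rank column indices from most to least variable; sorted() is stable,
    # so tied columns keep their original left-to-right order.
    ranked = sorted(range(length), key=lambda i: -scores[i])
    dropped = set(ranked[:n_remove])
    keep = [i for i in range(length) if i not in dropped]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment (four taxa, six aligned positions).
alignment = {
    "taxon_a": "ACGTAC",
    "taxon_b": "ACGTTC",
    "taxon_c": "ACCTGC",
    "taxon_d": "ACCTCC",
}

# Dropping the two most variable columns leaves a four-position partition.
reduced = remove_most_variable(alignment, 2)
```

In a real analysis, each reduced partition would be passed to a tree-building program and bootstrap support compared across partition sizes, as in Figure 4; that step is outside the scope of this sketch.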

