Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae).

Parks M, Cronn R, Liston A - BMC Evol. Biol. (2012)

Bottom Line: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating 'noisy' data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear.


Affiliation: Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331-2902, USA. parksma@science.oregonstate.edu

ABSTRACT

Background: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating 'noisy' data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution.

Results: We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained conflicted with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set.

Conclusions: These results suggest that removal of misleading signal in phylogenomic datasets can result not only in increased resolution for poorly supported nodes, but may serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses.



Figure 1: Phylogenetic hypotheses for genus Pinus. Alternate placements (indicated by dashed lines) of subsections Contortae and Krempfianae, as well as the clade consisting of Pinus merkusii and P. latteri, are shown. The most common plastid-based resolution of these groups is indicated by gray shading. Tree topology and relative branch lengths reflect data from Gernandt et al.[42], Parks et al.[31] and this study.

Mentions: The genus Pinus consists of ca. 110 species distributed primarily throughout the northern hemisphere, and exhibits evolutionary patterns ranging from deep divergence events to apparent rapid and relatively shallow radiations. In addition, the moderate size of the genus facilitates thorough taxon sampling. Pinus is represented by a relatively well-documented fossil record reaching back over 100 million years[38-40] and has been the focus of a large body of phylogenetic work, including studies based on morphology[41-45], crossability[41,46-48] and molecular data, including restriction fragment analyses[49,50] and both nuclear[51-54] and chloroplast sequence data[31,42,44,55-59]. The most recent molecular systematic treatment of Pinus[42] recovered a well-supported systematic framework consisting of two subgenera (Pinus and Strobus), four sections (sections Pinus and Trifoliae in subgenus Pinus, sections Parrya and Quinquefoliae in subgenus Strobus) and 11 subsections (Figure 1) that is widely accepted today. However, while nearly complete plastome sequences for a subset of pine species support this framework and result in increased resolution across much of the genus[31], there remain a number of taxa with poor resolution and/or incongruence between chloroplast-based and nuclear- or morphology-based analyses. In particular, subsections Krempfianae and Contortae, as well as a clade of the two closely related species Pinus merkusii and P. latteri, each demonstrate these conflicts (Figure 1). In the present study, we investigated poor and conflicting resolutions in these clades using highly variable alignment positions and long branches as proxies for phylogenetic noise. Sequential removal of variable sites and long branches was applied to the phylogenetic analysis of a full-plastome alignment which included most of the world’s pine species and several Pinaceae outgroups. While responses to these treatments differed between these three clades, each case provided insight into both the general patterns of response to noise removal in a phylogenomic dataset and specific characteristics of the plastid-based Pinus evolutionary history.
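The sequential removal of variable sites described above can be illustrated with a minimal Python sketch. The text here does not state how site variability was scored, so the sketch assumes a simple proxy: the proportion of taxon pairs that differ at a site, ignoring gaps and ambiguity codes. The function names, the dictionary-based alignment format, and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import Counter


def site_variability(column):
    """Proportion of taxon pairs that differ at one alignment column.

    Assumed proxy for site variability (not necessarily the measure used
    in the study). Gaps and ambiguity codes are ignored.
    """
    states = [c for c in column.upper() if c in "ACGT"]
    n = len(states)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    # Pairs that differ = all ordered pairs minus identical ordered pairs, halved.
    same_sq = sum(count * count for count in Counter(states).values())
    mismatches = (n * n - same_sq) // 2
    return mismatches / pairs


def remove_most_variable_sites(alignment, fraction):
    """Drop the given fraction of sites, most variable first.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns (trimmed alignment, list of retained column indices).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")

    scores = [
        site_variability("".join(alignment[name][i] for name in names))
        for i in range(length)
    ]
    n_drop = int(length * fraction)
    # Rank from most to least variable; ties are broken by position.
    ranked = sorted(range(length), key=lambda i: (-scores[i], i))
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(length) if i not in dropped]
    trimmed = {name: "".join(alignment[name][i] for i in kept) for name in names}
    return trimmed, kept


if __name__ == "__main__":
    # Toy alignment with hypothetical taxa; real plastome alignments are far longer.
    toy = {
        "taxonA": "ACGTAC",
        "taxonB": "ACGTTC",
        "taxonC": "ACCTGC",
        "taxonD": "ACCTCC",
    }
    # Sequential removal: progressively larger fractions of variable sites.
    for frac in (0.0, 0.2, 0.5):
        trimmed, kept = remove_most_variable_sites(toy, frac)
        print(frac, len(kept), trimmed["taxonA"])
```

In a workflow of this kind, each trimmed alignment would be passed to a separate tree search so that bootstrap support for the nodes of interest can be compared across removal levels. The tree-building step is outside the scope of this sketch.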

