PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome.

Zhang J, Mamlouk AM, Martinetz T, Chang S, Wang J, Hilgenfeld R - BMC Bioinformatics (2011)

Bottom Line: Such a tree can typically only include up to a few hundred sequences. Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. It utilizes the entire data set, minimizes bias, and provides intuitive visualization.


Affiliation: Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany.


© Copyright Policy - open-access

Figure 4: PB1 PhyloMap. (A) The PhyloMap for 4022 PB1 protein sequences. Each spot in the plot corresponds to one sequence, and the first two dimensions represent 41.4% of the total variation. The phylogenetic tree mapped onto the plot is shown in (B). The mapping error is 0.00261. The strain names that stand for the numbers in the plot are shown in the phylogenetic tree in (B). (B) The NJ tree of PB1 protein sequences, built using distances inferred by the JTT model. Forty sequences were selected by PhyloMap as data centers; the other two sequences (in bold italics) were added manually. This tree has been mapped onto the PCoA result shown in (A). Bootstrap values (1000 replications) for key nodes are shown.

Mentions: We have generated the PhyloMap for all influenza A virus internal genes using their protein sequences, i.e. PB2, PB1, PA, NP, M1, M2, NS1, and NS2 (Figures 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11). Figure 2A illustrates the results for the example of the influenza A virus NP gene. The following major lineages can be easily identified: (i) seasonal human H1N1 (as shown by the data points close to "12: A/Taiwan/5072/1999(H1N1)"); (ii) seasonal human H3N2 (as shown by the data points close to "2: A/Waikato/122/2003(H3N2)"); (iii) early human (as shown by the data points close to "15: A/United Kingdom/1/1933(H1N1)"); (iv) classical swine [32] (as shown by the data points close to "26: A/Swine/Wisconsin/163/97(H1N1)", which includes S-OIV); (v) equine (as shown by the data points close to "15: A/United Kingdom/1/1933(H1N1)"); and (vi) avian (as shown by the data points close to "20: A/gray teal/Australia/2/1979(H4N4)"). PhyloMap has successfully captured all major lineages of the influenza A virus NP gene that were shown to exist in a previous study [3] using sequences sampled manually.
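
The caption of Figure 4 reports that the first two dimensions of the PCoA result represent 41.4% of the total variation. As a rough illustration of how such a percentage can be obtained, the sketch below runs classical principal coordinates analysis on a distance matrix with NumPy. It is not the authors' implementation: the function name `pcoa`, the toy coordinates, and the choice to clip negative eigenvalues to zero before computing the explained fraction are all assumptions made for this example.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA of a symmetric distance matrix (illustrative sketch).

    Returns the coordinates on the first `n_axes` axes and the fraction of
    the total variation those axes represent.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption: negative eigenvalues (non-Euclidean distances or
    # round-off) are clipped to zero and do not count as variation.
    positive = np.clip(eigvals, 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes].sum() / positive.sum()
    return coords, explained


# Hypothetical toy data: four points in a plane stand in for sequences,
# and their Euclidean distances stand in for model-based distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, explained = pcoa(dist, n_axes=2)
print(f"first two axes represent {explained:.1%} of the total variation")
```

Because the toy distances come from points that already lie in a plane, two axes recover them completely. Distances between real protein sequences are generally not exactly two-dimensional, which is why a figure such as 41.4% leaves part of the variation unrepresented in the plot.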

