Ultra-large alignments using phylogeny-aware profiles.

Nguyen NP, Mirarab S, Kumar K, Warnow T - Genome Biol. (2015)

Affiliation: Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, 61801, Illinois, USA. namphuon@illinois.edu.

ABSTRACT
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Fig. 3: Impact of fragmentary sequences on alignment SP-error and tree error. We show average (a) alignment SP-error and (b) ΔFN tree error rates for different methods on the ROSE NT 1000M2 datasets, including results in which a percentage of the sequences (from 0 % to 50 %) are made fragmentary. Fragmentary sequences have an average length of 500 (i.e., roughly half the average sequence length for ROSE 1000M2).
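The caption reports alignment SP-error without defining it. As an assumption, the sketch below uses a common definition: the average of SP-FN (the fraction of residue pairs aligned in the true alignment that are missing from the estimated one) and SP-FP (the fraction of residue pairs aligned in the estimated alignment that are absent from the true one). Each alignment is modeled as a list of equal-length gapped strings in the same sequence order; `homologous_pairs` and `sp_error` are hypothetical helper names, not part of UPP.

```python
from itertools import combinations


def homologous_pairs(alignment):
    """Return the set of residue pairs that share a column in a gapped alignment.

    A residue is identified by (row index, position in the ungapped sequence),
    so two alignments of the same sequences can be compared directly.
    """
    counters = [0] * len(alignment)  # residues consumed so far in each row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row, seq in enumerate(alignment):
            if seq[col] != '-':
                present.append((row, counters[row]))
                counters[row] += 1
        # Every pair of residues in the same column is one homology statement.
        pairs.update(combinations(present, 2))
    return pairs


def sp_error(true_aln, est_aln):
    """SP-error, assumed here to be the mean of SP-FN and SP-FP."""
    true_pairs = homologous_pairs(true_aln)
    est_pairs = homologous_pairs(est_aln)
    sp_fn = len(true_pairs - est_pairs) / len(true_pairs) if true_pairs else 0.0
    sp_fp = len(est_pairs - true_pairs) / len(est_pairs) if est_pairs else 0.0
    return (sp_fn + sp_fp) / 2
```

For example, `sp_error(["AC-GT", "ACAGT"], ["ACG-T", "ACAGT"])` returns `0.25`: the estimated alignment misses one of four true pairs and adds one of four false pairs.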

Mentions: Figure 3 shows the impact of fragmentation in detail. It presents results for ROSE NT 1000M2 (a very challenging condition due to its high rates of indels and substitutions) with varying levels of fragmentation (Fig. 3).
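The fragmentation experiment replaces a chosen percentage of the sequences with shorter fragments averaging 500 sites. The exact procedure is not given here, so the following is only a minimal sketch under stated assumptions: `make_fragmentary` is a hypothetical helper, fragment lengths are drawn from a normal distribution whose standard deviation (`sd`) is an arbitrary illustrative value, and each fragment is a contiguous substring starting at a uniformly random position.

```python
import random


def make_fragmentary(seqs, fraction, mean_len=500, sd=60, seed=None):
    """Replace a fraction of unaligned sequences with random contiguous fragments.

    Hypothetical helper (assumptions, not the published protocol): fragment
    lengths ~ Normal(mean_len, sd), clipped to [1, len(seq)]; the start
    position is chosen uniformly among all valid offsets.
    """
    rng = random.Random(seed)
    names = list(seqs)
    n_frag = round(fraction * len(names))
    chosen = set(rng.sample(names, n_frag))  # which sequences become fragments
    out = {}
    for name in names:
        seq = seqs[name]
        if name in chosen:
            length = int(round(rng.gauss(mean_len, sd)))
            length = max(1, min(length, len(seq)))
            start = rng.randint(0, len(seq) - length)
            out[name] = seq[start:start + length]
        else:
            out[name] = seq  # left full-length
    return out
```

Sweeping `fraction` over values such as 0.0, 0.25 and 0.5 and realigning each resulting dataset would mimic the range of fragmentation levels described in the caption.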