InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic.

Sonnhammer EL, Östlund G - Nucleic Acids Res. (2014)

Bottom Line: Compared to the previous release, this increases the number of species by 173% and the number of pairwise species comparisons by 650%. In turn, the number of ortholog groups has increased by 423%. We present the contents and usages of InParanoid 8, and a detailed analysis of how the proteome content has changed since the previous release.
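
The two growth figures above are linked by simple combinatorics, which a short calculation makes explicit. This sketch assumes the previous release held 100 proteomes (the count implied by a 173% increase to 273) and that a "pairwise species comparison" means an unordered pair of distinct species; both are assumptions, not statements from the text, and the names are illustrative.

```python
from math import comb


def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Assumption: 100 proteomes in the previous release, implied by 273 / (1 + 1.73).
PREVIOUS_SPECIES = 100
CURRENT_SPECIES = 273

# Growth in the number of species.
species_growth = percent_increase(PREVIOUS_SPECIES, CURRENT_SPECIES)

# Growth in the number of unordered pairs of distinct species, n * (n - 1) / 2.
pair_growth = percent_increase(comb(PREVIOUS_SPECIES, 2), comb(CURRENT_SPECIES, 2))

print(f"species: +{species_growth:.0f}%, species pairs: +{pair_growth:.0f}%")
```

Because the number of pairs grows quadratically with the number of species, a 173% increase in species is enough to produce a roughly 650% increase in comparisons.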


Affiliation: Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden; erik.sonnhammer@scilifelab.se.


Figure 1: Workflow for the parallel 2-pass BLAST procedure used for generating InParanoid 8. BLAST runs are launched for all pairs of proteomes, running both passes in parallel. When both passes are finished, their outputs are validated by checking for truncation or failure to complete. Intra-proteome matches are checked against the proteome sequences to ensure inclusion of all genes. Pass 1 pairs are combined with pass 2 results such that only pairs accepted in pass 1 are kept, but with alignments from pass 2. A failed validation leads either to a whole-proteome rerun (for failed or truncated results) or to individual serial pass 2 reruns (for pass 1 pairs lacking pass 2 results).
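
The validation branch described in the caption can be sketched as a small decision function. This is an illustration only, not the InParanoid code: the `BlastOutput` record, its `completed` and `truncated` flags, and the action names are hypothetical, and the caption does not say how truncation is actually detected.

```python
from dataclasses import dataclass, field


@dataclass
class BlastOutput:
    """Hypothetical summary of one BLAST pass for one proteome pair."""
    completed: bool                          # the run finished normally
    truncated: bool                          # the output file ends prematurely
    pairs: set = field(default_factory=set)  # (query_id, hit_id) tuples reported


def plan_reruns(pass1: BlastOutput, pass2: BlastOutput):
    """Return (action, pairs_to_rerun) following the two failure branches."""
    outputs = (pass1, pass2)
    # Failed or truncated results: redo the whole proteome comparison.
    if any((not out.completed) or out.truncated for out in outputs):
        return "rerun_whole_proteome", set()
    # Pass 1 pairs lacking pass 2 results: rerun only those pairs serially.
    missing = pass1.pairs - pass2.pairs
    if missing:
        return "rerun_pass2_serially", missing
    return "ok", set()


p1 = BlastOutput(True, False, {("a1", "b1"), ("a2", "b2")})
p2 = BlastOutput(True, False, {("a1", "b1")})
print(plan_reruns(p1, p2))
```

Here the pair `("a2", "b2")` was accepted in pass 1 but has no pass 2 alignment, so only that pair is scheduled for a serial pass 2 rerun.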

Mentions: To improve computational throughput, we ran both BLAST passes in parallel and, after both were done, extracted matches from pass 2 for homologs found in pass 1; see Figure 1. This can also save total real runtime, as only two BLAST runs are launched instead of thousands of tiny runs per species comparison, which cause a lot of input/output (I/O) overhead. There are some drawbacks, however: the pass 2 computation and results become much larger, and the infrastructure and work required to synchronize, supervise and load balance the increased number of computational jobs are considerable. We opted for this solution mainly because it offers a higher degree of parallelization.
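
The extraction step described above (keep only the pairs accepted in pass 1, but take their alignments from pass 2) amounts to a filtered lookup. A minimal sketch, assuming pass 1 yields a set of accepted `(query_id, hit_id)` pairs and pass 2 yields a dictionary from such pairs to alignment records; these data structures and all identifiers are hypothetical.

```python
def combine_passes(pass1_pairs, pass2_alignments):
    """Keep pass 2 alignments only for pairs that pass 1 accepted.

    pass1_pairs:      set of (query_id, hit_id) tuples accepted in pass 1
    pass2_alignments: dict mapping (query_id, hit_id) to an alignment record
    """
    return {
        pair: alignment
        for pair, alignment in pass2_alignments.items()
        if pair in pass1_pairs
    }


# Toy data: pass 2, run independently and in parallel, reports one pair
# that pass 1 did not accept; that pair is discarded.
pass1_pairs = {("a1", "b1"), ("a2", "b2")}
pass2_alignments = {
    ("a1", "b1"): "alignment-1",
    ("a2", "b2"): "alignment-2",
    ("a3", "b9"): "alignment-3",
}

combined = combine_passes(pass1_pairs, pass2_alignments)
print(sorted(combined))
```

Since pass 2 runs without knowing the outcome of pass 1, it computes alignments for pairs that are later thrown away, which is one way to read the drawback that the pass 2 computation and results become much larger.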

