PAVE: program for assembling and viewing ESTs.

Soderlund C, Johnson E, Bomhoff M, Descour A - BMC Genomics (2009)

Bottom Line: The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs. A Java viewer program is provided for display and analysis of the results. The assembly software, data management software, Java viewer and user's guide are freely available.


Affiliation: BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA. cari@agcol.arizona.edu

ABSTRACT

Background: New sequencing technologies are rapidly emerging. Many laboratories are simultaneously working with traditional Sanger ESTs and experimenting with ESTs generated by 454 Life Sciences sequencers. Though Sanger ESTs have been used to generate contigs for many years, no program takes full advantage of the 5' and 3' mate-pair information; hence, many tentative transcripts are assembled into two separate contigs. The new 454 technology has the benefit of high-throughput expression profiling, but introduces time and space problems for assembling large contigs.

Results: The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains expression-level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata, and it allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs, including a measure of differential gene expression across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs.
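The two mechanisms named above, burying and joining mate sub-contigs with n's, can be illustrated with a minimal Python sketch. It is not PAVE's code: burying is approximated here by exact substring containment (PAVE's actual burying criterion is not described in this summary), the length of the n run is a hypothetical fixed `gap`, and the function names `bury_ests`, `expression_counts` and `join_subcontigs` are illustrative only.

```python
"""Illustrative sketch only; not PAVE's implementation.

Assumptions: burying is approximated by exact substring containment,
and the number of n's between mate sub-contigs is a hypothetical fixed gap.
"""


def bury_ests(ests):
    """Split ESTs into parents (sent to assembly) and buried ESTs.

    ests: dict mapping EST name -> sequence.
    Returns (parents, buried_under), where buried_under maps each
    buried EST to the parent it was buried under.
    """
    # Longest first (ties broken by name), so a read can only be
    # buried under a parent that has already been kept.
    order = sorted(ests, key=lambda name: (-len(ests[name]), name))
    parents, buried_under = [], {}
    for name in order:
        seq = ests[name]
        host = next((p for p in parents if seq in ests[p]), None)
        if host is None:
            parents.append(name)
        else:
            buried_under[name] = host
    return parents, buried_under


def expression_counts(parents, buried_under):
    """Each parent counts itself plus every EST buried under it."""
    counts = {p: 1 for p in parents}
    for host in buried_under.values():
        counts[host] += 1
    return counts


def join_subcontigs(five_prime_ccs, three_prime_ccs, gap=10):
    """Join non-overlapping 5' and 3' mate sub-contigs with a run of n's.

    gap is a hypothetical fixed length chosen for illustration.
    """
    return five_prime_ccs + "n" * gap + three_prime_ccs
```

For example, with `ests = {"r1": "ACGTACGTAA", "r2": "CGTACG", "r3": "ACGTACGTAA", "r4": "TTTTGGGG"}`, only `r1` and `r4` would be passed on for assembly, and `r1` would carry a count of 3. Only parents reach the assembler, which is where the time and space saving comes from, while the counts keep the expression level.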

Conclusion: The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available.

Figure 1: A schema of the PAVE assembly algorithm. The TC (transitive closure) loop is generally executed multiple times in order to merge contigs that have similar CCSs (contig consensus sequences). The user defines how many times the loop is executed, and a different set of parameters can be used for each iteration. If the algorithm is run on a multi-processor machine, the user can request that the TC step use multiple processors.
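The repeated TC loop in Figure 1 can be sketched as successive connected-component passes, each with its own thresholds. This is a simplified illustration under stated assumptions: pairwise CCS similarities (percent identity, overlap length) are taken as precomputed input, the threshold pair `(min_identity, min_overlap)` is hypothetical, and the CAP3 reassembly and recomputation of CCSs that would follow each merge are omitted.

```python
"""Illustrative sketch of repeated transitive-closure (TC) passes.

Assumptions (not from PAVE): CCS similarities are precomputed as
(percent_identity, overlap_length); threshold names are hypothetical;
reassembly with CAP3 between passes is omitted.
"""


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def tc_passes(contigs, sims, passes):
    """Run one connected-component merge per (min_identity, min_overlap) pass.

    contigs: list of contig names.
    sims: dict mapping (contig_a, contig_b) -> (identity, overlap).
    Returns the sorted contig groups after each pass.
    """
    parent = {c: c for c in contigs}
    history = []
    for min_identity, min_overlap in passes:
        for (a, b), (identity, overlap) in sims.items():
            if identity >= min_identity and overlap >= min_overlap:
                ra, rb = find(parent, a), find(parent, b)
                if ra != rb:
                    parent[rb] = ra  # merge only; groups are never split
        groups = {}
        for c in contigs:
            groups.setdefault(find(parent, c), []).append(c)
        history.append(sorted(sorted(g) for g in groups.values()))
    return history
```

With similarities c1-c2 (98, 120), c2-c3 (95, 60) and c3-c4 (90, 40), and passes `[(97.0, 100), (94.0, 50)]`, the first pass groups only c1 with c2; the second, more permissive pass pulls c3 in through c2 even though no c1-c3 similarity is given. That transitivity is what distinguishes a TC step from a clique step.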

Mentions: As shown in Figure 1, the PAVE algorithm has a clique (fully connected graph) step, followed by one or more transitive closure (TC; connected graph) steps. The PAVE algorithm uses MegaBLAST [31] for similarity results and CAP3 [18] for assembly, with the following rules: (i) Mate-pairs must be in a contig together. If the mate-pairs assemble into two different contigs, the two sub-contigs are treated as a single contig. (ii) Contigs are built incrementally, prioritized by bit score and number of ESTs. Once a set of ESTs is in a contig together, they are never split apart, though they may be merged with others. (iii) CAP3 is given only sets of ESTs whose matched regions overlap correctly. Parameters governing these rules are provided by the user in a configuration file (see Additional file 2: Parameters and log files).
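Rules (i) and (ii) can be illustrated with a greedy clique-clustering sketch in Python. The following are assumptions, not details taken from the paper: each mate-pair is treated as one indivisible unit from the start (rule i); hits are given as precomputed bit scores for EST pairs, standing in for MegaBLAST output; ties in bit score are broken by the combined number of ESTs; and two clusters merge only if every unit in one has a hit to every unit in the other, which keeps each cluster fully connected. Clusters only ever grow, so ESTs placed together are never split (rule ii). Rule (iii) and the CAP3 assembly itself are not modeled, and `clique_clusters` is a hypothetical name.

```python
"""Illustrative greedy clique clustering with a mate-pair constraint.

Assumptions (not from PAVE): mate-pairs are indivisible units, hits are
precomputed bit scores, and ties are broken by combined EST count.
"""


def clique_clusters(units, hits):
    """Cluster units (tuples of EST names) so each cluster stays a clique.

    units: list of tuples; a mate-pair is a 2-tuple, a singleton a 1-tuple.
    hits: dict mapping (est_a, est_b) -> bit score.
    Returns a list of clusters, each a sorted list of EST names.
    """
    unit_of = {est: i for i, unit in enumerate(units) for est in unit}
    # Unit-level score: best bit score between any ESTs of the two units.
    unit_score = {}
    for (a, b), score in hits.items():
        i, j = sorted((unit_of[a], unit_of[b]))
        if i != j:
            unit_score[(i, j)] = max(score, unit_score.get((i, j), 0.0))

    cluster_of = list(range(len(units)))
    members = {i: {i} for i in range(len(units))}
    # Highest bit score first; ties favor edges touching more ESTs.
    order = sorted(
        unit_score,
        key=lambda e: (-unit_score[e], -(len(units[e[0]]) + len(units[e[1]]))),
    )
    for i, j in order:
        ci, cj = cluster_of[i], cluster_of[j]
        if ci == cj:
            continue
        # Merge only if the union would still be fully connected.
        if all((min(u, v), max(u, v)) in unit_score
               for u in members[ci] for v in members[cj]):
            for v in members[cj]:
                cluster_of[v] = ci
            members[ci] |= members.pop(cj)  # grow only; never split
    return [sorted(est for u in sorted(m) for est in units[u])
            for m in members.values()]
```

For instance, with units `[("a5", "a3"), ("b",), ("c",), ("d",)]` and hits a5-b 200, b-c 180, a3-c 90, c-d 50, the mate-pair a5/a3 ends up with b and c, while d stays alone because it has no hit to the mate-pair or to b. The clique test is what stops a single weak link from chaining unrelated ESTs together at this stage.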

