NFU-Enabled FASTA: moving bioinformatics applications onto wide area networks.

Baker EJ, Lin GN, Liu H, Kosuri R - Source Code Biol Med (2007)

Bottom Line: We also find that genome-scale sizes of the stored data are easily adaptable to logistical networks. In situations where computation is subject to parallel solution over very large data sets, this approach provides a means to allow distributed collaborators access to a shared storage resource capable of storing the large volumes of data associated with modern life science. In addition, it provides a computation framework that removes the burden of computation from the client and places it within the network.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, School of Engineering and Computer Science, Baylor University, Waco, TX, USA. Erich_Baker@baylor.edu.

ABSTRACT

Background: Advances in Internet technologies have allowed life science researchers to reach beyond the lab-centric research paradigm to create distributed collaborations. Of the existing technologies that support distributed collaborations, there are currently none that simultaneously support data storage and computation as a shared network resource, enabling the computational burden to be wholly removed from participating clients. Software using the computation-enabled logistical networking components of the Internet Backplane Protocol provides a suitable means to accomplish these tasks. Here, we demonstrate software that enables this approach by distributing both the FASTA algorithm and appropriate data sets within the framework of a wide area network.

Results: For large datasets, computation-enabled logistical networks provide a significant reduction in FASTA algorithm running time over local and non-distributed logistical networking frameworks. We also find that genome-scale sizes of the stored data are easily adaptable to logistical networks.

Conclusion: Network function unit-enabled Internet Backplane Protocol effectively distributes FASTA algorithm computation over large data sets stored within the scalable network. In situations where computation is subject to parallel solution over very large data sets, this approach provides a means to allow distributed collaborators access to a shared storage resource capable of storing the large volumes of data associated with modern life science. In addition, it provides a computation framework that removes the burden of computation from the client and places it within the network.
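The fan-out pattern the conclusion describes (the client ships a query into the network, each storage node computes against its local chunk, and the client only merges results) can be sketched in a few lines. This is a hypothetical illustration only: the function names and the dummy scoring below are invented, and the real system performs the per-chunk computation inside NFU-enabled IBP nodes, not in client threads.

```python
from concurrent.futures import ThreadPoolExecutor

def fasta_on_node(node, query):
    # Stand-in for the per-node NFU call: in the real system each IBP node
    # runs FASTA against its local database chunk. Here we fabricate a
    # deterministic-per-run score purely to make the sketch runnable.
    return {"node": node, "score": hash((node, query)) % 100}

def distributed_fasta(nodes, query):
    """Fan one query out to every node in parallel, then merge the
    per-chunk hits client-side, sorted best-first (hypothetical sketch)."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(lambda n: fasta_on_node(n, query), nodes))
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

The client's only work here is the final sort; the per-chunk scan, which dominates for genome-scale databases, stays with the node that stores the chunk.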

No MeSH data available.



Figure 6: Average response time per database chunk vs. NFU-enabled IBP FASTA services. FASTA-formatted genome sequence databases were either kept locally as an unformatted dataset, distributed within a local IBP node in 20 chunks, or distributed within a non-local IBP network to 1, 5, 10 or 20 nodes. In distributed, or chunked, systems the average response time over three trials per node remains constant throughout the system, indicating that future speed-ups will be a function of the granularity of data striping across the IBP network, with a lower bound set by network communication time.

Mentions: The total response time of the various NFU-enabled IBP FASTA services was tested against the total data sets as a function of query size. Results indicate that query sizes of 500 and 1000 bp against remote one-node FASTA systems return the slowest response times (Figure 5). This slowdown relative to local FASTA systems is expected as a result of network communication times. The local FASTA system with 20 nodes had a slightly better response time than the local server with unfragmented datasets, indicating that there is a break-even point where server communication time balances against data striping and replication. In distributed, or non-local, systems the average response time per node remained constant throughout the system (Figure 6), indicating that future speed-ups will be a function of the granularity of data striping across the IBP network, with a lower bound set by network communication time.
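The chunked layouts compared above (one unfragmented dataset vs. a database striped into 20 chunks) can be illustrated with a minimal splitter that divides a FASTA database into record groups, one per node. This is a sketch under the assumption of simple round-robin assignment; the actual striping and replication in the paper are handled by the IBP logistical networking layer, not by client code like this.

```python
def split_fasta(text, n_chunks):
    """Split FASTA-formatted text into n_chunks groups of whole records,
    assigned round-robin, so each chunk could be stored on one IBP node.
    Illustrative only; real striping lives in the logistical network."""
    records, current = [], []
    for line in text.strip().splitlines():
        if line.startswith(">"):          # a header line starts a new record
            if current:
                records.append("\n".join(current))
            current = [line]
        else:
            current.append(line)          # sequence lines join the open record
    if current:
        records.append("\n".join(current))
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):     # round-robin: record i -> chunk i mod n
        chunks[i % n_chunks].append(rec)
    return ["\n".join(c) for c in chunks]

# Example: three records striped across two chunks.
db = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA"
chunks = split_fasta(db, 2)   # chunk 0: seq1 + seq3; chunk 1: seq2
```

Splitting on record boundaries keeps every sequence intact within a single chunk, which is what lets each node run FASTA on its chunk independently; finer striping (more chunks) buys parallelism until network communication time becomes the lower bound, as the figure indicates.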

