Design and implementation of a generalized laboratory data model.

Wendl MC, Smith S, Pohl CS, Dooling DJ, Chinwalla AT, Crouse K, Hepler T, Leong S, Carmichael L, Nhan M, Oberkfell BJ, Mardis ER, Hillier LW, Wilson RK - BMC Bioinformatics (2007)

Bottom Line: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.

Affiliation: Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA. mwendl@wustl.edu

ABSTRACT

Background: Investigators in the biological sciences continue to exploit laboratory automation methods and have dramatically increased the rates at which they can generate data. In many environments, the methods themselves also evolve in a rapid and fluid manner. These observations point to the importance of robust information management systems in the modern laboratory. Designing and implementing such systems is non-trivial and it appears that in many cases a database project ultimately proves unserviceable.

Results: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions.
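As an illustration of the separation described above, the following Python sketch keeps regular entities and events as distinct structures, links them through simple many-to-many input/output relationships, treats directives as meta-data, and drives an item through a pipeline defined as nothing more than a sequence of events. This is a minimal, hypothetical rendering, not the authors' schema or programming interface; all class, field, and step names are assumptions made for illustration.

# Minimal sketch (not the paper's actual schema) of distinct regular-entity and
# event structures, many-to-many input/output links between them, and
# processing directives supplied as meta-data rather than hard-coded schema.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A regular entity: a physical article such as a DNA sample or plate."""
    entity_id: str
    entity_type: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    """An event entity: something that happens to one or more regular entities."""
    event_id: str
    event_type: str
    status: str = "scheduled"  # e.g. scheduled -> in_progress -> completed
    inputs: List[Entity] = field(default_factory=list)   # many-to-many links
    outputs: List[Entity] = field(default_factory=list)  # many-to-many links


@dataclass
class Directive:
    """A processing directive: meta-data governing how an event is run.
    New directives are just new data records, so they can be added or changed
    without schema modification."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


def run_pipeline(item: Entity, steps: List[str],
                 directives: Dict[str, Directive]) -> List[Event]:
    """Drive an item through a 'processing pipeline' defined simply as an
    ordered sequence of event types, consulting directives at each step."""
    events: List[Event] = []
    current = item
    for i, step in enumerate(steps):
        event = Event(event_id=f"ev-{i}", event_type=step, inputs=[current])
        rules = directives.get(step)
        # A real system would dispatch to step-specific logic here; the
        # directive supplies the project-specific parameters for that logic.
        product = Entity(entity_id=f"{current.entity_id}-{step}",
                         entity_type=f"{step}_product")
        if rules:
            product.attributes.update(rules.parameters)
        event.outputs.append(product)
        event.status = "completed"
        events.append(event)
        current = product
    return events


if __name__ == "__main__":
    sample = Entity("sample-001", "dna_sample")
    pipeline = ["pcr_setup", "sequencing_reaction", "data_verification"]
    project_directives = {
        "data_verification": Directive("coverage_check", {"min_quality": "q20"}),
    }
    for ev in run_pipeline(sample, pipeline, project_directives):
        print(ev.event_type, ev.status, [e.entity_id for e in ev.outputs])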

Conclusion: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.

Figure 5: Description of a medical sequencing pipeline. Boxes represent entity instances (objects), while arrow colors represent the following: event flow (black), output from an event (green), input to an event (blue), directives governing an event (red).

Mentions: The procedural layout of a typical medical sequencing operation is shown in Fig. 5. Physical articles (regular entities) and events (event entities) are represented, along with some of the specifications and constraints (directives) specific to this type of sequencing. Interactions among the various object types are complicated, demonstrating that direct designs similar to that in Fig. 4 would be inadequate. The use of directives is especially notable in this process. For example, the project object includes a list of reference sequences to target and details which portions of those reference sequences are significant. A set of primer pairs, called a tiling path, is defined such that the required regions will be covered, and this information is also part of the project. Subordinate directives are linked to the initial step in the pipeline, as well. One governs annotation of the reference sequence while another controls the tiling path algorithm. Finally, additional directives control data verification according to project-specific requirements.
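A minimal sketch of how the project-level directives described for Fig. 5 might be carried as data attached to the project object is given below. The identifiers, primer names, and coordinates are purely illustrative placeholders, not values from the paper.

# Hypothetical sketch: the project's directives (target reference sequences,
# regions of interest, tiling-path primer pairs, annotation source, and
# verification criteria) expressed as data linked to the project rather than
# as schema. All values below are illustrative placeholders.
medical_sequencing_project = {
    "project_id": "med-seq-example",
    "directives": {
        "reference_targets": {
            # reference sequence -> regions of interest (illustrative coordinates)
            "reference_seq_01": [(120_300, 125_900)],
        },
        "tiling_path": [
            # primer pairs chosen so the required regions are covered
            {"forward": "primer_F01", "reverse": "primer_R01", "amplicon": (120_250, 121_100)},
            {"forward": "primer_F02", "reverse": "primer_R02", "amplicon": (121_050, 121_900)},
        ],
        "annotation": {"source": "reference_annotation_set_v1"},
        "verification": {"min_coverage": "2", "min_quality": "q20"},
    },
}

# Because these constraints live in directive records linked to the project,
# a new project with different targets or verification rules needs only new
# directive data, not schema changes or re-certification of applications.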

