A perspective for biomedical data integration: design of databases for flow cytometry.

Drakos J, Karakantza M, Zoumbos NC, Lakoumentas J, Nikiforidis GC, Sakellaropoulos GC - BMC Bioinformatics (2008)

Bottom Line: The proposed schema can potentially achieve up to 8 orders of magnitude reduction in query complexity and up to 2 orders of magnitude reduction in response time for data originating from flow cytometers that record 256 colours. This is mainly achieved by managing to maintain an almost constant number of data-mining procedures regardless of the size and complexity of the stored information. Analysis of the requirements of a specific domain for integration and massive data processing can provide the necessary schema modifications that will unlock the additional functionality of a relational database.


Affiliation: Department of Medical Physics, School of Medicine, University of Patras, GR-26504 Rion, Greece. drakos@upatras.gr

ABSTRACT

Background: The integration of biomedical information is essential for tackling medical problems. We describe a data model in the domain of flow cytometry (FC) allowing for massive management, analysis and integration with other laboratory and clinical information. The paper is concerned with the proper translation of the Flow Cytometry Standard (FCS) into a relational database schema, in a way that helps end users either to do research on FC or to study specific cases of patients who have undergone FC analysis.

Results: The proposed database schema provides integration of data originating from diverse acquisition settings, organized in a way that allows syntactically simple queries that provide results significantly faster than the conventional implementations of the FCS standard. The proposed schema can potentially achieve up to 8 orders of magnitude reduction in query complexity and up to 2 orders of magnitude reduction in response time for data originating from flow cytometers that record 256 colours. This is mainly achieved by managing to maintain an almost constant number of data-mining procedures regardless of the size and complexity of the stored information.

Conclusion: It is evident that using single-file data storage standards for the design of databases without any structural transformations significantly limits the flexibility of databases. Analysis of the requirements of a specific domain for integration and massive data processing can provide the necessary schema modifications that will unlock the additional functionality of a relational database.

© Copyright Policy - open-access

Figure 4: Horizontal vs Vertical. Header HILs: similar in both schemata. Cell HILs: While in horizontal HILs the fields have clear meanings (FS, SS, CD5, etc.), vertical HILs are useless because field labels cannot be resolved. Measurement HILs: In the horizontal case, cell fields maintain their clear meanings and the HIL gains value from the additional fields of the Header segment. In the vertical case, the additional information of the Header segment can be used to resolve the cell field labels. Even though both HILs contain the same amount of information, it remains hidden in the vertical case. Experiment HILs: Horizontal HILs manage to remain simple (all fields have clear meanings and data is perfectly aligned among different records) despite the fact that the amount of information has grown significantly. For vertical HILs to become useful, the user must resolve field labels, unify the different naming forms of the same antigen and finally align all the fragmented values of the same antigen under a single field.
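The three steps the caption lists for vertical HILs (resolve field labels, unify naming forms, align values under a single field) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the record layout, the `P1`/`P2`/`P3` labels, the file identifiers, the alias table and the function names are all hypothetical.

```python
# Hypothetical vertical records: (file_id, event_id, parameter_label, value).
# The label alone ("P1", "P2", ...) says nothing about what was measured.
VERTICAL = [
    ("f1", 1, "P1", 512), ("f1", 1, "P2", 130), ("f1", 1, "P3", 77),
    ("f2", 1, "P1", 498), ("f2", 1, "P2", 64), ("f2", 1, "P3", 141),
]

# Hypothetical header metadata per file: parameter label -> name as typed
# by the operator. Note the two spellings and positions of the same antigen.
HEADERS = {
    "f1": {"P1": "FS", "P2": "SS", "P3": "CD5-FITC"},
    "f2": {"P1": "FS", "P2": "cd5 fitc", "P3": "SS"},
}

# Hypothetical alias table mapping naming variants to one canonical field.
ALIASES = {"cd5-fitc": "CD5", "cd5 fitc": "CD5", "fs": "FS", "ss": "SS"}


def canonical(name):
    """Unify different naming forms of the same field."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())


def to_horizontal(vertical, headers):
    """Turn vertical records into one aligned record per cell event."""
    rows = {}
    for file_id, event_id, label, value in vertical:
        raw_name = headers[file_id][label]   # step 1: resolve the label
        field = canonical(raw_name)          # step 2: unify naming forms
        # step 3: align values of the same field under a single key
        rows.setdefault((file_id, event_id), {})[field] = value
    return rows


HORIZONTAL = to_horizontal(VERTICAL, HEADERS)
```

After the transformation, the CD5 value of every event sits under the same field name whichever file it came from, which is the property the caption attributes to horizontal HILs.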

Mentions: We define the term "Horizontal Database Schema" as the database design in which the retrieval of all data originating from the same HIL is enough to render them useful. No further data mining is needed (Figure 4) for the retrieved information to become useful. Horizontal Database Schemata are vital when intra- or inter-domain integration is the goal of an information system.
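As a rough illustration of this definition, the sketch below stores the same two cell events in a horizontal table and in a vertical one, then asks both for events with CD5 above a threshold. It uses Python's standard `sqlite3` module. The table names (`cell_h`, `cell_v`, `header_v`), the columns, the values and the threshold are hypothetical and are not the schema proposed in the paper. For simplicity the header names are assumed to be already unified to `CD5`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal: one row per cell event, one named column per measurement.
    CREATE TABLE cell_h (file_id TEXT, event_id INTEGER,
                         FS INTEGER, SS INTEGER, CD5 INTEGER);
    -- Vertical: one row per (event, parameter); names live in the header.
    CREATE TABLE cell_v (file_id TEXT, event_id INTEGER,
                         param TEXT, value INTEGER);
    CREATE TABLE header_v (file_id TEXT, param TEXT, name TEXT);
""")

conn.executemany(
    "INSERT INTO cell_h VALUES (?, ?, ?, ?, ?)",
    [("f1", 1, 512, 130, 77), ("f2", 1, 498, 141, 64)],
)
conn.executemany(
    "INSERT INTO cell_v VALUES (?, ?, ?, ?)",
    [("f1", 1, "P1", 512), ("f1", 1, "P2", 130), ("f1", 1, "P3", 77),
     ("f2", 1, "P1", 498), ("f2", 1, "P2", 64), ("f2", 1, "P3", 141)],
)
conn.executemany(
    "INSERT INTO header_v VALUES (?, ?, ?)",
    [("f1", "P1", "FS"), ("f1", "P2", "SS"), ("f1", "P3", "CD5"),
     ("f2", "P1", "FS"), ("f2", "P2", "CD5"), ("f2", "P3", "SS")],
)

# Horizontal: retrieving the rows is enough, the field name carries the meaning.
horizontal = conn.execute(
    "SELECT file_id, event_id FROM cell_h WHERE CD5 > 70 ORDER BY file_id"
).fetchall()

# Vertical: the label must first be resolved through the header table.
vertical = conn.execute("""
    SELECT c.file_id, c.event_id
    FROM cell_v AS c
    JOIN header_v AS h ON h.file_id = c.file_id AND h.param = c.param
    WHERE h.name = 'CD5' AND c.value > 70
    ORDER BY c.file_id
""").fetchall()
```

Both queries return the same events, but the vertical one needs an extra join for every field it touches, and in practice the antigen names would also have to be unified before the comparison on `h.name` could be trusted.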
