Semantic Web-based integration of cancer pathways and allele frequency data.

Holford ME, Rajeevan H, Zhao H, Kidd KK, Cheung KH - Cancer Inform (2009)

Bottom Line: The ability to perform queries across the domains of population genetics and pathways offers the potential to answer a number of cancer-related research questions. This sort of information could be useful for designing clinical studies and for providing background data in personalized medicine. It could also assist with the interpretation of genetic analysis results such as those from genome-wide association studies.

Affiliation: Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA.

ABSTRACT
We demonstrate the use of Semantic Web technology to integrate the ALFRED allele frequency database and the Starpath pathway resource. Linking population-specific genotype data with cancer-related pathway data is potentially useful given the growing interest in personalized medicine and the exploitation of pathway knowledge for cancer drug discovery. We model our data using the Web Ontology Language (OWL), drawing on ideas from the existing standard formats BioPAX (for pathway data) and PML (for allele frequency data). We store our data in an Oracle database using Oracle Semantic Technologies. We then query the data using Oracle's rule-based inference engine and its SPARQL-like RDF query language. The ability to perform queries across the domains of population genetics and pathways offers the potential to answer a number of cancer-related research questions. Among the possibilities is the ability to identify genetic variants that are associated with cancer pathways and whose frequency varies significantly between ethnic groups. This sort of information could be useful for designing clinical studies and for providing background data in personalized medicine. It could also assist with the interpretation of genetic analysis results such as those from genome-wide association studies.
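
As a rough illustration of the cross-domain question described above, the Python sketch below filters allele frequency records down to polymorphisms in a given set of pathway genes and flags those whose frequency spread across populations reaches a threshold. It assumes the two sources can be linked through gene identifiers. The record type, the gene and SNP identifiers, the population labels, and the max-minus-min threshold are all hypothetical. The threshold is a crude stand-in for a proper test of significant variation (for example Fst or a chi-square test), and the sketch does not reproduce the authors' Oracle-based queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleFrequency:
    """Hypothetical flattened record joining ALFRED-style frequency data to a gene."""
    snp_id: str       # polymorphism identifier (placeholder values below)
    gene_id: str      # gene the polymorphism maps to
    population: str   # population label
    frequency: float  # allele frequency in that population's sample


def variable_pathway_variants(pathway_genes, records, min_range=0.3):
    """Return {snp_id: spread} for SNPs in pathway_genes whose allele frequency
    spread (max - min across populations) is at least min_range.

    The spread threshold is an illustrative stand-in for "varies significantly";
    it is not a statistical test.
    """
    by_snp = {}
    for rec in records:
        if rec.gene_id in pathway_genes:
            by_snp.setdefault(rec.snp_id, []).append(rec.frequency)

    hits = {}
    for snp_id, freqs in by_snp.items():
        if len(freqs) < 2:
            continue  # need at least two populations to compare
        spread = max(freqs) - min(freqs)
        if spread >= min_range:
            hits[snp_id] = round(spread, 3)
    return hits


# Placeholder data: genes G1 and G2 stand for members of one cancer pathway.
pathway = {"G1", "G2"}
records = [
    AlleleFrequency("snp-1", "G1", "PopA", 0.10),
    AlleleFrequency("snp-1", "G1", "PopB", 0.55),
    AlleleFrequency("snp-2", "G2", "PopA", 0.40),
    AlleleFrequency("snp-2", "G2", "PopB", 0.45),
    AlleleFrequency("snp-3", "G9", "PopA", 0.05),  # gene outside the pathway
    AlleleFrequency("snp-3", "G9", "PopB", 0.90),
]
print(variable_pathway_variants(pathway, records))  # {'snp-1': 0.45}
```

Here snp-3 shows the largest spread but is ignored because its gene is not in the pathway set, which is the point of joining the two domains before filtering.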

f3-cin-08-19: The core of the Starpath Pathway data model.

Our ontology for the ALFRED data (illustrated in Fig. 3) is based on the subset of PML that ALFRED uses for data export. In most cases, the class names and properties are taken directly from the elements and attributes defined in PML's XSD schema. A notable exception is the handling of populations and samples, a central feature of the ALFRED database. Whereas the PML format treats both as Panel elements with differing attributes, the ALFRED ontology defines distinct Population and Sample classes. We also define a sample object property of multiple cardinality within the Population domain to express the fact that a single Population can have more than one sample. The Population class is further specified by a unique id from the ALFRED database and by datatype properties for ethnicity, geographicRegion, languageFamily and primaryLanguage. A paragraph-length description of the Population is also provided. A GeographicLocation class holds latitudinal and longitudinal data for the Population and serves as the range of an object property whose domain is the Population class. The Sample class likewise holds a unique id generated by the ALFRED database, together with datatype properties indicating countUnit and size for the sample. A brief description details the procedure used to gather the sample. The second central feature of the ALFRED semantic store is the GenomicPolymorphism class. In addition to its database-derived id, this class serves as the domain for datatype properties representing the snpID from dbSNP,28 the validationStatus, and zero or more corresponding geneIDs from NCBI's Entrez database. Two other classes serve as ranges of object properties on the GenomicPolymorphism class. The GenomicAllele class identifies each of the one or more alleles in question by a database-generated id.
The ReferenceGenomicLocationInAssembly class details the location of the polymorphism on the chromosome by specifying the strand, chromosomeName, and the start and end of the sequence. Finally, we define a GenomicAllelePopulationFrequency class that joins information about the polymorphism with information about the population. Along with object properties pointing to the genomicAllele and the sample, it has datatype properties giving the frequency value and count.
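
To make the role of the joining class concrete, here is a minimal Python sketch, assuming a plain list of (subject, predicate, object) triples in place of the Oracle Semantic Technologies store used in the paper. It evaluates a SPARQL-style basic graph pattern: starting from a snpID, it follows the allele to each GenomicAllelePopulationFrequency instance, then goes through that instance's sample to the Population that owns the sample. The instance identifiers, ethnicity values, and frequencies are hypothetical, as are the property names for the sample link on the frequency class and for the allele link on GenomicPolymorphism (the text does not name them). The matcher is a toy and not Oracle's query API.

```python
# Minimal in-memory stand-in for an RDF store; NOT Oracle's API.
# Instance IDs, ethnicity labels, and frequencies are hypothetical.
TRIPLES = [
    ("pop:1", "rdf:type", "Population"),
    ("pop:1", "ethnicity", "PopA"),
    ("pop:1", "sample", "smp:1"),
    ("pop:2", "rdf:type", "Population"),
    ("pop:2", "ethnicity", "PopB"),
    ("pop:2", "sample", "smp:2"),
    ("poly:1", "rdf:type", "GenomicPolymorphism"),
    ("poly:1", "snpID", "snp-1"),
    ("poly:1", "genomicAllele", "all:1"),  # property name assumed
    ("freq:1", "rdf:type", "GenomicAllelePopulationFrequency"),
    ("freq:1", "genomicAllele", "all:1"),
    ("freq:1", "sample", "smp:1"),  # property name assumed
    ("freq:1", "frequency", 0.10),
    ("freq:2", "rdf:type", "GenomicAllelePopulationFrequency"),
    ("freq:2", "genomicAllele", "all:1"),
    ("freq:2", "sample", "smp:2"),
    ("freq:2", "frequency", 0.55),
]


def is_var(term):
    """Query variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so pattern equals triple, or return None if impossible."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(triples, patterns):
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Allele frequency of snp-1 in each population, reached via the join class.
rows = query(TRIPLES, [
    ("?poly", "snpID", "snp-1"),
    ("?poly", "genomicAllele", "?allele"),
    ("?freq", "rdf:type", "GenomicAllelePopulationFrequency"),
    ("?freq", "genomicAllele", "?allele"),
    ("?freq", "sample", "?sample"),
    ("?freq", "frequency", "?f"),
    ("?pop", "rdf:type", "Population"),
    ("?pop", "sample", "?sample"),
    ("?pop", "ethnicity", "?eth"),
])
result = sorted((row["?eth"], row["?f"]) for row in rows)
print(result)  # [('PopA', 0.1), ('PopB', 0.55)]
```

Each pattern narrows the running set of variable bindings, which mirrors the join semantics of a basic graph pattern, only without indexes or rule-based inference. The rdf:type patterns keep ?freq and ?pop from binding to other subjects that happen to share the genomicAllele or sample predicates.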