An ontology approach to comparative phenomics in plants.

Oellrich A, Walls RL, Cannon EK, Cannon SB, Cooper L, Gardiner J, Gkoutos GV, Harper L, He M, Hoehndorf R, Jaiswal P, Kalberer SR, Lloyd JP, Meinke D, Menda N, Moore L, Nelson RT, Pujar A, Lawrence CJ, Huala E - Plant Methods (2015)

Affiliation: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK.

ABSTRACT

Background: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.

Results: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.

Conclusions: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
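The Results paragraph describes converting curated phenotypes into a common, ontology-based format, and the excerpt accompanying Figure 2 calls these EQ (entity-quality) statements. As a rough sketch only, one might represent each phene as an entity term (for example from the Plant Ontology or Gene Ontology) paired with a quality term (for example from the Phenotype and Trait Ontology), and each genotype as the set of its phenes. The class names `EQ` and `GenotypeAnnotation`, the helper `annotate`, and the text labels used in place of ontology identifiers are hypothetical. They are not the study's actual schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class EQ:
    """One phene as an entity-quality pair (hypothetical schema)."""
    entity: str   # e.g. a Plant Ontology or Gene Ontology class
    quality: str  # e.g. a Phenotype and Trait Ontology (PATO) class


@dataclass(frozen=True)
class GenotypeAnnotation:
    """All curated phenes recorded for one mutant genotype."""
    genotype_id: str
    species: str
    phenes: FrozenSet[EQ]


def annotate(genotype_id: str, species: str,
             pairs: Iterable[Tuple[str, str]]) -> GenotypeAnnotation:
    """Build an annotation from (entity, quality) pairs.

    EQ is frozen and therefore hashable, so a phene curated twice
    (e.g. from two papers) is stored only once in the set.
    """
    return GenotypeAnnotation(genotype_id, species,
                              frozenset(EQ(e, q) for e, q in pairs))


# Illustrative labels only; a real dataset would store ontology IDs.
dwarf = annotate("hypothetical-mutant-1", "Zea mays", [
    ("whole plant", "decreased height"),
    ("leaf", "decreased size"),
    ("leaf", "decreased size"),  # duplicate curation collapses
])
print(len(dwarf.phenes))  # 2
```

Using a set of immutable EQ objects makes "equivalent sets of EQs" a simple equality test between two genotypes' `phenes` fields.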

Figure 2: Semantic similarity score distributions for inter- and intraspecific pairwise phenotype similarity. When all semantic similarity scores across all species are binned, 44% indicate relatively low phenotypic overlap between genes (similarity range 0–0.1), while 13% indicate highly similar phenotypes (similarity range 0.9–1) (A). Distributions of intraspecific scores (pairwise scores where both genotypes belong to the same species) were similar to the overall distribution of scores (B–H).
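Panel A bins all pairwise scores into ranges such as 0–0.1 and 0.9–1, and panels B–H show the intraspecific subsets. A minimal Python sketch of that binning follows, assuming ten equal-width bins and a hypothetical input of (species, species, score) tuples. The function names `bin_scores` and `intraspecific` and the sample scores are invented for illustration; they are not the study's data or code.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_scores(scores: Iterable[float],
               n_bins: int = 10) -> List[Tuple[float, float, float]]:
    """Histogram similarity scores into equal-width bins over [0, 1].

    Returns (lower, upper, percent_of_scores) per bin. Bins are
    half-open [lower, upper), except that a score of exactly 1.0
    is counted in the last bin.
    """
    values = list(scores)
    if not values:
        raise ValueError("no scores to bin")
    counts: Counter = Counter()
    for s in values:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return [(k / n_bins, (k + 1) / n_bins, 100.0 * counts[k] / len(values))
            for k in range(n_bins)]


def intraspecific(pairs: Iterable[Tuple[str, str, float]]) -> List[float]:
    """Keep scores of (species_a, species_b, score) pairs from one species."""
    return [score for sp_a, sp_b, score in pairs if sp_a == sp_b]


# Made-up scores, not the paper's data.
pairs = [("rice", "rice", 0.05), ("rice", "maize", 0.08),
         ("maize", "maize", 0.5), ("tomato", "tomato", 0.95),
         ("rice", "soybean", 1.0)]
overall = bin_scores(score for _, _, score in pairs)
print(overall[0], overall[-1])  # (0.0, 0.1, 40.0) (0.9, 1.0, 40.0)
print(intraspecific(pairs))     # [0.05, 0.5, 0.95]
```

Calling `bin_scores(intraspecific(pairs))` on each species' pairs would give per-species distributions comparable to the overall one.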

Mentions: We calculated semantic similarity scores for 548,888 genotype pairs, with scores ranging from just above 0 to 1. A similarity score of 0 indicates no semantic overlap between phenotype descriptions, while a score of 1 indicates identical semantic phenotype descriptions (and therefore equivalent sets of EQ statements). Figure 2A illustrates the distribution of semantic similarity scores for both intra- and interspecies genotype pairs. For 13% (71,290) of the genotype pairs with a semantic similarity score, the score fell in the range 0.9–1 (excluding the similarity of a genotype to itself, which is always 1). Although 13% seems high, some of these nearly identical scores arise from the limited phenotype information available for many genotypes. For example, if two genotypes are each annotated with the same single EQ statement, their semantic similarity score is 1, even if in reality those mutant genotypes have many more phenes that were not recorded. Only known phenes already curated from the scientific literature were assigned to genotypes, and our method cannot compensate for gaps in the literature (e.g., those due to limitations in biological experiments). As the dataset grows, it will become possible to separate genotypes better with respect to their semantic phenotype similarity.
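The caveat about single-EQ annotations can be made concrete with a set-based similarity. The sketch below uses a plain Jaccard index over EQ pairs expanded with their ancestors in a tiny, invented is_a hierarchy. This is a stand-in chosen for illustration, not necessarily the measure used in the study, and the terms, the `IS_A` map, and the helpers `ancestors`, `expand`, and `jaccard_similarity` are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

EQ = Tuple[str, str]  # (entity label, quality label); illustrative only

# Toy is_a hierarchy (child -> parents). Hypothetical, not real ontology content.
IS_A: Dict[str, Set[str]] = {
    "leaf": {"plant organ"},
    "flower": {"plant organ"},
    "plant organ": set(),
    "decreased size": {"size"},
    "increased size": {"size"},
    "size": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its is_a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(phenes: Iterable[EQ]) -> FrozenSet[EQ]:
    """Expand each EQ to every (entity ancestor, quality ancestor) pair."""
    return frozenset(
        (e, q)
        for entity, quality in phenes
        for e in ancestors(entity)
        for q in ancestors(quality)
    )


def jaccard_similarity(a: Iterable[EQ], b: Iterable[EQ]) -> float:
    """|A & B| / |A | B| over ancestor-expanded EQ sets (0.0 if both empty)."""
    ea, eb = expand(a), expand(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


single = [("leaf", "decreased size")]
print(jaccard_similarity(single, single))  # 1.0
print(jaccard_similarity(single, [("flower", "increased size")]))  # 1/7, about 0.143
print(jaccard_similarity(single, single + [("flower", "increased size")]))  # 4/7, about 0.571
```

Two genotypes sharing one identical EQ score exactly 1, while recording a second phene for just one of them drops the score to 4/7. This mirrors the point above: sparse annotation inflates scores, and more complete curation separates genotypes better. Published phenotype-similarity methods often weight shared classes by information content rather than counting them equally; that refinement is omitted here.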

