Limits...
A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

Davis AP, Wiegers TC, Roberts PM, King BL, Lay JM, Lennon-Hopkins K, Sciaky D, Johnson R, Keating H, Greene N, Hernandez R, McConnell KJ, Enayetallah AE, Mattingly CJ - Database (Oxford) (2013)

Bottom Line: This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events.The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival.This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities.

View Article: PubMed Central - PubMed

Affiliation: Department of Biological Sciences, 3510 Thomas Hall, North Carolina State University, Raleigh, NC 27695-7617, USA, Computational Sciences Center of Emphasis, 200 Cambridgepark Drive, Pfizer Inc., Cambridge, MA 02139, USA, Department of Bioinformatics, P.O. Box 35, Old Bar Harbor Road, MDI Biological Laboratory, Salisbury Cove, ME 04672, USA, Compound Safety Prediction, MS 8118-B3, Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA, Computational Sciences Center of Emphasis, Pfizer Inc., Ramsgate Road, Sandwich, Kent CT13 9NJ, UK, Computational Sciences Center of Emphasis, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA and Drug Safety Research and Development, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA.

ABSTRACT
Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/

Show MeSH

Related in: MedlinePlus

CTD phenotypes inferred to diseases through shared chemicals. A matrix of 74 phenotypes (rows) by 750 diseases (columns) was constructed where each cell represented the number of shared chemicals. The matrix was analysed by two-dimensional hierarchical clustering and visualized as a heatmap where the normalized number of shared chemicals are colored (green = low; black = medium; red = high). The similarities among the number of shared chemicals for diseases across all phenotypes are shown in the dendrogram beneath the heatmap, where the lengths of the lines are inversely proportional to the similarity (i.e., short = highly similar, long = dissimilar). An enlargement (blue boxes, blue arrow) shows how the disease dendrogram was trimmed to select 18 disease clusters (dotted line, with clusters numbered), and these boundaries are also represented on the heatmap (numbered white boxes). Below, the number of unique phenotypes, chemicals and diseases are charted for each cluster. In pie charts at the very bottom, predominant disease classes for some of the clusters are shown (only the top four disease classes are graphed). For example, of the 19 diseases in cluster 1, 28% of them represent cancers, 13% digestive system diseases, 13% immune system diseases and 9% lymphatic diseases. To the right of the heatmap, the similarities among the number of shared chemicals for phenotypes across all diseases are also shown in another dendrogram, where the lengths of the lines are inversely proportional to the similarity.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3842776&req=5

bat080-F5: CTD phenotypes inferred to diseases through shared chemicals. A matrix of 74 phenotypes (rows) by 750 diseases (columns) was constructed where each cell represented the number of shared chemicals. The matrix was analysed by two-dimensional hierarchical clustering and visualized as a heatmap where the normalized number of shared chemicals are colored (green = low; black = medium; red = high). The similarities among the number of shared chemicals for diseases across all phenotypes are shown in the dendrogram beneath the heatmap, where the lengths of the lines are inversely proportional to the similarity (i.e., short = highly similar, long = dissimilar). An enlargement (blue boxes, blue arrow) shows how the disease dendrogram was trimmed to select 18 disease clusters (dotted line, with clusters numbered), and these boundaries are also represented on the heatmap (numbered white boxes). Below, the number of unique phenotypes, chemicals and diseases are charted for each cluster. In pie charts at the very bottom, predominant disease classes for some of the clusters are shown (only the top four disease classes are graphed). For example, of the 19 diseases in cluster 1, 28% of them represent cancers, 13% digestive system diseases, 13% immune system diseases and 9% lymphatic diseases. To the right of the heatmap, the similarities among the number of shared chemicals for phenotypes across all diseases are also shown in another dendrogram, where the lengths of the lines are inversely proportional to the similarity.

Mentions: CTD then used this phenotype–disease inference file to construct a two-dimensional matrix, where each intersecting cell represented the number of shared chemicals between the phenotype and disease. For this analysis, inferences between a phenotype and disease were required to share a minimum of 10 chemicals. This stringency reduced the matrix to 74 phenotypes and 750 diseases. Two-dimensional hierarchical clustering ordered the phenotypes and diseases based on the similarity of the pattern profiles of shared chemicals (Figure 5).Figure 5.


A CTD-Pfizer collaboration: manual curation of 88,000 scientific articles text mined for drug-disease and drug-phenotype interactions.

Davis AP, Wiegers TC, Roberts PM, King BL, Lay JM, Lennon-Hopkins K, Sciaky D, Johnson R, Keating H, Greene N, Hernandez R, McConnell KJ, Enayetallah AE, Mattingly CJ - Database (Oxford) (2013)

CTD phenotypes inferred to diseases through shared chemicals. A matrix of 74 phenotypes (rows) by 750 diseases (columns) was constructed where each cell represented the number of shared chemicals. The matrix was analysed by two-dimensional hierarchical clustering and visualized as a heatmap where the normalized number of shared chemicals are colored (green = low; black = medium; red = high). The similarities among the number of shared chemicals for diseases across all phenotypes are shown in the dendrogram beneath the heatmap, where the lengths of the lines are inversely proportional to the similarity (i.e., short = highly similar, long = dissimilar). An enlargement (blue boxes, blue arrow) shows how the disease dendrogram was trimmed to select 18 disease clusters (dotted line, with clusters numbered), and these boundaries are also represented on the heatmap (numbered white boxes). Below, the number of unique phenotypes, chemicals and diseases are charted for each cluster. In pie charts at the very bottom, predominant disease classes for some of the clusters are shown (only the top four disease classes are graphed). For example, of the 19 diseases in cluster 1, 28% of them represent cancers, 13% digestive system diseases, 13% immune system diseases and 9% lymphatic diseases. To the right of the heatmap, the similarities among the number of shared chemicals for phenotypes across all diseases are also shown in another dendrogram, where the lengths of the lines are inversely proportional to the similarity.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3842776&req=5

bat080-F5: CTD phenotypes inferred to diseases through shared chemicals. A matrix of 74 phenotypes (rows) by 750 diseases (columns) was constructed where each cell represented the number of shared chemicals. The matrix was analysed by two-dimensional hierarchical clustering and visualized as a heatmap where the normalized number of shared chemicals are colored (green = low; black = medium; red = high). The similarities among the number of shared chemicals for diseases across all phenotypes are shown in the dendrogram beneath the heatmap, where the lengths of the lines are inversely proportional to the similarity (i.e., short = highly similar, long = dissimilar). An enlargement (blue boxes, blue arrow) shows how the disease dendrogram was trimmed to select 18 disease clusters (dotted line, with clusters numbered), and these boundaries are also represented on the heatmap (numbered white boxes). Below, the number of unique phenotypes, chemicals and diseases are charted for each cluster. In pie charts at the very bottom, predominant disease classes for some of the clusters are shown (only the top four disease classes are graphed). For example, of the 19 diseases in cluster 1, 28% of them represent cancers, 13% digestive system diseases, 13% immune system diseases and 9% lymphatic diseases. To the right of the heatmap, the similarities among the number of shared chemicals for phenotypes across all diseases are also shown in another dendrogram, where the lengths of the lines are inversely proportional to the similarity.
Mentions: CTD then used this phenotype–disease inference file to construct a two-dimensional matrix, where each intersecting cell represented the number of shared chemicals between the phenotype and disease. For this analysis, inferences between a phenotype and disease were required to share a minimum of 10 chemicals. This stringency reduced the matrix to 74 phenotypes and 750 diseases. Two-dimensional hierarchical clustering ordered the phenotypes and diseases based on the similarity of the pattern profiles of shared chemicals (Figure 5).Figure 5.

Bottom Line: This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events.The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival.This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities.

View Article: PubMed Central - PubMed

Affiliation: Department of Biological Sciences, 3510 Thomas Hall, North Carolina State University, Raleigh, NC 27695-7617, USA, Computational Sciences Center of Emphasis, 200 Cambridgepark Drive, Pfizer Inc., Cambridge, MA 02139, USA, Department of Bioinformatics, P.O. Box 35, Old Bar Harbor Road, MDI Biological Laboratory, Salisbury Cove, ME 04672, USA, Compound Safety Prediction, MS 8118-B3, Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA, Computational Sciences Center of Emphasis, Pfizer Inc., Ramsgate Road, Sandwich, Kent CT13 9NJ, UK, Computational Sciences Center of Emphasis, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA and Drug Safety Research and Development, 558 Eastern Point Road, Pfizer Inc., Groton, CT 06340, USA.

ABSTRACT
Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88,629 articles relating over 1,200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254,173 toxicogenomic interactions (152,173 chemical-disease, 58,572 chemical-gene, 5,345 gene-disease and 38,083 phenotype interactions). All chemical-gene-disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer's text-mining process to collate the articles, and CTD's curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug-disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades' worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/

Show MeSH
Related in: MedlinePlus