Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

Xu R, Wang Q - BMC Bioinformatics (2015)

Bottom Line: A comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature. On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score.


ABSTRACT

Background: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.

Data and methods: For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
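The two steps described above (using known drug-SE pairs to find SE-related sentences, then extracting drug-SE pairs from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicons, and the plain substring matching are all assumptions made for illustration, and the abstract does not specify how matching or extraction was actually performed.

```python
from itertools import product

def find_se_related(sentences, known_pairs):
    """Return sentences that mention at least one known (drug, SE) pair.

    Assumption: a sentence counts as SE-related when both terms of a
    known pair occur in it (case-insensitive substring match).
    """
    related = []
    for sentence in sentences:
        text = sentence.lower()
        if any(drug in text and se in text for drug, se in known_pairs):
            related.append(sentence)
    return related

def extract_pairs(sentences, drugs, side_effects):
    """Collect every (drug, SE) co-occurrence in the given sentences."""
    pairs = set()
    for sentence in sentences:
        text = sentence.lower()
        found_drugs = [d for d in drugs if d in text]
        found_ses = [s for s in side_effects if s in text]
        pairs.update(product(found_drugs, found_ses))
    return pairs

# Hypothetical toy data, not taken from the study.
known_pairs = {("aspirin", "bleeding")}
drugs = {"aspirin", "warfarin"}
side_effects = {"bleeding", "nausea"}
sentences = [
    "Aspirin and warfarin both increased the risk of bleeding.",
    "Warfarin dosing was adjusted weekly.",
    "Nausea was reported in the placebo group.",
]

related = find_se_related(sentences, known_pairs)
extracted = extract_pairs(related, drugs, side_effects)
```

In this toy run only the first sentence is kept, and the known pair acts as a seed that also surfaces a second candidate pair from the same sentence. A real pipeline would need proper entity recognition rather than substring tests.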

Results: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.
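For readers less familiar with the metrics quoted above, the following sketch shows the standard definitions of precision, recall, and F1, plus a relative-increase helper. The counts used are hypothetical and are not taken from the study. Because the reported figures are averages, they need not satisfy the F1 formula exactly when plugged in directly.

```python
def precision(tp, fp):
    """Fraction of extracted pairs that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of true pairs that were extracted."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def relative_increase(new, old):
    """Relative change of `new` over `old`, as a fraction."""
    return (new - old) / old

# Hypothetical counts: 50 true positives, 100 false positives,
# 50 false negatives.
p = precision(50, 100)
r = recall(50, 50)
f1 = f1_score(p, r)
```

The harmonic mean penalizes imbalance, which is why a system with very high recall but low precision, like the SVM baseline described above, can still end up with a low F1.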

Conclusion: In summary, we automatically constructed a large-scale knowledge base of higher-level drug phenotype relationships, which has great potential in computational drug discovery.

Figure 4: The correlation between shared SEs and chemical similarity: "Database", "Experimental" and "Similarity".

Mentions: Genetic, genomic and structural chemical-chemical relationships have been widely used for both drug target discovery and drug repositioning. We investigated whether the phenotypic side effect similarity between drugs captured chemical similarities as measured by chemical structure, gene co-expression and pathway interactions. We used chemical-chemical relationships from a curated pathway database ("Database"), chemical 2D structure ("Similarity") and gene expression ("Experimental") in our study. As shown in Figure 4, drug-drug pairs that share SEs tend to share common pathways and 2D chemical structure, but not gene co-expression profiles. For instance, the average chemical similarity score based on "Database" across all drug-drug combinations is 7.898; it increased significantly to 12.06 for drug-drug pairs that share at least 10 SEs and to 27.795 for pairs that share at least 50 SEs. The correlation curve for chemical structure-based relationships is similar but less pronounced. There is no obvious correlation between gene expression-based drug similarity and drug side effects. In summary, high-level phenotypic relationships among drugs, as determined by shared side effects, do reflect drug relationships at the pathway and chemical structure levels. Hence, systematic approaches to studying these higher-level phenotypic drug relationships can reveal insights into drug molecular mechanisms and offer opportunities for drug target discovery and drug repositioning.
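The comparison described above (average similarity score over all drug-drug pairs versus pairs sharing at least a given number of SEs) can be sketched as follows. The function name, data layout, and toy values are assumptions for illustration only and do not come from the study.

```python
from itertools import combinations
from statistics import mean

def mean_similarity(drug_ses, similarity, min_shared=0):
    """Average similarity over drug pairs sharing >= min_shared SEs.

    drug_ses:   dict mapping drug name -> set of side effects
    similarity: dict mapping (drug_a, drug_b), with drug_a < drug_b,
                to a chemical similarity score
    Returns None when no pair qualifies.
    """
    scores = []
    for a, b in combinations(sorted(drug_ses), 2):
        shared = len(drug_ses[a] & drug_ses[b])
        if shared >= min_shared and (a, b) in similarity:
            scores.append(similarity[(a, b)])
    return mean(scores) if scores else None

# Hypothetical toy data.
drug_ses = {
    "A": {"nausea", "rash", "headache"},
    "B": {"nausea", "rash"},
    "C": {"dizziness"},
}
similarity = {("A", "B"): 30.0, ("A", "C"): 5.0, ("B", "C"): 10.0}

baseline = mean_similarity(drug_ses, similarity)             # all pairs
at_least_two = mean_similarity(drug_ses, similarity, 2)      # >= 2 shared SEs
```

Sweeping `min_shared` over increasing thresholds and plotting the resulting means would give a curve of the kind summarized in Figure 4.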

