Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

Xu R, Wang Q - BMC Bioinformatics (2015)

Bottom Line: A comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature. On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score.


ABSTRACT

Background: Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.

Data and methods: For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
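The idea of using known drug-SE pairs as prior knowledge can be illustrated with a minimal sketch. The abstract does not give the actual matching or extraction rules, so plain substring co-occurrence is an assumed stand-in here, and the function names and toy sentences are hypothetical.

```python
def find_se_sentences(sentences, known_pairs):
    """Keep sentences that mention at least one known (drug, SE) pair.

    Assumption: lowercase substring matching stands in for whatever
    entity matching the original study used.
    """
    hits = []
    for sentence in sentences:
        text = sentence.lower()
        if any(drug in text and se in text for drug, se in known_pairs):
            hits.append(sentence)
    return hits


def extract_pairs(sentences, drugs, side_effects):
    """Collect every (drug, SE) combination that co-occurs in a sentence."""
    pairs = set()
    for sentence in sentences:
        text = sentence.lower()
        found_drugs = {d for d in drugs if d in text}
        found_ses = {e for e in side_effects if e in text}
        pairs.update((d, e) for d in found_drugs for e in found_ses)
    return pairs


# Hypothetical toy data, not taken from the study.
toy_sentences = [
    "Aspirin may cause nausea and headache.",
    "Ibuprofen was studied in 20 patients.",
]
toy_known = {("aspirin", "nausea")}

se_sentences = find_se_sentences(toy_sentences, toy_known)
toy_pairs = extract_pairs(
    se_sentences, {"aspirin", "ibuprofen"}, {"nausea", "headache"}
)
```

In this toy run the known pair (aspirin, nausea) selects the first sentence, and co-occurrence within it also yields the pair (aspirin, headache), which was not in the prior knowledge.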

Results: On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.
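For reference, F1 is the harmonic mean of precision and recall. The sketch below shows the standard formula only; the function name is illustrative. Because the figures above are reported as averages, applying the formula to the averaged precision and recall need not reproduce the reported F1 values exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 computed from the averaged precision and recall quoted above.
kd_f1 = f1_score(0.335, 0.509)
svm_f1 = f1_score(0.135, 0.900)
```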

Conclusion: In summary, we automatically constructed a large-scale, higher-level drug phenotype relationship knowledge base, which has great potential in computational drug discovery.



Figure 2: The correlations between SEs and gene targets for drug-SE pairs from SIDER, MEDLINE sentences ("KD_Sentence"), and abstracts ("KD_Abstract").

Mentions: We investigated whether drug-drug pairs that share SEs tend to share gene targets. As shown in Figure 2, there is a positive correlation between shared SEs and shared gene targets, and it is much stronger for drug-SE pairs extracted from MEDLINE sentences than for those from SIDER or MEDLINE abstracts. For instance, the average number of shared gene targets for all drug-drug pairs is 0.492. The number increased significantly to 0.813 for drug-drug pairs that shared at least one SE and to 3.161 for pairs that shared at least 100 SEs. This strong positive correlation indicates that we may use these extracted drug-SE pairs to discover novel drug targets or use drug-related gene targets to predict unknown drug side effects.
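The comparison described above can be sketched as follows: for every drug-drug pair that shares at least a given number of SEs, count the gene targets the two drugs have in common and average those counts. This is an illustrative sketch, not the authors' code; the function name, the dictionary layout, and the toy data are assumptions.

```python
from itertools import combinations


def mean_shared_targets(drug_ses, drug_targets, min_shared_ses=0):
    """Average number of shared gene targets over drug-drug pairs
    that share at least `min_shared_ses` side effects.

    drug_ses and drug_targets map a drug name to a set of SEs or
    gene targets (assumed layout for this sketch).
    """
    drugs = sorted(set(drug_ses) & set(drug_targets))
    counts = []
    for a, b in combinations(drugs, 2):
        if len(drug_ses[a] & drug_ses[b]) >= min_shared_ses:
            counts.append(len(drug_targets[a] & drug_targets[b]))
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical toy data, not taken from the study.
toy_ses = {"A": {"nausea", "rash"}, "B": {"nausea"}, "C": {"dizziness"}}
toy_targets = {"A": {"G1", "G2"}, "B": {"G1"}, "C": {"G3"}}

all_pairs_mean = mean_shared_targets(toy_ses, toy_targets, 0)
sharing_pairs_mean = mean_shared_targets(toy_ses, toy_targets, 1)
```

In the toy data only drugs A and B share an SE, and they also share one gene target, so the average rises from one third over all pairs to one over pairs sharing at least one SE. This mirrors the direction of the trend reported above, but not its magnitude.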

