Using unsupervised patterns to extract gene regulation relationships for network construction.

Tang YT, Li SJ, Kao HY, Tsai SJ, Wang HC - PLoS ONE (2011)

Bottom Line: The high scalability and low maintenance cost of the unsupervised patterns help our system extract gene expression information from PubMed abstracts more precisely and effectively. Experiments on several regulators show reasonable precision and recall rates, which validate AutoPat's practical applicability. The resulting regulation networks could also be built precisely and effectively.


Affiliation: Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, Republic of China. p7895125@mail.ncku.edu.tw

ABSTRACT

Background: Gene expression is usually described in the literature as a transcription factor X that regulates a target gene Y. Previous studies have discovered gene regulation relationships by using information from the biomedical literature, but most of them require the effort of human annotators to build the training dataset. Moreover, the textual knowledge recorded in the biomedical literature grows very rapidly, which makes the manual creation of patterns from the literature increasingly difficult. There is an increasing need to automate the process of establishing patterns.

Methodology/principal findings: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that automatically generates unsupervised patterns from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns help our system extract gene expression information from PubMed abstracts more precisely and effectively.

Conclusions/significance: Experiments on several regulators show reasonable precision and recall rates, which validate AutoPat's practical applicability. The resulting regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/.


Figure 3 (pone-0019633-g003): The frequency distribution of unsupervised patterns.

Mentions: In addition, we use the extracted results to examine the relationship between correct sentences and high-frequency unsupervised patterns, and to determine the frequency threshold of unsupervised patterns for the following experiments. The frequency distribution of the unsupervised patterns is therefore also evaluated, and the result is shown in Figure 3. The pattern frequencies are normalized by dividing by the maximal frequency, 1,510. The result shows that although incorrect sentences do match patterns, the patterns they match have lower frequencies. Moreover, the correctness of the extraction results is very important to biologists, because thousands of genes may be associated with each other without having specific gene regulation relationships. We therefore consider not only a higher F score but also a higher extraction precision. We calculated the precision of the unsupervised patterns under different frequency thresholds; the result is illustrated in Figure 4. When the threshold is raised to 700, which corresponds to a normalized value of 0.464, the precision increases to 100%, as many false positive (FP) cases are filtered from the extraction results. The goal of our system is to construct a gene regulation network from the literature, and we also observed that many target genes (TGs) in sentences extracted by low-frequency patterns can also be found in sentences extracted by high-frequency patterns. Therefore, given the high precision and the reduced computation cost, the threshold for unsupervised patterns is set to 700 for the following experiments.
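The thresholding step described above can be illustrated with a short Python sketch. It is not AutoPat's implementation: the `Extraction` record, the function names, the example patterns and the toy data are all hypothetical, and treating the threshold as inclusive (frequency >= 700) is an assumption, since the text does not say how ties are handled. Only the maximal frequency (1,510) and the threshold (700) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_FREQUENCY = 1510  # maximal pattern frequency reported in the text
THRESHOLD = 700       # raw-frequency threshold chosen in the text


@dataclass
class Extraction:
    """One extracted sentence (hypothetical record used for illustration)."""
    pattern: str            # pattern that matched the sentence
    pattern_frequency: int  # raw frequency of that pattern
    is_correct: bool        # whether the extraction was judged correct


def normalize(frequency: int, max_frequency: int = MAX_FREQUENCY) -> float:
    """Normalize a raw pattern frequency by the maximal frequency."""
    return frequency / max_frequency


def filter_by_threshold(extractions: List[Extraction],
                        threshold: int = THRESHOLD) -> List[Extraction]:
    """Keep extractions whose pattern frequency reaches the threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [e for e in extractions if e.pattern_frequency >= threshold]


def precision(extractions: List[Extraction]) -> Optional[float]:
    """Precision = TP / (TP + FP); None when nothing was extracted."""
    if not extractions:
        return None
    true_positives = sum(1 for e in extractions if e.is_correct)
    return true_positives / len(extractions)


# Toy data, invented for this sketch: incorrect sentences are matched
# only by low-frequency patterns, mirroring the trend the text reports.
toy = [
    Extraction("X activates Y", 1510, True),
    Extraction("X induces Y", 900, True),
    Extraction("X represses Y", 700, True),
    Extraction("X binds Y", 300, False),
    Extraction("X and Y", 120, False),
]

if __name__ == "__main__":
    print("normalized threshold:", round(normalize(THRESHOLD), 3))
    print("precision before filtering:", precision(toy))
    print("precision after filtering:", precision(filter_by_threshold(toy)))
```

On this toy data, filtering at 700 removes both false positives and leaves three of the five extractions, which shows the trade-off the text accepts: recall may drop, but precision rises and fewer pattern matches need to be processed.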

