CoIN: a network analysis for document triage.

Hsu YY, Kao HY - Database (Oxford) (2013)

Bottom Line: Under these circumstances, a system that can automatically determine in advance which article has a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has become an important task. The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles.


Affiliation: Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan, R.O.C. (Republic of China).

ABSTRACT
In recent years, there has been a rapid increase in the number of medical articles. The number of articles in PubMed has increased exponentially, and the workload for biocurators has increased with it. Under these circumstances, a system that can automatically determine in advance which articles have a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has become an important task. In the triage task of BioCreative 2012, we proposed the Co-occurrence Interaction Nexus (CoIN) for learning and exploring relations in articles. We constructed a co-occurrence analysis system, which is applicable to PubMed articles and suitable for gene, chemical and disease queries. CoIN uses co-occurrence features and their network centralities to assess the influence of curatable articles from the Comparative Toxicogenomics Database (CTD). The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles. CoIN also allows biocurators to survey the ranking lists for specific queries without reviewing irrelevant information. At BioCreative 2012, CoIN achieved a mean average precision of 0.778 in the triage task, finishing in second place among all participants. Database URL: http://ikmbio.csie.ncku.edu.tw/coin/home.php.
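For readers unfamiliar with the metric reported above, mean average precision can be sketched as follows. This is a minimal illustration, not the BioCreative evaluation script: the function names are hypothetical, each ranking is assumed to be a list of booleans (True for a curatable article) ordered from highest to lowest score, and average precision is normalized by the number of curatable articles that appear in the ranking.

```python
def average_precision(ranked_labels):
    """Average of precision@k taken at every rank k that holds a relevant item."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision at this cut-off
    # Assumption: every relevant article appears somewhere in the ranking.
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two hypothetical queries, each with its own ranked list of labels.
example_rankings = [[True, False, True], [False, True]]
example_map = mean_average_precision(example_rankings)
```

A ranking that places every curatable article ahead of every non-curatable one scores 1.0, so the measure rewards putting curatable articles near the top of the list.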

Figure 2 (bat076-F2): Co-occurrence interaction network of urethane from the test set. Vertices are PubMed articles, and the size of each vertex is specified by the degree of the vertex. Blue vertices are curatable articles, whereas red vertices are non-curatable articles. Red, blue and green edges are established when two PubMed articles have the co-occurrences of gene–chemical, chemical–disease and gene–disease relationships, respectively.

Mentions: For the convenience of biocurators, CoIN allows users to query genes, diseases and chemicals. As shown in Figure 1, CoIN uses AIIAGMT (32) to identify gene names and to split articles into sentences. Next, we train conditional random fields, a statistical modeling method frequently applied in pattern recognition, to predict chemical names in the articles; the training patterns are extracted from the CTD. To tag disease names, CoIN uses a dictionary-based method, and the dictionary is also extracted from the CTD. After collecting the tagged names, CoIN calculates their co-occurrences within each sentence. Then, the co-occurrence network is constructed using the co-occurrences of gene–disease, gene–chemical and chemical–disease relationships, as shown in Figure 2. In the last stage of CoIN, the system provides the normalized co-occurrence score, the betweenness and the PageRank value for prioritizing PubMed articles. In the ‘Methods’ section, we introduce the normalized co-occurrence score, betweenness and PageRank.
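The network-construction and ranking stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not CoIN's implementation: the tagged input, the function names and the linking rule (two articles are connected when they share at least one sentence-level co-occurrence pair) are hypothetical, PageRank is computed by plain power iteration with a conventional damping factor of 0.85, and the normalized co-occurrence score and betweenness are left out.

```python
from itertools import combinations

# Hypothetical tagged input: article id -> sentences, each a list of
# (entity_type, name) tags. In CoIN these would come from the tagging stage.
ARTICLES = {
    "A": [[("gene", "TP53"), ("chemical", "urethane")]],
    "B": [[("gene", "TP53"), ("chemical", "urethane"),
           ("disease", "lung neoplasms")]],
    "C": [[("chemical", "urethane"), ("disease", "lung neoplasms")]],
    "D": [[("gene", "KRAS")]],
}


def sentence_pairs(sentences):
    """Collect pairs of differently typed entities that share a sentence."""
    pairs = set()
    for tags in sentences:
        for a, b in combinations(sorted(set(tags)), 2):
            if a[0] != b[0]:  # gene-chemical, chemical-disease, gene-disease
                pairs.add((a, b))
    return pairs


def build_network(articles):
    """Undirected article graph; assumed rule: link on any shared pair."""
    pairs = {pid: sentence_pairs(s) for pid, s in articles.items()}
    graph = {pid: set() for pid in articles}
    for u, v in combinations(articles, 2):
        if pairs[u] & pairs[v]:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; isolated vertices spread rank uniformly."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in graph if not graph[v])
        rank = {
            v: (1 - damping) / n + damping * (
                sum(rank[u] / len(graph[u]) for u in graph[v]) + dangling / n
            )
            for v in graph
        }
    return rank


graph = build_network(ARTICLES)
ranks = pagerank(graph)
ordering = sorted(ranks, key=ranks.get, reverse=True)  # highest priority first
```

In this toy input, article B shares a pair with both A and C, so it sits at the center of the network and comes first in `ordering`, while D has no co-occurrence pairs and stays isolated.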

