CoIN: a network analysis for document triage.

Hsu YY, Kao HY - Database (Oxford) (2013)

Bottom Line: Under these circumstances, a system that can automatically determine in advance which article has a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has become an important task. The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles.


Affiliation: Department of Computer Science and Information Engineering, National Cheng Kung University, No. 1, University Road, Tainan City 701, Taiwan, R.O.C. (Republic of China).

ABSTRACT
In recent years, the number of medical articles has increased rapidly. The number of articles in PubMed has increased exponentially, and the workload for biocurators has grown with it. Under these circumstances, a system that can automatically determine in advance which articles have a higher priority for curation can effectively reduce the workload of biocurators. Determining how to effectively find the articles required by biocurators has become an important task. In the triage task of BioCreative 2012, we proposed the Co-occurrence Interaction Nexus (CoIN) for learning and exploring relations in articles. We constructed a co-occurrence analysis system, which is applicable to PubMed articles and suitable for gene, chemical and disease queries. CoIN uses co-occurrence features and their network centralities to assess the influence of curatable articles from the Comparative Toxicogenomics Database. The experimental results show that our network-based approach combined with co-occurrence features can effectively classify curatable and non-curatable articles. CoIN also allows biocurators to survey the ranking lists for specific queries without reviewing meaningless information. At BioCreative 2012, CoIN achieved a mean average precision of 0.778 in the triage task, finishing in second place among all participants. Database URL: http://ikmbio.csie.ncku.edu.tw/coin/home.php.

Figure 5: A toy example of a co-occurrence interaction network. Pi denotes a PubMed article. An edge is established between two articles if sentences in them share a chemical–disease, chemical–gene or gene–disease co-occurrence.

After computing the PageRank (PR) value of the vertices in a network, we can treat vertices with a higher PR value as more influential than vertices with a lower PR value. The co-occurrence network is constructed from gene–disease, gene–chemical and chemical–disease co-occurrences, as shown in Figure 5. The network is modeled on the linking structure of web pages, but co-occurrence relationships are essentially different from in- and out-links. In Figure 5, an undirected edge represents the co-occurrence relationship between entities, and this edge is treated as bidirectional when counting the in- and out-degree of a node. Thus, the co-occurrence network is displayed as an undirected graph in which vertices represent PubMed articles and edges represent co-occurrence interactions between them. Note that both the in- and out-links of vertex i and vertex j are increased by 1 if there is an edge between them. For example, P2 and P4 have the highest betweenness value (3 each), while P1, P3 and P5 each have a betweenness of 0. P2 and P4 also have the highest PageRank value [PR(P2) = PR(P4) = 0.29], while PR(P3) = 0.19 and PR(P1) = PR(P5) = 0.11. The results of this toy example show that P2 and P4 are more important than the other nodes in the co-occurrence network.
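The toy example can be checked with a short Python sketch. The figure itself is not reproduced in this text, so the edge list below is an assumption: it is one five-node undirected graph that happens to reproduce the reported betweenness and PageRank values. The damping factor of 0.85 and the function names are also assumptions, not details confirmed by the paper.

```python
from collections import deque

# Hypothetical edge set (assumption): chosen because it reproduces the
# values quoted in the text, not taken from the original figure.
EDGES = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P2", "P4")]


def build_adjacency(edges):
    """Undirected graph: each edge counts as both an in- and an out-link."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; the damping value 0.85 is an assumed default."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return rank


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is visited from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in score.items()}


adj = build_adjacency(EDGES)
pr = pagerank(adj)
bc = betweenness(adj)
for node in sorted(adj):
    print(node, round(pr[node], 2), bc[node])
```

Under these assumptions, P2 and P4 come out with the highest scores on both measures, which matches the ranking described above. A different edge set or damping factor would give different numbers.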

