Molecular profiling of thyroid cancer subtypes using large-scale text mining.

Wu C, Schwartz JM, Brabant G, Nenadic G - BMC Med Genomics (2014)

Bottom Line: Thyroid cancer is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. We evaluated a subtype classification method for the thyroid cancer literature on a gold standard derived from the PubMed Supplementary Concept annotations, achieving a micro-average F1-score of 85.9% for primary subtypes. Integrating subtype context can allow prioritized screening for diagnostic biomarkers and novel molecular targeted therapeutics.


ABSTRACT

Background: Thyroid cancer is the most common endocrine tumor, and its incidence is steadily increasing. It is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. Identifying the most relevant genes and biological pathways reported in the thyroid cancer literature is vital for understanding the disease and developing targeted therapeutics.

Results: We developed a large-scale text mining system to generate molecular profiles of thyroid cancer subtypes. The system first applies a subtype classification method to the thyroid cancer literature, using a scoring scheme to assign one or more subtypes to each article. We evaluated the classification method on a gold standard derived from the PubMed Supplementary Concept annotations, achieving a micro-average F1-score of 85.9% for primary subtypes. We then used the subtype classification results to extract genes and pathways associated with different thyroid cancer subtypes. This identified important genes and pathways, including some that are missing from current manually annotated databases or the most recent review articles.
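For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-average F1-score can be computed when each article may carry more than one subtype label. It is not the authors' evaluation code: the function name `micro_f1` and the toy gold/predicted labels are assumptions made purely for illustration. Micro-averaging pools true positives, false positives and false negatives across all articles and subtypes before computing precision and recall, so frequent subtypes weigh more heavily than rare ones.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label assignments.

    gold[i] and predicted[i] are the sets of subtype labels for article i.
    Counts are pooled over all articles and labels before computing P and R.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same articles")

    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels assigned and correct
        fp += len(p - g)  # labels assigned but not in the gold standard
        fn += len(g - p)  # gold labels the classifier missed

    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, not taken from the study).
gold_labels = [{"papillary"}, {"follicular"}, {"medullary", "papillary"}, {"anaplastic"}]
pred_labels = [{"papillary"}, {"papillary"}, {"medullary"}, {"anaplastic", "follicular"}]

print(round(micro_f1(gold_labels, pred_labels), 3))
```

In this toy case there are 3 true positives, 2 false positives and 2 false negatives, so precision and recall are both 0.6 and the micro-average F1 is 0.6.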

Conclusions: Identification of key genes and pathways plays a central role in understanding the molecular biology of thyroid cancer. Integrating subtype context can allow prioritized screening for diagnostic biomarkers and novel molecular targeted therapeutics. The source code used for this study is freely available online at https://github.com/chengkun-wu/GenesThyCan.


Figure 1: Number of PubMed articles returned by the 'thyroid cancer' query. The size of thyroid cancer related literature is increasing rapidly, with over 2,000 articles published annually in recent years.

Mentions: For systematic studies, a major challenge is to make efficient use of the wealth of knowledge and information in the unstructured scientific literature. PubMed, one of the most widely used systems for biomedical literature search [11], returns over 50,000 results for the search query 'thyroid cancer'. The number is increasing rapidly, with over 2,000 articles published annually in recent years, as illustrated in Figure 1. This trend has made it extremely difficult for scientists to identify, retrieve and assimilate all relevant publications.

