Ten years of pathway analysis: current approaches and outstanding challenges.

Khatri P, Sirota M, Butte AJ - PLoS Comput. Biol. (2012)

Bottom Line: Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.


Affiliation: Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America. pkhatri@stanford.edu

ABSTRACT
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.

Figure 3 (pcbi-1002375-g003): Number of GO-annotated genes (left panel) and number of GO annotations (right panel) for human from January 2003 to November 2009. As the estimated number of known genes in the human genome is adjusted (between January 2003 and December 2003) and annotation practices are modified (between December 2004 and December 2005, and between October 2008 and November 2009), one can argue that, although the number of annotated genes and the annotations are decreasing (which is mainly due to the adjusted number of genes in the human genome and changes in the annotation process), the quality of annotations is improving, as demonstrated by the steady increase in non-IEA annotations and the number of genes with non-IEA annotations. However, the increase in the number of genes with non-IEA annotations is very slow. In almost 7 years, between January 2003 and November 2009, only 2,039 new genes received non-IEA annotations. At the same time, the number of non-IEA annotations increased from 35,925 to 65,741, indicating a strong research bias for a small number of genes.

Mentions: Despite the enormous number of annotations available in the public domain, a surprisingly large number of genes are still not annotated. For instance, the November 2009 release of GO contained entries for 18,587 human genes annotated with at least one GO term (Figure 3). Many of the genes are hypothetical, predicted, or pseudogenes. For example, although the number of protein-coding genes in the human genome is estimated to be between 20,000 and 25,000 [52], according to National Center for Biotechnology Information (NCBI) Entrez Gene, there are 45,283 human genes, of which 14,162 are pseudogenes (Table S4). One could argue that the pseudogenes should not be included when evaluating functional annotation coverage. However, pseudogene-derived small interfering RNAs have been shown to regulate gene expression in mouse oocytes [53]. Furthermore, GO provides annotations for 271 pseudogenes. A widely used DNA microarray, Affymetrix HG U133 plus 2.0, contains 1,026 probe sets that correspond to 823 pseudogenes. Based on these examples, we believe that the pseudogenes should be included in the count when estimating annotation coverage for the human genome.
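The coverage argument in the paragraph above can be made concrete with a short calculation. The following is a minimal sketch in Python, not the authors' method: it uses only the counts quoted in the text, and it rests on two labeled assumptions, namely that the GO and Entrez Gene counts refer to comparable gene sets, and that the 271 GO-annotated pseudogenes are among the 18,587 annotated genes. The function and variable names are illustrative.

```python
# Counts quoted in the text (November 2009 GO release, NCBI Entrez Gene).
GO_ANNOTATED_GENES = 18_587
ENTREZ_HUMAN_GENES = 45_283
ENTREZ_PSEUDOGENES = 14_162
GO_ANNOTATED_PSEUDOGENES = 271


def coverage(annotated: int, total: int) -> float:
    """Return the fraction of genes that carry at least one GO term."""
    return annotated / total


# Pseudogenes included in the denominator, as the authors recommend.
with_pseudogenes = coverage(GO_ANNOTATED_GENES, ENTREZ_HUMAN_GENES)

# Pseudogenes excluded. ASSUMPTION: the 271 annotated pseudogenes are
# counted among the 18,587 annotated genes, so they are subtracted from
# the numerator as well as from the denominator.
without_pseudogenes = coverage(
    GO_ANNOTATED_GENES - GO_ANNOTATED_PSEUDOGENES,
    ENTREZ_HUMAN_GENES - ENTREZ_PSEUDOGENES,
)

print(f"coverage with pseudogenes:    {with_pseudogenes:.1%}")
print(f"coverage without pseudogenes: {without_pseudogenes:.1%}")
```

Under these assumptions, the choice of denominator shifts the estimated coverage from roughly two fifths to roughly three fifths of genes, which is why the decision to count pseudogenes matters for the argument.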

