A biomedically enriched collection of 7000 human ORF clones.

Rolfs A, Hu Y, Ebert L, Hoffmann D, Zuo D, Ramachandran N, Raphael J, Kelley F, McCarron S, Jepson DA, Shen B, Baqui MM, Pearlberg J, Taycher E, DeLoughery C, Hoerlein A, Korn B, LaBaer J - PLoS ONE (2008)


Affiliation: Harvard Institute of Proteomics, Harvard Medical School, Cambridge, Massachusetts, USA.

ABSTRACT
We report the production and availability of over 7000 fully sequence-verified plasmid ORF clones representing over 3400 unique human genes. These ORF clones were derived using the human MGC collection as template and were produced in two formats: with and without stop codons. Thus, this collection supports the production of either native protein or proteins with fusion tags added to either or both ends. The template clones used to generate this collection were enriched in three ways. First, gene redundancy was removed. Second, clones were selected to represent the best available GenBank reference sequence. Finally, a literature-based software tool was used to evaluate the list of target genes to ensure that it broadly reflected biomedical research interests. The target gene list was compared with 4000 human diseases and over 8500 biological and chemical MeSH classes in approximately 15 million publications recorded in PubMed at the time of analysis. This analysis revealed that, relative to the genome and the MGC collection, this collection is enriched for genes with published associations with a wide range of diseases and biomedical terms, without displaying a particular bias toward any single disease or concept. Thus, this collection is likely to be a powerful resource for researchers who wish to study protein function in a set of genes with documented biomedical significance.
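The practical difference between the two formats is the terminal stop codon. The minimal sketch below assumes a plain coding sequence that starts with ATG and ends with one of the standard stop codons (TAA, TAG or TGA) and derives both variants: the closed form keeps the stop codon, so translation ends at the native C-terminus, while the fusion form drops it, so that a tag cloned in frame downstream would be translated as part of the protein. The helper name `make_formats` is hypothetical, and this is only an illustration of the sequence logic, not the cloning procedure used to build the collection.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def make_formats(orf: str) -> tuple[str, str]:
    """Return (closed, fusion) variants of a coding sequence (hypothetical helper).

    closed: keeps the terminal stop codon, so translation ends at the native C-terminus.
    fusion: drops the terminal stop codon, so an in-frame downstream tag would be read through.
    """
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if not seq.startswith("ATG"):
        raise ValueError("ORF must start with the ATG start codon")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    # A stop codon before the last position would truncate the protein.
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("internal stop codon found")
    closed = seq
    fusion = seq[:-3]  # remove only the terminal stop codon
    return closed, fusion


closed, fusion = make_formats("ATGGCTGAATAA")
print(closed)  # ATGGCTGAATAA
print(fusion)  # ATGGCTGAA
```

Rejecting sequences with internal stop codons keeps the example honest: a fusion tag placed after such an ORF would never be reached by the ribosome.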


Figure 1. Genes associated with Disease Classes (MeSH) in Publications. The clone target list (HFLEX7000) was compared with all human genes (EntrezGene, 2004) and all genes represented by MGC (2004) with respect to published relationships of the genes to human diseases. The targeted genes show proportions similar to those of the other gene lists but a general enrichment of genes related to diseases (Table 1; Supplementary Table S2).

Mentions: Our discussions with researchers indicated that a focused set of genes in both formats (closed and fusion) would be of more value than a large set in only one format. To ensure that our final gene set (Supplementary Table S1) was enriched for genes related to human diseases without any specific bias, the candidate list was used to query MedGene [11] for genes associated with about 4000 human diseases. As described, MedGene is an automated literature-mining tool that comprehensively summarizes and estimates the relative strengths of all human gene-disease relationships reported in Medline/PubMed. The result of this query was compared with queries using either all unique genes represented in MGC or all ∼33,000 human genes listed at the time in LocusLink (2004; now EntrezGene [15]). As shown for a subset of diseases in Figure 1 and Table 1 (complete list: Supplementary Table S2), the resulting target list (a) was highly enriched for genes with published associations with a wide range of human diseases; (b) had a relative ratio among the various diseases similar to that of both the genome and the MGC; and (c) displayed a broad overlap among different diseases, allowing multiple diseases to be addressed with this set of ORF clones.
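Points (a) and (b) amount to comparing, disease by disease, the fraction of each gene list that has a published association. A minimal sketch, assuming the gene-disease associations have already been retrieved (for instance from a MedGene query) as a mapping from disease name to a set of gene symbols: it reports the fraction of the target and background lists linked to each disease, their ratio (values above 1 indicate enrichment), and each list's normalized shares across diseases, which should look alike when there is no disease-specific bias. The function names, the ratio-of-fractions metric and the toy data are hypothetical; MedGene's own estimates of relationship strength are not reproduced, and the paper does not state that this exact statistic underlies Figure 1 and Table 1.

```python
def association_fraction(genes: set[str], disease_genes: set[str]) -> float:
    """Fraction of `genes` that have a published association with one disease."""
    return len(genes & disease_genes) / len(genes)


def compare_lists(target: set[str], background: set[str],
                  assoc: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Per disease: fraction in target, fraction in background, and their ratio.

    The ratio-of-fractions "enrichment" is a hypothetical metric chosen for
    illustration, not necessarily the statistic used in the paper.
    """
    result = {}
    for disease, disease_genes in assoc.items():
        t = association_fraction(target, disease_genes)
        b = association_fraction(background, disease_genes)
        # Skip diseases with no associated genes in the background list.
        if b == 0:
            continue
        result[disease] = {"target": t, "background": b, "enrichment": t / b}
    return result


def relative_shares(fractions: dict[str, float]) -> dict[str, float]:
    """Normalize per-disease fractions so they sum to 1 (relative ratio among diseases)."""
    total = sum(fractions.values())
    return {d: f / total for d, f in fractions.items()} if total else {}


# Toy data (hypothetical gene and disease names).
background = {f"G{i}" for i in range(1, 11)}  # G1..G10, stands in for the genome
target = {"G1", "G2", "G3", "G4"}             # stands in for the clone target list
assoc = {"DiseaseA": {"G1", "G2", "G7"}, "DiseaseB": {"G3", "G9"}}

stats = compare_lists(target, background, assoc)
for disease, s in stats.items():
    print(disease, round(s["enrichment"], 2))  # DiseaseA 1.67, then DiseaseB 1.25
print(relative_shares({d: s["target"] for d, s in stats.items()}))
print(relative_shares({d: s["background"] for d, s in stats.items()}))
```

In this toy case both diseases are enriched in the target list (ratios above 1), while the normalized shares (about 0.67/0.33 for the target versus 0.6/0.4 for the background) stay broadly similar, which is the pattern described in (a) and (b).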