Evidence-based annotation of the malaria parasite's genome using comparative expression profiling.

Zhou Y, Ramachandran V, Kumar KA, Westenberger S, Refour P, Zhou B, Li F, Young JA, Chen K, Plouffe D, Henson K, Nussenzweig V, Carlton J, Vinetz JM, Duraisingh MT, Winzeler EA - PLoS ONE (2008)

Bottom Line: Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function.


Affiliation: Genomics Institute of the Novartis Research Foundation, San Diego, California, USA.

ABSTRACT
A fundamental problem in systems biology and whole genome sequence analysis is how to infer functions for the many uncharacterized proteins that are identified, whether they are conserved across organisms of different phyla or are phylum-specific. This problem is especially acute in pathogens, such as malaria parasites, where genetic and biochemical investigations are likely to be more difficult. Here we perform comparative expression analysis on Plasmodium parasite life cycle data derived from P. falciparum blood, sporozoite, zygote and ookinete stages, and P. yoelii mosquito oocyst and salivary gland sporozoites, blood and liver stages and show that type II fatty acid biosynthesis genes are upregulated in liver and insect stages relative to asexual blood stages. We also show that some universally uncharacterized genes with orthologs in Plasmodium species, Saccharomyces cerevisiae and humans show coordinated transcription patterns in large collections of human and yeast expression data and that the function of the uncharacterized genes can sometimes be predicted based on the expression patterns across these diverse organisms. We also use a comprehensive and unbiased literature mining method to predict which uncharacterized parasite-specific genes are likely to have roles in processes such as gliding motility, host-cell interactions, sporozoite stage, or rhoptry function. These analyses, together with protein-protein interaction data, provide probabilistic models that predict the function of 926 uncharacterized malaria genes and also suggest that malaria parasites may provide a simple model system for the study of some human processes. These data also provide a foundation for further studies of transcriptional regulation in malaria parasites.


Figure 2: An automatic pipeline for malaria literature mining. Approach A, full-text search by literature search engines: A1) All P. falciparum and P. yoelii locus names were downloaded from PlasmoDB and searched against Google Scholar and SCIRUS one at a time; A2) URL hits were then mapped to PubMed entries. Approach B, NCBI database mining: B1) Mappings between GenBank sequence entries and PubMed entries were systematically retrieved from NCBI for four Plasmodium species; B2) Sequences were mapped to malaria locus names by BLAST alignment. The pipeline resulted in 6,428 functional associations between 3,262 malaria proteins and 1,278 PubMed papers.
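
The associations produced by a pipeline of this kind can be viewed as (locus name, PubMed ID) pairs. The following is a minimal sketch, not the authors' implementation, of how such pairs might be collected into per-paper gene groups. The function name `cocitation_groups`, the `min_size` parameter, and all identifiers in the example are hypothetical placeholders.

```python
from collections import defaultdict


def cocitation_groups(associations, min_size=2):
    """Group locus names by the paper that cites them.

    associations: iterable of (locus_name, pubmed_id) pairs.
    Returns {pubmed_id: sorted list of locus names}, keeping only papers
    that cite at least `min_size` distinct loci (an assumed cutoff).
    """
    by_paper = defaultdict(set)
    for locus, pmid in associations:
        by_paper[pmid].add(locus)  # a set ignores repeated hits for a pair
    return {
        pmid: sorted(loci)
        for pmid, loci in by_paper.items()
        if len(loci) >= min_size
    }


# Placeholder identifiers for illustration only; not real loci or papers.
example_associations = [
    ("GENE_A", "PMID_1"),
    ("GENE_B", "PMID_1"),
    ("GENE_C", "PMID_1"),
    ("GENE_B", "PMID_1"),  # duplicate hit, e.g. found by both approaches
    ("GENE_A", "PMID_2"),  # a paper citing a single locus forms no group
]
example_groups = cocitation_groups(example_associations)
```

Deduplicating with a set reflects the fact that Approach A and Approach B could report the same protein-paper association twice; whether the original pipeline merged hits this way is an assumption.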

Mentions: Since malaria parasites have not traditionally served as model experimental systems, genome curation has mostly relied on transferring GO annotations from model organisms via ortholog mapping. Unfortunately, this evidence-based annotation scheme is not applicable to the many parasite-specific processes that do not exist in humans or yeast. However, over the past few decades, the malaria community has investigated many Plasmodium-specific biological processes. In other organisms, high-quality functional annotations have been assembled through automated and manual literature mining (e.g. the Saccharomyces Genome Database, http://www.yeastgenome.org). Such an approach has been adopted for model organisms but not for Plasmodium spp., largely because the low- or non-profit nature of malaria research fails to justify the prohibitive cost. Therefore, we developed an automated literature-mining tool to identify groups of functionally related malaria proteins based on their co-citation in the same manuscript or in a related group of publications (Figure 2). First, the World Wide Web was searched for occurrences of P. falciparum or P. yoelii locus names. An informatics pipeline was then used to process co-cited genes (Tables S3 and S4). Groups of genes co-cited in one manuscript or in several closely related papers comprised over 1,023 virtual GO categories. A comparison of gene expression correlation among randomly associated genes, genes co-cited in a manuscript, and genes found in an ontology group indicated that the best correlation was among co-cited genes, no doubt because of the inclusion of expert knowledge (Figure 3). The figure shows that gene pairs within literature groups are 1,506 times more likely than a random gene pair to have a correlation coefficient above 0.9. The enrichment factor for ontology groups is also as high as 62. The clear differences among the three distributions indicate that genes mentioned in the same publication and genes sharing the same ontology terms are more likely to be co-regulated than expected by chance.
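
The enrichment factor described above can be read as a ratio: the fraction of within-group gene pairs whose expression correlation exceeds 0.9, divided by the same fraction for background pairs. The sketch below illustrates that reading under stated assumptions. It uses Pearson correlation and takes all gene pairs as the background, whereas the paper refers to random gene pairs and does not specify the correlation measure here. The function names and the toy expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return sxy / sqrt(sxx * syy)


def fraction_above(pairs, profiles, threshold):
    """Fraction of gene pairs whose correlation exceeds `threshold`."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(
        1 for g, h in pairs if pearson(profiles[g], profiles[h]) > threshold
    )
    return hits / len(pairs)


def enrichment_factor(groups, profiles, threshold=0.9):
    """Ratio of highly correlated pairs within groups vs. all pairs.

    groups: {group_id: list of gene names}; profiles: {gene: expression list}.
    Returns None when no background pair exceeds the threshold.
    """
    group_pairs = {
        tuple(sorted(pair))
        for genes in groups.values()
        for pair in combinations([g for g in genes if g in profiles], 2)
    }
    background = fraction_above(
        combinations(sorted(profiles), 2), profiles, threshold
    )
    if background == 0:
        return None
    return fraction_above(group_pairs, profiles, threshold) / background


# Toy profiles for illustration only; not data from the study.
toy_profiles = {
    "GENE_A": [1, 2, 3, 4],
    "GENE_B": [2, 4, 6, 8],
    "GENE_C": [4, 3, 2, 1],
    "GENE_D": [1, 3, 2, 4],
}
toy_groups = {"PMID_1": ["GENE_A", "GENE_B"]}
toy_enrichment = enrichment_factor(toy_groups, toy_profiles)
```

Using every gene pair as the background keeps the example deterministic; with genome-scale data, sampling random pairs would likely be the more practical choice.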