Protein localization analysis of essential genes in prokaryotes.

Peng C, Gao F - Sci Rep (2014)

Bottom Line: Both statistical analysis of localization information in these genomes and the GO (Gene Ontology) terms enriched in the essential genes show that proteins encoded by essential genes are enriched in internal location sites and are found in the cell envelope at a lower proportion than proteins encoded by non-essential genes. Meanwhile, few essential proteins occur in external subcellular location sites such as the flagellum and fimbrium, and proteins encoded by non-essential genes tend to have diverse localizations. These results provide further insight into the fundamental functions needed to support cellular life and could improve gene essentiality prediction by taking protein localization and enriched GO terms into consideration.


Affiliation: Department of Physics, Tianjin University, Tianjin 300072, China.

ABSTRACT
Essential genes, those critical for the survival of an organism under certain conditions, play a significant role in pharmaceutics and synthetic biology. Knowledge of protein localization is invaluable for understanding protein function as well as the interactions between different proteins. However, essential genes have not previously been examined systematically with respect to the localization of the proteins they encode. Here, a comprehensive protein localization analysis of essential genes in 27 prokaryotes, including 24 bacteria, 2 mycoplasmas and 1 archaeon, has been performed. Both statistical analysis of localization information in these genomes and the GO (Gene Ontology) terms enriched in the essential genes show that proteins encoded by essential genes are enriched in internal location sites and are found in the cell envelope at a lower proportion than proteins encoded by non-essential genes. Meanwhile, few essential proteins occur in external subcellular location sites such as the flagellum and fimbrium, and proteins encoded by non-essential genes tend to have diverse localizations. These results provide further insight into the fundamental functions needed to support cellular life and could improve gene essentiality prediction by taking protein localization and enriched GO terms into consideration.


Figure 3: Distribution of essential proteins (the inner ring of the doughnut chart) and non-essential proteins (the outer ring of the doughnut chart) in (a) Bacillus subtilis 168, (b) Escherichia coli MG1655 and (c) Mycoplasma genitalium G37.
© Copyright Policy - open-access

Mentions: Other factors that may influence the differences in protein localization, such as multiple localization of a protein, the reliability of protein localization prediction and the source of the non-essential genes, are also discussed here. On average, 2.47% of the essential proteins and 2.70% of the non-essential proteins in the PSORTb predictions are annotated with multiple localization sites (the percentages of multiply localized proteins in each dataset are listed in Table 1). Because these percentages are low, multiple localization has only a very slight impact on the accuracy of the statistical results. Since the predictions might not be perfectly accurate, some experimental data were also employed. The protein localization information was obtained from the Universal Protein Resource (UniProt; http://www.uniprot.org) [27]. Because it is curated from the literature, the localization data in UniProt can be regarded as credible. We selected Bacillus subtilis 168, Escherichia coli MG1655 and Mycoplasma genitalium G37 as model genomes for Gram-positive bacteria, Gram-negative bacteria and mycoplasmas, respectively, because a higher percentage of their proteins have localization information. On average, 47.03% of the essential genes and 44.91% of the non-essential genes in these genomes have annotated localization information. We defined "unknown" as the subcellular location of proteins without annotated localization information. Among the proteins with localization information, 3.54% of the essential proteins and 3.35% of the non-essential proteins have multiple localization sites, which is close to the result obtained from the PSORTb predictions. Because a protein with multiple localizations can reside at any site named in its annotation, it was counted in every corresponding site group in this calculation.
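The counting rule described above (unannotated proteins assigned to "unknown", and multiply localized proteins counted once in every site they list) can be illustrated with a minimal Python sketch. The function names, the input layout and the toy records are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import Counter

def site_distribution(proteins):
    """Return {site: percentage of proteins} for one group of proteins.

    `proteins` maps a protein ID to a list of annotated sites
    (hypothetical layout). An empty list means no annotation and is
    counted as "unknown". A protein with several sites is counted once
    in every site it lists, so the percentages can sum to more than 100.
    """
    counts = Counter()
    for sites in proteins.values():
        for site in set(sites) or {"unknown"}:
            counts[site] += 1
    total = len(proteins)
    return {site: 100.0 * n / total for site, n in counts.items()}

def multi_site_percentage(proteins):
    """Percentage of annotated proteins that list more than one site."""
    annotated = [s for s in proteins.values() if s]
    if not annotated:
        return 0.0
    multi = sum(1 for s in annotated if len(set(s)) > 1)
    return 100.0 * multi / len(annotated)

# Toy records, invented for illustration only (not data from the paper).
toy = {
    "p1": ["cytoplasm"],
    "p2": ["cytoplasm", "cell membrane"],
    "p3": [],
    "p4": ["cell membrane"],
}
dist = site_distribution(toy)
multi = multi_site_percentage(toy)
```

In this sketch the site percentages use all proteins as the denominator, while the multiple-localization percentage uses only annotated proteins, mirroring the phrase "among the proteins with localization information".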
Figure 3 shows the distribution of essential proteins (the inner ring of the doughnut chart) and non-essential proteins (the outer ring of the doughnut chart) in B. subtilis 168, E. coli MG1655 and M. genitalium G37. In all three doughnut charts, the percentage of essential proteins located in the cytoplasm is higher than that of non-essential proteins, and the proteins encoded by essential genes are found in the cell envelope at a lower proportion than those encoded by non-essential genes. These conclusions are consistent with the PSORTb prediction results. Comparisons were also made between groups classified according to the source of the non-essential genes presented in Table 1. We found that the differences are more significant in organisms whose non-essential genes were taken from the original literature. A possible reason is that non-essential genes reported in the original literature are more reliable than those defined as the complement of the essential gene set.
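One way to check whether the cytoplasmic share differs between essential and non-essential proteins is a pooled two-proportion z-test. This passage does not say which statistical test the authors used, so the sketch below is a generic substitute rather than their method, and the counts are invented toy values.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Compare k1/n1 with k2/n2 using a pooled two-proportion z-test.

    Returns (difference in proportions, z statistic, two-sided p-value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; test is undefined")
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value

# Toy counts, invented for illustration only (not data from the paper):
# 60 of 100 essential proteins and 400 of 1000 non-essential proteins
# annotated as cytoplasmic.
diff, z, p_value = two_proportion_z(60, 100, 400, 1000)
```

Because multiply localized proteins are counted in several site groups, the site groups are not strictly independent; with only a few percent of such proteins the effect on a test like this should be small, but it is worth keeping in mind.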

