Limits...
Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

Hou Y, Lin S - PLoS ONE (2009)

Bottom Line: Distinct relationships between log(10)-transformed protein-coding gene number (Y') versus log(10)-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes.The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%).These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

View Article: PubMed Central - PubMed

Affiliation: Department of Marine Sciences, University of Connecticut, Groton, Connecticut, United States of America.

ABSTRACT
The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log(10)-transformed protein-coding gene number (Y') versus log(10)-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200+22.678X', whereas non-eukaryotes a linear model, Y' = 0.045+0.977X', both with high significance (p<0.001, R(2)>0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3x10(6) kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245x10(6) kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

Show MeSH

Related in: MedlinePlus

Logarithmic regression model for log10-transformed eukaryotic gene number (y′) versus log10-transformed genome size (x′).Range of dinoflagellate genome size (3×106–245×106 kbp) is indicated by the shaded areas. The predicted gene numbers for the recognized smallest (38,188) and largest (87,688) dinoflagellate genomes correspond to their gene-coding percentages shown in Fig. 2B.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC2737104&req=5

pone-0006978-g003: Logarithmic regression model for log10-transformed eukaryotic gene number (y′) versus log10-transformed genome size (x′).Range of dinoflagellate genome size (3×106–245×106 kbp) is indicated by the shaded areas. The predicted gene numbers for the recognized smallest (38,188) and largest (87,688) dinoflagellate genomes correspond to their gene-coding percentages shown in Fig. 2B.

Mentions: When the log10-transformed data of gene number were plotted against log10 genome size, two distinct relations appeared: eukaryotes in one and non-eukaryotes in the other, with markedly different slopes emerging from initial linear regressions (Fig. 2A). Therefore, further multi-model analyses were performed separately for these two groups. For non-eukaryotes, the linear regression model was best fit (p<0.001, highest R2) among all the different models examined (Table 1). For eukaryotes, the log10-transformed data best fit a natural logarithmic (ln) regression model (Table 1, Figure 3). As the protein-coding gene number was generally very close to the total gene number in each genome, similar significant positive correlations were found for total gene numbers in both eukaryotic and non-eukaryotic genomes (Table 1), although only the protein-coding gene number is shown in the figures (Figure 2A, 3).


Distinct gene number-genome size relationships for eukaryotes and non-eukaryotes: gene content estimation for dinoflagellate genomes.

Hou Y, Lin S - PLoS ONE (2009)

Logarithmic regression model for log10-transformed eukaryotic gene number (y′) versus log10-transformed genome size (x′).Range of dinoflagellate genome size (3×106–245×106 kbp) is indicated by the shaded areas. The predicted gene numbers for the recognized smallest (38,188) and largest (87,688) dinoflagellate genomes correspond to their gene-coding percentages shown in Fig. 2B.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC2737104&req=5

pone-0006978-g003: Logarithmic regression model for log10-transformed eukaryotic gene number (y′) versus log10-transformed genome size (x′).Range of dinoflagellate genome size (3×106–245×106 kbp) is indicated by the shaded areas. The predicted gene numbers for the recognized smallest (38,188) and largest (87,688) dinoflagellate genomes correspond to their gene-coding percentages shown in Fig. 2B.
Mentions: When the log10-transformed data of gene number were plotted against log10 genome size, two distinct relations appeared: eukaryotes in one and non-eukaryotes in the other, with markedly different slopes emerging from initial linear regressions (Fig. 2A). Therefore, further multi-model analyses were performed separately for these two groups. For non-eukaryotes, the linear regression model was best fit (p<0.001, highest R2) among all the different models examined (Table 1). For eukaryotes, the log10-transformed data best fit a natural logarithmic (ln) regression model (Table 1, Figure 3). As the protein-coding gene number was generally very close to the total gene number in each genome, similar significant positive correlations were found for total gene numbers in both eukaryotic and non-eukaryotic genomes (Table 1), although only the protein-coding gene number is shown in the figures (Figure 2A, 3).

Bottom Line: Distinct relationships between log(10)-transformed protein-coding gene number (Y') versus log(10)-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes.The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%).These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

View Article: PubMed Central - PubMed

Affiliation: Department of Marine Sciences, University of Connecticut, Groton, Connecticut, United States of America.

ABSTRACT
The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log(10)-transformed protein-coding gene number (Y') versus log(10)-transformed genome size (X', genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y' = ln(-46.200+22.678X', whereas non-eukaryotes a linear model, Y' = 0.045+0.977X', both with high significance (p<0.001, R(2)>0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%-1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%-47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3x10(6) kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245x10(6) kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.

Show MeSH
Related in: MedlinePlus