Coevolution between simple sequence repeats (SSRs) and virus genome size.

Zhao X, Tian Y, Yang R, Feng H, Ouyang Q, Tian Y, Tan Z, Li M, Niu Y, Jiang J, Shen G, Yu R - BMC Genomics (2012)

Bottom Line: The results showed that simple sequence repeats (SSRs) are strongly, positively and significantly correlated with genome size. Certain repeat classes are distributed within particular ranges of genome sequence length. We concluded that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible, to a certain degree, for the variance in SSR content.


Affiliation: Chinese Academy of Inspection and Quarantine, Beijing, 100029, China.

ABSTRACT

Background: The relationship between the level of repetitiveness in genomic sequence and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but relevant studies of virus genomes have been rare.

Results: In this study, a total of 257 viruses were examined, covering 90% of genera. The results showed that simple sequence repeats (SSRs) are strongly, positively and significantly correlated with genome size. Certain repeat classes are distributed within particular ranges of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes; tetra- SSRs are a common component of genomes larger than 100 kb; among genomes < 100 kb, those containing penta- and hexa- SSRs account for no more than 50%. Principal components analysis (PCA) indicated that dinucleotide repeats contribute most strongly to the differences in SSRs among virus genomes. The results showed that SSRs tend to accumulate in larger virus genomes, and that the longer the genome sequence, the longer the repeat units.

Conclusions: We conducted this research at the level of viruses as a whole. We concluded that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible, to a certain degree, for the variance in SSR content.


© Copyright Policy - open-access

Figure 3: Regression analysis of relationship between SSRs length and genome size.

Mentions: We constructed two sets of scatter plots and then performed regression analysis of SSRs (occurrence and length) versus complete genome size for all analyzed viruses to examine the relationship between SSRs and genome size. First, scatter plots were made with genome size as the independent variable, and all analyzed data were split into two groups (genome > 30000 bp and ≤ 30000 bp) so that the points and curves would be clearly visible (Figures 2, 3). Then 10 curves (linear, logarithmic, inverse, quadratic, cubic, compound, power, S, growth and exponential) were fitted according to their respective mathematical models using SPSS 17.0. Parameter estimates and visual inspection showed that goodness of fit varied greatly among models; the curves with the best goodness of fit were selected for correlation analysis between SSRs (occurrence and length) and genome size (Figures 2, 3). The number of repeat arrays varies from 4 in the Nodamura virus genome (S206-(+)ssRNA-36) to 3823 in the Amsacta moorei entomopoxvirus 'L' genome (S33-dsDNA-33) (Additional file 2). In the regression analysis, the power function model provided the best fit between SSR occurrence and genome size across all studied viruses, and the results show a very strong and significant positive relationship between the occurrence of SSRs and genome size (R² = 0.919, P < 0.001) (Figure 2A). The power function and cubic models best fit the data for the > 30000 bp and ≤ 30000 bp groups, respectively (Figure 2B, C). SSR occurrence is strongly, significantly and positively related to genome size in both the > 30000 bp group (R² = 0.815, P < 0.001) and the ≤ 30000 bp group (R² = 0.718, P < 0.001). In the ≤ 30000 bp group in particular, SSR occurrence fluctuates within a relatively narrow range. One exceptional case is worth noting: a point lying far above the fitted curve represents the Amsacta moorei entomopoxvirus 'L' genome (S33-dsDNA-33, NC_002520), which is 232392 bp in size and contains a total of 3823 SSRs, far more than any other analyzed virus genome.
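To make the power-function regression described above more concrete, here is a minimal sketch in Python. It is not the authors' procedure: the study used SPSS 17.0, whose exact estimation method may differ. The sketch assumes the power model y = a * x^b is fitted by ordinary least squares on log-transformed values, and it reports R² in log space. The function name `fit_power_law` and the input values are hypothetical; the data are synthetic and are not taken from the paper.

```python
import math


def fit_power_law(genome_sizes, ssr_counts):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b, r_squared), where r_squared is computed in log space.
    All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in genome_sizes]
    log_y = [math.log(y) for y in ssr_counts]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Slope and intercept of the straight line ln(y) = ln(a) + b * ln(x).
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    log_a = mean_y - b * mean_x

    # Coefficient of determination for the log-log line.
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(log_x, log_y))
    ss_tot = sum((y - mean_y) ** 2 for y in log_y)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r_squared


# Synthetic illustration only (NOT data from the study): points generated
# from an exact power law with a = 2 and b = 0.5.
sizes = [1e3, 1e4, 1e5, 1e6]
counts = [2.0 * s ** 0.5 for s in sizes]
a, b, r2 = fit_power_law(sizes, counts)
```

Applied to real data, the same function could be run separately on the > 30000 bp and ≤ 30000 bp groups. Note that an R² computed in log space is not directly comparable with an R² computed on the untransformed counts, nor with the R² of a cubic model.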

