Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling.

Yap YL, Zhang XW, Danchin A - BMC Bioinformatics (2003)

Bottom Line: The exact origin of the cause of the Severe Acute Respiratory Syndrome (SARS) is still an open question. Potential factors influencing these biases are discussed. The congruence of the relationship outcomes with the known classification indicates that there exist phylogenetic signals in the tetra-nucleotide usage patterns, which are most prominent in the replicase open reading frames.


Affiliation: HKU-Pasteur Research Centre, Dexter H.C. Man Building, 8 Sassoon Road, Pokfulam, Hong Kong. daniely@hkusua.hku.hk

ABSTRACT

Background: The exact origin of the cause of the Severe Acute Respiratory Syndrome (SARS) is still an open question. The genomic sequence relationship of SARS-CoV with 30 different single-stranded RNA (ssRNA) viruses of various families was studied using two non-standard approaches. Both approaches began with the vectorial profiling of the tetra-nucleotide usage pattern V for each virus. In approach one, a distance measure on the vectors V, based on the correlation coefficient, was devised to construct a relationship tree by the neighbor-joining algorithm. In approach two, a multivariate factor analysis was performed to derive the embedded tetra-nucleotide usage patterns. These patterns were subsequently used to classify the selected viruses.
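The first approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the vector V or of the distance, so the sketch assumes that V holds the relative frequencies of the 256 overlapping tetra-nucleotides and that the distance is one minus the Pearson correlation coefficient. The function names (`tetra_profile`, `correlation_distance`, `distance_matrix`) are hypothetical.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"
# All 4^4 = 256 tetra-nucleotide words in a fixed order.
TETRAMERS = ["".join(p) for p in product(ALPHABET, repeat=4)]


def tetra_profile(seq):
    """Relative frequency of each overlapping tetra-nucleotide (assumed form of V)."""
    seq = seq.upper().replace("T", "U")  # GenBank records are often written with T
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetra-nucleotide")
    return [counts[w] / total for w in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Assumed distance: 1 - r, so identical profiles are at distance 0."""
    return 1.0 - pearson(x, y)


def distance_matrix(profiles):
    """Pairwise distances for a dict mapping virus acronym -> profile."""
    names = list(profiles)
    return {a: {b: correlation_distance(profiles[a], profiles[b]) for b in names}
            for a in names}
```

A matrix produced this way could then be passed to any neighbor-joining implementation to build the relationship tree; that step is left out here to keep the sketch short.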

Results: Both approaches yielded relationship outcomes that are consistent with the known virus classification. They also indicated that the genomes of RNA viruses from the same family conform to a specific pattern of word usage. Based on the correlation of the overall tetra-nucleotide usage patterns, the Transmissible Gastroenteritis Virus (TGV) and the Feline CoronaVirus (FCoV) are closest to SARS-CoV. Surprisingly, the RNA viruses that do not go through a DNA stage also displayed a remarkable discrimination against the CpG and UpA di-nucleotides (z = -77.31 and -52.48, respectively) and selection for UpG and CpA (z = 65.79 and 49.99, respectively). Potential factors influencing these biases are discussed.
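To show how a di-nucleotide bias of this kind can be expressed as a z-score, the following Python sketch compares the observed count of a di-nucleotide with the count expected if neighboring bases were independent. The abstract does not state how the reported z values were computed, so this is an assumed, simplified null model (binomial variance, overlap between adjacent windows ignored) and will not reproduce the published figures. The function name `dinucleotide_z` is hypothetical.

```python
from math import sqrt


def dinucleotide_z(seq, dinuc):
    """z-score of a di-nucleotide count under an assumed independence model."""
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter word")

    # Single-base frequencies estimated from the sequence itself.
    p_first = seq.count(dinuc[0]) / n
    p_second = seq.count(dinuc[1]) / n
    p = p_first * p_second  # probability of the word if bases were independent

    windows = n - 1  # number of overlapping di-nucleotide positions
    observed = sum(1 for i in range(windows) if seq[i:i + 2] == dinuc)
    expected = windows * p
    variance = windows * p * (1.0 - p)  # binomial approximation (assumption)
    if variance == 0:
        raise ValueError("z-score is undefined when the expected variance is zero")
    return (observed - expected) / sqrt(variance)
```

Under this convention a negative value indicates that the di-nucleotide occurs less often than its base composition predicts, and a positive value indicates that it occurs more often.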

Conclusion: The study of genomic word usage is a powerful method to classify RNA viruses. The congruence of the relationship outcomes with the known classification indicates that there exist phylogenetic signals in the tetra-nucleotide usage patterns, which are most prominent in the replicase open reading frames.

Figure 7: Flowchart for studying the tetra-nucleotide usage pattern. FA and NJ stand for the factor analysis [21-23] and neighbor-joining [29] algorithms.

Mentions: We focused our study on the genomic sequences (their translated strand) of ssRNA viruses (Table 1), which incorporated 20 species of positive-strand ssRNA viruses and 11 species of negative-strand ssRNA viruses. We are aware that these viruses constitute completely different species, most probably unrelated to one another. They are included in a common study in order to identify relevant features against purely statistical background properties. The coverage included viruses that are known to cause disease in their corresponding hosts. The acronym for each virus is shown in the table and is used throughout this study. All sequences corresponding to the translated strand were retrieved from GenBank, and the accession number and genome size (in nucleotides) of each virus are provided for reference. For the present study, two datasets were generated from the complete sequence of each virus. Dataset 1 covered the entire genome and dataset 2 covered only the replicase open reading frame. The flowchart for studying the tetra-nucleotide usage pattern in the 31 viruses is shown in Figure 7.

