Gene network visualization and quantitative synteny analysis of more than 300 marine T4-like phage scaffolds from the GOS metagenome.

Comeau AM, Arbiol C, Krisch HM - Mol. Biol. Evol. (2010)

Bottom Line: This assembly permits the examination of synteny (organization) of the genes on the scaffolds and their comparison with the genome sequences from cultured Cyano-T4s. We employ comparative genomics and a novel usage of network visualization software to show that the scaffold phylogenies are similar to those of the traditional marker genes they contain. Importantly, these uncultured metagenomic scaffolds quite closely match the organization of the "core genome" of the known Cyano-T4s.

Affiliation: Centre National de la Recherche Scientifique, UMR5100, Toulouse, France.

ABSTRACT
Bacteriophages (phages) are the most abundant biological entities in the biosphere and are the dominant "organisms" in marine environments, exerting an enormous influence on marine microbial populations. Metagenomic projects, such as the Global Ocean Sampling expedition (GOS), have demonstrated the predominance of tailed phages (Caudovirales), particularly T4 superfamily cyanophages (Cyano-T4s), in the marine milieu. Whereas previous metagenomic analyses were limited to gene content information, here we present a comparative analysis of over 300 phage scaffolds assembled from the viral fraction of the GOS data. This assembly permits the examination of synteny (organization) of the genes on the scaffolds and their comparison with the genome sequences from cultured Cyano-T4s. We employ comparative genomics and a novel usage of network visualization software to show that the scaffold phylogenies are similar to those of the traditional marker genes they contain. Importantly, these uncultured metagenomic scaffolds quite closely match the organization of the "core genome" of the known Cyano-T4s. This indicates that the current view of genome architecture in the Cyano-T4s is not seriously biased by being based on a small number of cultured phages, and we can be confident that they accurately reflect the diverse population of such viruses in marine surface waters.

Synteny representations and Isynteny calculations. (A) Traditional arrow gene representations, with “core genes” in black and inserted novel ORFs in white. (B) Conversion of (A) into network representation (used by Cytoscape), with each line representing an occurrence/link between the respective genes/ORFs. (C) Reduction of (B), with the removal of non-core genes, showing either all (top) or a condensation (bottom) of the number of links. (D) Formulation of the Index of Synteny (Isynteny) (equation on right), which reports the proportion of links (L) to the left (in) and right (out) of a gene X that are to single sources (S) and targets (T). (E–G) Various examples of gene synteny, along with the corresponding Isynteny values and the total number of links (under the format L = n).

Mentions: Traditional comparative genomic representations become increasingly unwieldy when the number of objects under consideration becomes large. For example, whereas traditional dot plots are efficient for comparing up to a few dozen genomes (e.g., Hatfull et al. 2010), they are less useful in this case of hundreds of scaffolds because they focus on the “DNA sequence” (good for showing indels, inversions, etc.) rather than on the “gene,” which is our focus. In this analysis, we needed to represent the synteny of >300 scaffolds containing nearly 1,800 genes/ORFs. To do this, we applied the open-source Cytoscape program (version 2.6.1, http://www.cytoscape.org; Shannon et al. 2003), which was developed for the visualization of complex metabolic pathways and “interactomes,” to deal with the similar presentational problems posed by scaffold synteny. These visualizations convert multiple occurrences of the same gene/ORF to one “node” with multiple “links” (synteny) to its neighboring genes/ORFs. Using such well-established “network” programs has several advantages, including 1) the capacity to handle very large data sets; 2) great flexibility in visualization control; and 3) the “compaction” of data into a smaller visual space for ease of analysis and presentation. Figure 1 illustrates how a traditional “arrow” representation of a genome (panel A) can be translated into a network of nodes and edges (panel B). The space required to represent the data is considerably reduced. Insertions of novel ORF database orphans (ORFans; white nodes) within groups of “core genes” (black nodes) can then be removed in order to simplify the network and represent the “core synteny” (panel C). This type of pruning, which removes intervening genes/ORFs that are not part of the core genome, answers the fundamental question of whether (following fig. 1A–C) core gene B is invariably downstream of A, and upstream of C, regardless of the expansion/contraction of this genome by the addition/removal of the facultative intervening ORFs.
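The conversion from gene order to a synteny network, and the pruning to core genes, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names (`scaffold_links`, `prune_to_core`), the toy scaffolds, and the gene labels are all hypothetical, and each scaffold is assumed to be a simple ordered list of gene/ORF names. Every pair of adjacent genes contributes one directed link, so repeated occurrences of the same gene collapse into a single node whose links carry occurrence counts.

```python
from collections import Counter


def scaffold_links(scaffolds):
    """Count directed links between adjacent genes/ORFs across all scaffolds.

    Each scaffold is an ordered list of gene names (a hypothetical input
    format). The result maps (upstream, downstream) pairs to the number
    of scaffolds positions in which that adjacency occurs.
    """
    links = Counter()
    for genes in scaffolds:
        for upstream, downstream in zip(genes, genes[1:]):
            links[(upstream, downstream)] += 1
    return links


def prune_to_core(scaffolds, core_genes):
    """Drop non-core genes/ORFs from each scaffold, keeping gene order."""
    return [[g for g in genes if g in core_genes] for genes in scaffolds]


# Toy data: three scaffolds sharing core genes A, B, C, two of them
# carrying inserted ORFans (all names are made up for illustration).
scaffolds = [
    ["A", "B", "C"],
    ["A", "orf1", "B", "C"],
    ["A", "B", "orf2", "orf3", "C"],
]
core = {"A", "B", "C"}

full_network = scaffold_links(scaffolds)
core_network = scaffold_links(prune_to_core(scaffolds, core))

for (src, dst), n in sorted(core_network.items()):
    print(f"{src} -> {dst}: L = {n}")
```

In this toy case the full network contains seven distinct links, while the pruned network contains only A -> B and B -> C, each supported by all three scaffolds. That is the situation described above, in which core gene B is always downstream of A and upstream of C regardless of the intervening ORFs. The resulting edge counts could be written out as an edge table for import into a network viewer.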

