Automated classification of tailed bacteriophages according to their neck organization.

Lopes A, Tavares P, Petit MA, Guérois R, Zinn-Justin S - BMC Genomics (2014)

Bottom Line: Types and Clusters delineate consistent subgroups of Caudovirales, which correlate with several virion properties. Our method and webserver have the capacity to automatically classify most tailed phages, detect their structural module, assign a function to a set of their head, neck and tail genes, provide their morphologic subtype and localize these phages within a "head-neck-tail" based classification. It should enable analysis of large sets of phage genomes.


Affiliation: CEA, iBiTecS, Gif-sur-Yvette, F-91191 Paris, France. raphael.guerois@cea.fr.

ABSTRACT

Background: The genetic diversity observed among bacteriophages remains a major obstacle for the identification of homologs and the comparison of their functional modules. In the structural module, although several classes of homologous proteins contributing to the head and tail structure can be detected, proteins of the head-to-tail connection (or neck) are generally more divergent. Yet, molecular analyses of a few tailed phages belonging to different morphological classes suggested that only a limited number of structural solutions are used to produce a functional virion. To challenge this hypothesis and analyze protein diversity at the virion neck, we developed a specific computational strategy to cope with sequence divergence in phage proteins. We searched for homologs of a set of proteins encoded in the structural module using a phage learning database.

Results: We show that, using a combination of iterative profile-profile comparison and gene context analyses, we can identify a set of head, neck and tail proteins in most tailed bacteriophages of our database. Classification of phages based on neck protein sequences delineates 4 Types corresponding to known morphological subfamilies. Further analysis of the most abundant Type 1 yields 10 Clusters characterized by consistent sets of head, neck and tail proteins. We developed Virfam, a webserver that automatically identifies proteins of the phage head-neck-tail module and assigns phages to the most closely related cluster of phages. This server was tested against 624 new phages from the NCBI database. 93% of the tailed and unclassified phages could be assigned to our head-neck-tail based categories, thus highlighting the broad representativeness of the identified virion architectures. Types and Clusters delineate consistent subgroups of Caudovirales, which correlate with several virion properties.

Conclusions: Our method and webserver have the capacity to automatically classify most tailed phages, detect their structural module, assign a function to a set of their head, neck and tail genes, provide their morphologic subtype and localize these phages within a "head-neck-tail" based classification. It should enable analysis of large sets of phage genomes. In particular, it should contribute to the classification of the abundant unknown viruses found on assembled contigs of metagenomic samples.


Figure 3: Classification of the Type 1 bacteriophages. (A) Tree representation of Type 1 phage similarities, built by hierarchical agglomerative clustering of a matrix of similarity scores between pairs of phages (combining HHsearch probabilities and percentage of identity) and drawn with the ETE2 library[50]. The branches of the tree were sorted into 10 Clusters, highlighted by different background colors. Phage names marked with grey-filled black circles are Myoviridae phages. The bacterial hosts of the phages are indicated at the bottom for each Cluster, using the same color code as in the classification tree, to highlight the consistency between Cluster and host phyla. (B) Gene organisation of a representative phage of each Cluster. A more complete view of gene organisation sorted by Cluster is presented on the Virfam webserver, to highlight the consistency between the Cluster and neck gene order distributions.
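The clustering step named in the caption can be illustrated with a small sketch. The source does not say how HHsearch probabilities and percentage of identity are combined, which linkage rule is used, or where the tree is cut, so `combined_similarity` (a simple product), the average-linkage rule and the `min_similarity` cut-off below are all assumptions made for illustration, and the phage names and scores are toy values rather than data from the paper.

```python
from itertools import combinations


def combined_similarity(hh_prob, pct_identity):
    """Hypothetical pair score: HHsearch probability times identity, both given in percent."""
    return (hh_prob / 100.0) * (pct_identity / 100.0)


def agglomerate(names, sim, min_similarity):
    """Average-linkage agglomerative clustering on pairwise similarities.

    sim maps frozenset({a, b}) to a similarity in [0, 1]; missing pairs count as 0.
    Merging stops once the best average similarity drops below min_similarity.
    """
    clusters = [[n] for n in names]

    def avg_sim(c1, c2):
        total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < min_similarity:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy scores (HHsearch probability %, identity %) for placeholder phages.
toy_scores = {
    ("phageA", "phageB"): (99, 40),
    ("phageA", "phageC"): (98, 35),
    ("phageB", "phageC"): (97, 38),
    ("phageA", "phageD"): (60, 10),
    ("phageB", "phageD"): (55, 12),
    ("phageC", "phageD"): (50, 9),
    ("phageD", "phageE"): (99, 45),
}
similarity = {frozenset(pair): combined_similarity(*s) for pair, s in toy_scores.items()}
phages = ["phageA", "phageB", "phageC", "phageD", "phageE"]
result = agglomerate(phages, similarity, min_similarity=0.2)
print(sorted(result))
```

With these toy values the procedure groups the placeholder phages A, B and C together and D and E together. A production analysis would more likely rely on an established clustering library and a cut-off chosen from the data.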

Mentions: We analyzed the genome organisation of the head, neck and tail proteins in the 4 neck Types, and deduced their corresponding average gene organisation (Additional file 1: Figure S1). We further explored whether any unannotated gene superfamily might emerge in the vicinity of the neck genes. In particular, in all Type 1 phages, an unannotated gene encoding a protein homologous to SPP1 gp16.1, designated hereafter Ne1 (for neck protein of Type 1), was detected between the head and tail genes, most frequently positioned as Ad1-Hc1-Ne1-Tc1 (Ne1 is displayed in yellow in a sample of phage genomes in Figure 3). Ne1 proteins vary remarkably in size, ranging from 56 to 231 residues, which probably precluded their previous identification as members of the same protein superfamily. However, most Ne1 proteins were detected with an HHsearch confidence higher than 95% (Additional file 1: Figure S2). The systematic presence of their gene in Type 1 neck modules suggests a critical role in the assembly or function of the head-to-tail connection.
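The gene-context idea in this paragraph, looking for unannotated genes that sit between already identified head and tail genes, can be sketched as follows. This is only an illustration of the positional part of the reasoning: the `Gene` record, the `candidate_neck_genes` helper, the locus names and the lengths are hypothetical, and in the source the Ne1 assignment itself rests on HHsearch homology to SPP1 gp16.1, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    locus: str                     # placeholder locus name
    length_aa: int                 # protein length in residues
    family: Optional[str] = None   # e.g. "Ad1", "Hc1", "Tc1"; None if unannotated


def candidate_neck_genes(genes: List[Gene],
                         upstream: str = "Hc1",
                         downstream: str = "Tc1") -> List[Gene]:
    """Return unannotated genes lying strictly between the two flanking families."""
    families = [g.family for g in genes]
    if upstream not in families or downstream not in families:
        return []
    start, end = families.index(upstream), families.index(downstream)
    if start > end:  # module listed in the opposite orientation
        start, end = end, start
    return [g for g in genes[start + 1:end] if g.family is None]


# Toy structural module following the most frequent order Ad1-Hc1-Ne1-Tc1.
module = [
    Gene("g1", 110, "Ad1"),
    Gene("g2", 105, "Hc1"),
    Gene("g3", 120),          # unannotated gene in the Ne1 position
    Gene("g4", 130, "Tc1"),
    Gene("g5", 300),          # unannotated, but outside the Hc1-Tc1 interval
]
candidates = candidate_neck_genes(module)
print([g.locus for g in candidates])
```

Any gene returned this way is only a positional candidate. Under the approach described above, it would still need to be confirmed by profile-profile comparison before being called Ne1.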

