ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family.

Roca AI, Almada AE, Abajian AC - BMC Bioinformatics (2008)

Bottom Line: Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program. We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family - a universally conserved protein involved in DNA recombination and repair. ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display of rare sequence features within the context of an entire alignment.

Affiliation: Department of Molecular Biology and Biochemistry, 560 Steinhaus Hall, University of California, Irvine, California 92697-3900, USA. aroca@uci.edu

ABSTRACT

Background: Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets are no longer manageable for visualization and investigation using the traditional stacked sequence alignment representation.

Results: We introduce ProfileGrids that represent a multiple sequence alignment as a matrix color-coded according to the residue frequency occurring at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. A dynamic interaction with the alignment information is achieved by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by the overlay of similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program. We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family - a universally conserved protein involved in DNA recombination and repair. Careful attention was paid to curating the collected RecA sequences since ProfileGrids allow the easy identification of rare residues in an alignment. We relate the RecA alignment sequence conservation to the following three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique Bacillus subtilis RecA homolog sequence feature.

Conclusion: ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display of rare sequence features within the context of an entire alignment. JProfileGrid is free for non-commercial use and is available from http://www.profilegrid.org. Furthermore, we present a curated RecA protein collection that is more diverse than previous data sets; and, therefore, this RecA ProfileGrid is a rich source of information for nanoanatomy analysis.
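
The core idea described in the Results, a matrix of residue frequencies per alignment column, can be illustrated with a minimal sketch. This is not JProfileGrid's implementation: the function name profile_grid, the toy alignment, and the choice to count the gap character "-" like any other symbol are assumptions made only for illustration.

```python
from collections import Counter


def profile_grid(alignment):
    """Return one dict per alignment column mapping symbol -> frequency.

    `alignment` is a list of equal-length aligned sequences (strings).
    Gap characters are counted like any other symbol (an assumption).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_sequences = len(alignment)
    grid = []
    for column in range(length):
        # Count how often each symbol occurs in this column.
        counts = Counter(seq[column] for seq in alignment)
        # Convert counts to fractions of the total number of sequences.
        grid.append({res: count / n_sequences for res, count in counts.items()})
    return grid


# Hypothetical four-sequence alignment, three columns wide.
toy_alignment = ["MAG", "MAG", "MSG", "M-A"]
grid = profile_grid(toy_alignment)
```

Each entry of grid corresponds to one column of the alignment; a color scheme would then map each frequency to a cell color, which is left out of this sketch.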

Figure 4: B. subtilis RecA highlight sequence example with frequency colors and values turned off.

Mentions: Two features allow one to visualize sequences of the ProfileGrid other than the template sequence. First, the highlight sequence option allows one to detect and represent unique features of one sequence with respect to the entire information content of an MSA. Such a feature may indicate specialization with respect to function or activity. Selecting a sequence other than the template sequence from the highlight menu turns on the highlight feature (Figure 4). Specifically, the highlight sequence appears immediately below the template sequence in the ProfileGrid. Furthermore, a pairwise comparison is made such that the corresponding residue is boxed wherever the highlight sequence differs from the template sequence. The user may choose colors other than the default blue. Note that in the highlight sequence figure, the cell value identification feature (top left corner) reports the current cell frequency even when the ProfileGrid colors and values are hidden. The second feature for visualizing MSA sequences is the alignment viewer window (Figure 5), which displays a traditional alignment representation of the sequences from the currently selected ProfileGrid cell. In this example, the 21 homologs that have glycine in the third column are shown. For comparison purposes, the first row in the alignment is the template sequence.
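
The two operations described above, the pairwise template/highlight comparison and the extraction of the sequences behind one ProfileGrid cell, can be sketched as follows. This is an illustrative sketch, not code from JProfileGrid: the function names, the sequence names, and the toy sequences are hypothetical, and columns are indexed from zero.

```python
def highlight_differences(template, highlight):
    """Return the column indices where the highlight sequence differs
    from the template sequence (the positions that would be boxed)."""
    if len(template) != len(highlight):
        raise ValueError("aligned sequences must have the same length")
    return [i for i, (t, h) in enumerate(zip(template, highlight)) if t != h]


def sequences_with_residue(alignment, column, residue):
    """Return the subset of a name -> sequence mapping whose sequences
    carry `residue` at the zero-based alignment `column`."""
    return {name: seq for name, seq in alignment.items() if seq[column] == residue}


# Hypothetical aligned sequences; the names carry no biological meaning.
toy_alignment = {
    "template": "MAGK",
    "homolog_1": "MSGR",
    "homolog_2": "MAAK",
}

boxed = highlight_differences(toy_alignment["template"], toy_alignment["homolog_1"])
subset = sequences_with_residue(toy_alignment, 2, "G")
```

Here boxed lists the columns where homolog_1 departs from the template, and subset holds the sequences that have glycine in the third column, analogous to the contents of the alignment viewer for a selected cell.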

