Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling.

Pelé J, Bécu JM, Abdi H, Chabbert M - BMC Bioinformatics (2012)

Bottom Line: Orthologous sequence sets can thus be compared in a straightforward way. The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, as the analysis can be carried out from user-provided matrices, the projection function can be widely used on any kind of data.


Affiliation: CNRS UMR 6214 - INSERM 1083, Faculté de Médecine, 3 rue Haute de Reculée, Angers, 49045, France.

ABSTRACT

Background: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low-dimensional space by metric multidimensional scaling (MDS). Applied to protein families, MDS provides information complementary to that derived from tree-based methods. Moreover, MDS gives a unique opportunity to compare orthologous sequence sets because it can add supplementary elements to a reference space.

Results: The R package bios2mds (from BIOlogical Sequences to MultiDimensional Scaling) has been designed to analyze multiple sequence alignments by MDS. Bios2mds starts with a sequence alignment, builds a matrix of distances between the aligned sequences, and represents this matrix by MDS to visualize a sequence space. The package can also perform K-means clustering in the MDS-derived sequence space. Most importantly, bios2mds includes a function that projects supplementary elements (a.k.a. "out of sample" elements) onto the space defined by reference or "active" elements. Orthologous sequence sets can thus be compared in a straightforward way. The data analysis and visualization tools have been specifically designed for easy monitoring of the evolutionary drift of protein sub-families.

Conclusions: The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, as the analysis can be carried out from user-provided matrices, the projection function can be widely used on any kind of data.
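
The idea of projecting supplementary elements onto a reference MDS space can be illustrated with a small sketch. The code below is not the bios2mds implementation (which is an R package); it is a minimal pure-Python illustration of classical metric MDS followed by an out-of-sample projection, assuming equal weights for all active elements and Euclidean input distances. All function names (double_center, top_eigenpairs, mds, project) and the toy points are hypothetical, and the eigenvectors are obtained by simple power iteration, which is only adequate for small, well-separated examples.

```python
import math

def double_center(sq):
    """Return B = -1/2 * J * sq * J for a symmetric matrix of squared distances."""
    n = len(sq)
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]

def top_eigenpairs(b, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation.
    Assumes k does not exceed the number of clearly positive eigenvalues."""
    n = len(b)
    b = [row[:] for row in b]  # work on a copy
    pairs = []
    for c in range(k):
        v = [math.sin(i + c + 1.0) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return pairs

def mds(dist, k):
    """Classical MDS of the active elements; returns coordinates, eigenvalues, squared distances."""
    sq = [[d * d for d in row] for row in dist]
    pairs = top_eigenpairs(double_center(sq), k)
    coords = [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
              for i in range(len(dist))]
    return coords, [lam for lam, _ in pairs], sq

def project(dist_sup, sq, coords, eigvals):
    """Project one supplementary element from its distances to the active elements."""
    n = len(sq)
    mean_sq = [sum(r) / n for r in sq]  # mean squared distance of each active element
    diff = [dist_sup[i] ** 2 - mean_sq[i] for i in range(n)]
    mean_diff = sum(diff) / n
    s = [-0.5 * (x - mean_diff) for x in diff]  # cross-products with active elements
    return [sum(s[i] * coords[i][a] for i in range(n)) / eigvals[a]
            for a in range(len(eigvals))]

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Toy data (hypothetical): five active points and one supplementary point in a plane.
active = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0), (2.0, 1.0)]
extra = (3.0, 0.5)
dist = [[euclid(p, q) for q in active] for p in active]
coords, eigvals, sq = mds(dist, 2)
f_sup = project([euclid(extra, p) for p in active], sq, coords, eigvals)
```

In this toy case the supplementary point lies in the same plane as the active points, so its distances to the active elements in the MDS space match the input distances. The active coordinates are computed without any knowledge of the supplementary element, which is what allows a second sequence set to be overlaid on a fixed reference space.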


Figure 1: 3D representation of the GPCR sequence space. A typical multiple sequence alignment of 283 GPCRs from H. sapiens was analyzed by MDS, with distances based on difference scores. The 3D space is defined by the first three components of the MDS analysis. The color code refers to the different sub-families of human GPCRs, with unclassified receptors colored in black. Plot obtained with the mmds.3D.plot function after coloring by GPCR sub-families with the col.group function.

Mentions: The human set includes 283 aligned sequences of GPCRs [9]. The MDS analysis of this set provides a typical sequence space (Figure 1). In this example, the distances between sequences are equal to their difference scores, and the 3D sequence space of human GPCRs is displayed with the plot3D command from the rgl package [32], which allows interactive 3D representation within the R environment. The elements are colored using the col.group function, based on prior knowledge of the twelve GPCR sub-families present in humans [9,33,34]. Clustering groups these sub-families into four groups that correspond to major pathways of GPCR evolution [9].
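
As a rough illustration of a distance "based on difference scores", the sketch below computes, for each pair of aligned sequences, the fraction of compared alignment columns at which the residues differ. It is a minimal sketch rather than the package's own routine: the function names are hypothetical, the toy sequences are invented, and the gap handling (columns gapped in both sequences are skipped, a gap against a residue counts as a difference) is an assumption that may not match bios2mds.

```python
def difference_score(a, b, gap="-"):
    """Fraction of compared alignment columns at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # assumption: columns gapped in both sequences are ignored
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0

def difference_matrix(seqs):
    """Symmetric matrix of pairwise difference scores."""
    return [[difference_score(a, b) for b in seqs] for a in seqs]

# Toy alignment (hypothetical sequences).
seqs = ["ACDE-", "ACDF-", "A-DFG"]
matrix = difference_matrix(seqs)
```

A matrix of this kind has zeros on the diagonal and is symmetric, so it can serve directly as the input distance matrix of an MDS analysis.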

