Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

Atkinson HJ, Morris JH, Ferrin TE, Babbitt PC - PLoS ONE (2009)

Bottom Line: In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.


Affiliation: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America.

ABSTRACT
The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.


Figure 3 (pone-0004345-g003): Sequence similarity networks are useful tools for exploration of the kinase superfamily. Two ways of coloring the same network of 513 human kinase domains are shown. The network is thresholded at a BLAST E-value of 1×10^-25. The worst edges displayed correspond to a median of 29% identity over alignments of 260 residues. A. Network colored by kinase class. B. Network colored by the presence of a catalytic Lys in the "VAIK" motif: each of the 513 sequences was aligned to a sequence model of the kinase domain, and the identity of the residue at the catalytic Lys position is mapped to the network. *Note that MAP2K1 and MAP2K2 registered a Lys to Arg substitution due to a sequence alignment error. The other labeled kinases truly do not contain a homologous catalytic Lys, but only the WNK kinases have been shown to have kinase activity. See Table 2 for statistics.
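
The thresholding described in the caption can be illustrated with a minimal sketch. It assumes pairwise BLAST results are already available as (query, subject, E-value) tuples; the sequence names and E-values below are made up for illustration, the function names are hypothetical, and keeping the smaller E-value of the two search directions is an assumption rather than the authors' documented procedure.

```python
from collections import defaultdict

def threshold_network(hits, max_evalue=1e-25):
    """Return undirected edges whose E-value is at or below max_evalue."""
    edges = {}
    for query, subject, evalue in hits:
        if query == subject or evalue > max_evalue:
            continue  # drop self-hits and pairs weaker than the threshold
        key = tuple(sorted((query, subject)))
        # Assumption: when both directions are present, keep the smaller E-value.
        edges[key] = min(evalue, edges.get(key, evalue))
    return edges

def connected_components(nodes, edges):
    """Group nodes into connected clusters using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical toy data, not taken from the paper.
hits = [
    ("kinA", "kinB", 1e-40),
    ("kinB", "kinA", 1e-38),
    ("kinB", "kinC", 1e-30),
    ("kinC", "ck1A", 1e-10),  # too weak at 1e-25, so no edge is drawn
    ("ck1A", "ck1B", 1e-60),
]
nodes = {"kinA", "kinB", "kinC", "ck1A", "ck1B"}
edges = threshold_network(hits, max_evalue=1e-25)
components = connected_components(nodes, edges)
```

In this toy example the weak kinC/ck1A hit is discarded, so the two ck1 sequences form their own cluster. This mirrors how a group can appear disconnected from the main cluster at a stringent threshold even though weaker similarity exists.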

Mentions: To assess the correspondence between a very large phylogenetic tree and sequence similarity networks, we used a dendrogram of the human kinome [18], which uses sequence similarity to classify all of the kinase domains in the human genome into a number of broad classes. This tree has been enormously useful to researchers since its publication; in particular, it gives a sense of how a kinase of interest relates to all others. Although the pairwise relationships between the CK1 kinase class and the other canonical kinase domains are not significant enough to be connected at the E-value threshold chosen for Fig. 3, the pairwise distances within the large connected group are still strongly correlated with the distances in the seminal Manning kinase tree [18]: R is 0.628 when comparing the laid-out distances in the connected cluster in Fig. 3 to the tree distances for the 419 sequences in common with the full Manning tree, which contains 491 kinase domains (see Table 2 for more statistics).
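
A comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say which correlation coefficient was used, so Pearson's r is assumed here, and the layout coordinates, tree distances, and function names are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def layout_tree_correlation(coords, tree_dist):
    """Correlate 2D layout distances with tree distances over shared pairs.

    coords:    {name: (x, y)} positions from the network layout
    tree_dist: {frozenset({a, b}): distance} pairwise distances from the tree
    """
    layout, tree = [], []
    for a, b in combinations(sorted(coords), 2):
        pair = frozenset((a, b))
        if pair not in tree_dist:
            continue  # only sequences present in both network and tree count
        layout.append(math.dist(coords[a], coords[b]))
        tree.append(tree_dist[pair])
    return pearson_r(layout, tree)

# Hypothetical toy data: D is in the network but absent from the tree.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (4.0, 0.0), "D": (9.0, 9.0)}
tree_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 8.0,
    frozenset(("B", "C")): 6.0,
}
r = layout_tree_correlation(coords, tree_dist)
```

Restricting the comparison to pairs found in both data sets corresponds to the use of only the sequences in common between the network cluster and the tree.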

