A network of SCOP hidden Markov models and its analysis.

Zhang L, Watson LT, Heath LS - BMC Bioinformatics (2011)

Bottom Line: Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components.

Affiliation: Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA. lqzhang@cs.vt.edu

ABSTRACT

Background: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before.

Results: In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. Results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from the previously documented biological networks, exemplified by the exceptionally high clustering coefficient and the large number of connected components.

Conclusions: The current findings may guide the design of computational methods that reduce the degree of overlap between the HMMs representing the same superfamily, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
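
The network construction summarized in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the hit list is made-up toy data rather than parsed HHsearch output, the names `build_network`, `connected_components`, and `average_clustering` are hypothetical, and the clustering coefficient shown is the common local-average definition with vertices of degree below 2 counted as 0, which may differ from the definition used in the paper.

```python
from collections import deque

def build_network(nodes, hits, cutoff):
    """Undirected graph: an edge joins two HMMs whose e-value is <= cutoff."""
    adj = {n: set() for n in nodes}
    for a, b, evalue in hits:
        if a != b and evalue <= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Return the connected components as lists, largest first (BFS)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp = []
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

def average_clustering(adj):
    """Mean local clustering coefficient; vertices with degree < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of v (each unordered pair once).
        links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

# Toy data (assumption): five HMM identifiers and pairwise e-values.
nodes = ["h1", "h2", "h3", "h4", "h5"]
hits = [
    ("h1", "h2", 1e-30),
    ("h2", "h3", 1e-12),
    ("h1", "h3", 1e-25),
    ("h4", "h5", 5e-3),   # too weak for a 1e-3 cutoff
]

adj = build_network(nodes, hits, cutoff=1e-3)
sizes = [len(c) for c in connected_components(adj)]
clustering = average_clustering(adj)
print(sizes, clustering)
```

With this toy input, h1, h2, and h3 form a triangle while h4 and h5 remain singletons, so a single cutoff already yields both a tightly clustered component and several small components.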

Figure 9: The 20 largest connected components and e-value. For clarity, only the curves for some e-value cutoffs from 10^-20 to 10^-3 are shown.

Mentions: Figure 9 shows the sizes of the 20 largest connected components (CCs) for varying e-value cutoffs. The e-value cutoff has a more pronounced effect on the sizes of the largest CCs than on those of the smaller CCs. For example, the largest CC contains almost twice as many vertices at an e-value cutoff of 0.01 as at 0.001. Thus the less stringent cutoff of 0.01 allows the formation of very large CCs that may include some low-similarity links between HMMs. Beyond the second largest CC, the sizes of CCs of the same rank differ less across cutoffs.
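
The kind of cutoff sweep behind Figure 9 can be sketched as follows. This is only an illustration under stated assumptions: the hit list is toy data, `component_sizes` and `largest_by_cutoff` are hypothetical helper names, and a union-find structure stands in for whatever component algorithm the authors actually used.

```python
def component_sizes(nodes, hits, cutoff):
    """Sizes of all connected components (largest first) at one e-value cutoff."""
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue > cutoff:
            continue  # hit too weak to become an edge at this cutoff
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra          # union by size
        size[ra] += size[rb]

    roots = {find(n) for n in nodes}
    return sorted((size[r] for r in roots), reverse=True)

def largest_by_cutoff(nodes, hits, cutoffs, k=20):
    """Map each cutoff to the sizes of its k largest connected components."""
    return {c: component_sizes(nodes, hits, c)[:k] for c in cutoffs}

# Toy data (assumption): six HMM identifiers and pairwise e-values.
nodes = ["a", "b", "c", "d", "e", "f"]
hits = [
    ("a", "b", 1e-25),
    ("b", "c", 1e-10),
    ("c", "d", 5e-3),
    ("e", "f", 1e-4),
]

table = largest_by_cutoff(nodes, hits, cutoffs=[1e-20, 1e-3, 1e-2])
for cutoff, sizes in table.items():
    print(cutoff, sizes)
```

In the toy data the largest component grows from 2 to 3 to 4 vertices as the cutoff is relaxed from 10^-20 to 0.01, while the second largest stops changing once it reaches 2, which mirrors the qualitative pattern described for Figure 9.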

