A model-based information sharing protocol for profile Hidden Markov Models used for HIV-1 recombination detection.

Bulla I, Schultz AK, Chesneau C, Mark T, Serea F - BMC Bioinformatics (2014)

Bottom Line: In order to implement the proposed protocol, we make use of an existing HMM architecture and its associated inference engine. Thereby, we demonstrate that the performance of pHMMs can be significantly improved by the proposed technique. Moreover, we show that our algorithm performs significantly better than Simplot and Bootscanning.

Affiliation: Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Straße 47, 17487 Greifswald, Germany. ingobulla@gmail.com.

ABSTRACT

Background: In many applications, a family of nucleotide or protein sequences classified into several subfamilies has to be modeled. Profile Hidden Markov Models (pHMMs) are widely used for this task, with each subfamily modeled separately by one pHMM. However, a major drawback of this approach is the difficulty of dealing with subfamilies composed of very few sequences. One of the most important bioinformatics tasks affected by small subfamilies is the subtyping of human immunodeficiency virus type 1 (HIV-1) sequences, since only a small number of sequences is known for some HIV-1 subtypes.

Results: To deal with small samples for particular subfamilies of HIV-1, we introduce a novel model-based information sharing protocol. It estimates the emission probabilities of the pHMM modeling a particular subfamily not only from the nucleotide frequencies of that subfamily but also from the nucleotide frequencies of all available subfamilies. To this end, the underlying probabilistic model mimics the pattern of commonality and variation between the subtypes with regard to the biological characteristics of HIV. To implement the proposed protocol, we make use of an existing HMM architecture and its associated inference engine.

Conclusions: We apply the modified algorithm to classify HIV-1 sequence data in the form of partial HIV-1 sequences and semi-artificial recombinants. Thereby, we demonstrate that the performance of pHMMs can be significantly improved by the proposed technique. Moreover, we show that our algorithm performs significantly better than Simplot and Bootscanning.

Figure 7: Calculation of s_segm between two subtype-wise segmentations. The arrows show how the segments of both classifications are trimmed. The colored bar between two arrows on the bottom side of the upper classification indicates which positions of the upper classification must be assigned to the subtype given by the bar for the corresponding segment of the lower classification to count as conforming to the upper classification (the same applies with the roles of the lower and upper classification switched). Segments that do not conform to the other classification are shaded.

Mentions: We evaluate the performance of our algorithm by two measures: i) the fraction s_pos of sequence positions correctly classified, and ii) the conformance s_segm between the predicted and the correct subtype pattern. The score s_segm is computed as follows (see Figure 7). All segments of the correct subtype pattern are trimmed by 50 bp on both ends. If a segment is too short for this procedure, only its middle point is kept. Then, for each trimmed segment, we check whether all of its positions are assigned the same subtype in the predicted pattern. The same is done with the roles of the correct and the predicted pattern switched. The score s_segm is then defined as the fraction of all trimmed segments that match their counterpart.
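
The two measures described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It makes several assumptions that the text does not state: each pattern is given as a list with one subtype label per position, a segment counts as "too short" when trimming would leave no position (length at most twice the margin), the "middle point" of an even-length segment is taken to be the lower of the two central positions, and the final fraction is taken over the trimmed segments of both patterns pooled together. All function names are hypothetical.

```python
from itertools import groupby


def segments(labels):
    """Yield (start, end, subtype) for each maximal run of equal labels; end is exclusive."""
    start = 0
    for subtype, run in groupby(labels):
        length = sum(1 for _ in run)
        yield start, start + length, subtype
        start += length


def trim(start, end, margin):
    """Cut `margin` positions off both ends of [start, end).

    Assumption: if nothing would remain, keep only the (lower) middle position.
    """
    if end - start > 2 * margin:
        return start + margin, end - margin
    mid = (start + end - 1) // 2
    return mid, mid + 1


def check_patterns(correct, predicted):
    """Both patterns must be non-empty and cover the same positions."""
    if len(correct) == 0 or len(correct) != len(predicted):
        raise ValueError("patterns must be non-empty and of equal length")


def s_pos(correct, predicted):
    """Fraction of positions whose predicted subtype equals the correct one."""
    check_patterns(correct, predicted)
    return sum(c == p for c, p in zip(correct, predicted)) / len(correct)


def conforming(reference, other, margin):
    """Count trimmed segments of `reference` whose positions all carry the same subtype in `other`."""
    matching = total = 0
    for start, end, subtype in segments(reference):
        lo, hi = trim(start, end, margin)
        total += 1
        if all(label == subtype for label in other[lo:hi]):
            matching += 1
    return matching, total


def s_segm(correct, predicted, margin=50):
    """Fraction of trimmed segments (from both patterns) that match their counterpart."""
    check_patterns(correct, predicted)
    m1, t1 = conforming(correct, predicted, margin)
    m2, t2 = conforming(predicted, correct, margin)
    return (m1 + m2) / (t1 + t2)
```

As a usage example under these assumptions, take a correct pattern of 300 positions of subtype A followed by 300 of subtype B. A prediction that places the breakpoint 20 positions too late (320 A, then 280 B) gives `s_segm` = 1.0, because the shift lies inside the 50 bp trimming margin, while `s_pos` drops to 580/600. A prediction that inserts a spurious 20-position segment of subtype C at the breakpoint gives `s_segm` = 0.8, since one of the five trimmed segments fails to match.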

