FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus.

Prestat E, David MM, Hultman J, Taş N, Lamendella R, Dvornik J, Mackelprang R, Myrold DD, Jumpponen A, Tringe SG, Holman E, Mavromatis K, Jansson JK - Nucleic Acids Res. (2014)

Bottom Line: The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy.

Affiliation: Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Division of Biology, Kansas State University, Manhattan, Kansas 66506, USA.

Figure 1: HMM building pipeline, illustrated with KO:K16157 (methane monooxygenase). Step 1: (a) find the Pfam combination(s) assigned to the KO of interest and (b) check for redundancy. Step 2: fetch the IMG peptide sequences that hit the retrieved Pfam(s). Step 3: fetch the HMM(s) of interest from the Pfam-A database. Step 4: align the sequences to each Pfam (hmmalign) and filter each alignment to remove the extra sequences obtained from IMG. Step 5: stitch the filtered alignments together. Step 6: build a maximum-likelihood tree (FastTree). Step 7: find the clusters in the tree that share the same KO. Step 8: split the alignment (step 5 output) by cluster (step 7 output), build an HMM for each cluster, and compute its ‘Trusted Cutoff’.
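
The alignment and tree steps of Figure 1 (stitching in step 5, clustering in step 7, splitting in step 8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FOAM code. Alignments are plain dicts mapping sequence IDs to aligned rows. Stitching keeps only the sequences present in every per-Pfam alignment. The tree is a nested tuple rather than FastTree's Newick output, and a "cluster" is taken to be a maximal clade whose leaves all carry the same KO. The function names (`stitch`, `ko_clusters`, `split_alignment`) and the labels `KO_X`/`KO_Y` are hypothetical. Each resulting sub-alignment would then be written to a file and passed to HMMER's `hmmbuild`; that step and the ‘Trusted Cutoff’ computation are not shown.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representations (not FOAM's file formats):
#   alignment: sequence ID -> aligned row (gaps as '-')
#   tree: a leaf is a sequence ID; an internal node is a tuple of subtrees
Alignment = Dict[str, str]
Tree = Union[str, Tuple["Tree", ...]]


def stitch(per_pfam: List[Alignment]) -> Alignment:
    """Step 5 (assumed behaviour): concatenate per-Pfam rows for sequences
    present in every filtered alignment, in the order the Pfams are given."""
    shared = set.intersection(*(set(aln) for aln in per_pfam))
    return {sid: "".join(aln[sid] for aln in per_pfam) for sid in sorted(shared)}


def leaves(tree: Tree) -> List[str]:
    """Return the sequence IDs under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def ko_clusters(tree: Tree, ko_of: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Step 7 (assumed rule): return maximal clades whose leaves share one KO."""
    members = leaves(tree)
    kos = {ko_of[leaf] for leaf in members}
    if len(kos) == 1:
        # Pure clade: stop here so that the reported cluster is maximal.
        return [(kos.pop(), members)]
    # Mixed clade (always a tuple here): collect pure sub-clades of each child.
    return [c for child in tree for c in ko_clusters(child, ko_of)]


def split_alignment(alignment: Alignment,
                    clusters: List[Tuple[str, List[str]]]) -> List[Tuple[str, Alignment]]:
    """Step 8 (first half): one sub-alignment per cluster, ready for hmmbuild."""
    return [(ko, {sid: alignment[sid] for sid in ids}) for ko, ids in clusters]


if __name__ == "__main__":
    # Two toy per-Pfam alignments; "z" is missing from the first, so it is dropped.
    stitched = stitch([
        {"a": "MK-", "b": "MKL", "c": "M-L", "d": "LKL", "e": "MKV", "f": "MRV"},
        {"a": "GG", "b": "GA", "c": "GG", "d": "AA", "e": "GG", "f": "GG", "z": "GG"},
    ])
    tree = ((("a", "b"), "c"), ("d", ("e", "f")))
    ko_of = {"a": "KO_X", "b": "KO_X", "c": "KO_X",
             "d": "KO_Y", "e": "KO_X", "f": "KO_X"}
    clusters = ko_clusters(tree, ko_of)
    print(clusters)
    # [('KO_X', ['a', 'b', 'c']), ('KO_Y', ['d']), ('KO_X', ['e', 'f'])]
    for ko, sub in split_alignment(stitched, clusters):
        print(ko, sub)
```

Stopping the recursion at the first pure clade is what makes each cluster maximal. A KO whose sequences fall into two separate clades yields two clusters, as `KO_X` does in the toy tree, and would therefore get two HMMs.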

Mentions: The reduced size of the resulting FOAM database, compared with non-specific sequence databases, was a first step towards a significant improvement in the speed and specificity of similarity searches. In addition, to improve on the sensitivity of conventional heuristic alignment programs, we turned each KO set into Hidden Markov Models (HMMs; 19) by fetching the corresponding protein family (Pfam) profiles (20), as described in Figure 1. This step generated a sizeable number of conflicts (several Pfams per KO and vice versa), which were resolved automatically using the functional assignments to KOs. For the few assignments that remained unresolved, the corresponding set of sequences was split manually according to the topology of their phylogenetic trees. At this point, the HMMs were re-trained on the new pool of sequences.
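
The conflict detection and automatic resolution described above can be illustrated with a small sketch. It assumes that each IMG sequence is annotated with at most one KO and with the set of Pfams it hits. Conflicts are then Pfam combinations shared by several KOs, and KOs seen with several Pfam combinations. "Resolution by functional assignment" is modeled, as an assumption, simply as grouping sequences by their KO label. Sequences without a KO label stand in for the unresolved cases that were split manually by tree topology. All names here (`find_conflicts`, `resolve_by_ko`, `PF_A`, `KO_X`) are hypothetical placeholders, not FOAM identifiers or real Pfam accessions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical input: sequence ID -> (KO label or None, set of Pfams the sequence hits)
PfamCombo = FrozenSet[str]
Annotations = Dict[str, Tuple[Optional[str], PfamCombo]]


def find_conflicts(ann: Annotations) -> Tuple[Dict[PfamCombo, Set[str]],
                                             Dict[str, Set[PfamCombo]]]:
    """Report Pfam combinations shared by several KOs, and KOs seen with
    several Pfam combinations (the two directions of a many-to-many map)."""
    kos_per_combo: Dict[PfamCombo, Set[str]] = defaultdict(set)
    combos_per_ko: Dict[str, Set[PfamCombo]] = defaultdict(set)
    for ko, combo in ann.values():
        if ko is None:  # unlabeled sequences cannot contribute to a KO mapping
            continue
        kos_per_combo[combo].add(ko)
        combos_per_ko[ko].add(combo)
    shared = {c: kos for c, kos in kos_per_combo.items() if len(kos) > 1}
    multi = {ko: cs for ko, cs in combos_per_ko.items() if len(cs) > 1}
    return shared, multi


def resolve_by_ko(ann: Annotations) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assumed resolution rule: one training set per KO label; unlabeled
    sequences are returned separately for manual, tree-based review."""
    training: Dict[str, List[str]] = defaultdict(list)
    unresolved: List[str] = []
    for seq_id, (ko, _combo) in sorted(ann.items()):
        if ko is None:
            unresolved.append(seq_id)
        else:
            training[ko].append(seq_id)
    return dict(training), unresolved


if __name__ == "__main__":
    ann: Annotations = {
        "s1": ("KO_X", frozenset({"PF_A"})),
        "s2": ("KO_Y", frozenset({"PF_A"})),          # PF_A alone is shared by two KOs
        "s3": ("KO_X", frozenset({"PF_A", "PF_B"})),  # KO_X also appears with PF_A+PF_B
        "s4": (None, frozenset({"PF_A"})),            # no KO label: needs manual review
    }
    shared, multi = find_conflicts(ann)
    print(shared, multi)
    print(resolve_by_ko(ann))
    # ({'KO_X': ['s1', 's3'], 'KO_Y': ['s2']}, ['s4'])
```

In this toy input, grouping by KO label separates the shared `PF_A` hits into `KO_X` and `KO_Y` training sets, while `s4` is left for the manual, tree-based step.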

