Metagenome fragment classification based on multiple motif-occurrence profiles.

Matsushita N, Seno S, Takenaka Y, Matsuda H - PeerJ (2014)

Bottom Line: The Naïve Bayes Classifier is a method for classifying metagenomic reads, but large differences in the sizes of the reference genomes can bias its scoring. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets.


Affiliation: Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Yamadaoka, Suita, Osaka, Japan.

ABSTRACT
A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets.




Fig. 3: Differences in the motif profiles between the NBC (previous) and NBC-MP (proposed) methods. NBC-MP (proposed) generates multiple profiles from each genome, whereas NBC (previous) generates a single profile for each genome.

Mentions: In Step 4, NBC assigns every fragment to the genome with the highest score, without applying a threshold. We have updated the NBC method to use multiple profiles for each reference genome; we refer to the updated method as NBC-MP (multiple profiles). NBC generates a single motif-occurrence profile for each genome, that is, the recorded frequencies of the fixed-length sub-sequences of that reference genome. In NBC-MP, each reference genome sequence is separated into multiple sub-sequences of similar length according to the size of the given genome. Thereafter, NBC-MP generates a motif-occurrence profile for each sub-sequence. Thus, the NBC-MP method includes multiple profiles from each genome (Fig. 3).
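
The multiple-profile idea described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the `target_len` parameter, the add-one smoothing, and the rule of scoring a genome by its best-scoring sub-sequence profile are all assumptions made here for the example; the text above states only that each genome is split into sub-sequences of similar length and that a fragment is assigned to the highest-scoring genome.

```python
import math
from collections import Counter


def kmer_profile(seq, k):
    """Motif-occurrence profile: counts of every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_similar_length(genome, target_len):
    """Split a genome into pieces of near-equal length.

    The number of pieces depends on the genome size (hypothetical rule:
    genome length divided by target_len, rounded, at least 1).
    """
    n = max(1, round(len(genome) / target_len))
    base, extra = divmod(len(genome), n)
    pieces, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        pieces.append(genome[start:end])
        start = end
    return pieces


def log_score(fragment, profile, k):
    """Naive Bayes style score: sum of log relative k-mer frequencies.

    Add-one smoothing over the 4**k possible DNA k-mers is an assumption,
    used so that unseen k-mers do not produce log(0).
    """
    total = sum(profile.values())
    vocab = 4 ** k
    return sum(
        math.log((profile.get(fragment[i:i + k], 0) + 1) / (total + vocab))
        for i in range(len(fragment) - k + 1)
    )


def build_profiles(genomes, k, target_len):
    """One profile per sub-sequence, so larger genomes get more profiles."""
    return {
        name: [kmer_profile(piece, k)
               for piece in split_similar_length(seq, target_len)]
        for name, seq in genomes.items()
    }


def classify(fragment, profiles, k):
    """Assign the fragment to the genome whose best profile scores highest.

    No score threshold is applied, mirroring Step 4 as described above.
    """
    best = {
        name: max(log_score(fragment, p, k) for p in plist)
        for name, plist in profiles.items()
    }
    return max(best, key=best.get)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    genomes = {
        "small": "ACGT" * 25,
        "large": "GGCC" * 100 + "TTAGC" * 40,
    }
    profiles = build_profiles(genomes, k=3, target_len=100)
    print({name: len(plist) for name, plist in profiles.items()})
    print(classify("TTAGCTTAGC", profiles, k=3))
```

Because every profile is built from a sub-sequence of roughly `target_len` bases, the profiles being compared are of comparable size regardless of how large the original genomes are. One simplification in this sketch is that k-mers spanning a boundary between two pieces are not counted in either profile.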

