Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons

ABSTRACT

Background: The identification of genomic biomarkers is a key step towards improving diagnostic tests and therapies. We present a reference-free method for this task that relies on a k-mer representation of genomes and a machine learning algorithm that produces intelligible models. The method is computationally scalable and well-suited for whole genome sequencing studies.

Results: The method was validated by generating models that predict the antibiotic resistance of C. difficile, M. tuberculosis, P. aeruginosa, and S. pneumoniae for 17 antibiotics. The obtained models are accurate, faithful to the biological pathways targeted by the antibiotics, and they provide insight into the process of resistance acquisition. Moreover, a theoretical analysis of the method revealed tight statistical guarantees on the accuracy of the obtained models, supporting its relevance for genomic biomarker discovery.

Conclusions: Our method allows the generation of accurate and interpretable predictive models of phenotypes, which rely on a small set of genomic variations. The method is not limited to predicting antibiotic resistance in bacteria and is applicable to a variety of organisms and phenotypes. Kover, an efficient implementation of our method, is open-source and should guide biological efforts to understand a plethora of phenotypes (http://github.com/aldro61/kover/).

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2889-6) contains supplementary material, which is available to authorized users.




Fig. 1: Dataset summary: distribution of resistant and sensitive isolates in each dataset

Each (pathogen, antibiotic) combination was considered individually, yielding 17 datasets in which the number of examples (m) ranged from 111 to 556 and the number of k-mers ranged from 10 million to 123 million. Figure 1 shows the distribution of resistant and sensitive isolates in each dataset. The datasets are further detailed in Additional file 3: Table S2.
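
To make the shape of such a dataset concrete, the following is a minimal sketch of a k-mer presence/absence representation, in which each genome becomes a row and each distinct k-mer becomes a binary column. It is an illustration under stated assumptions, not the Kover implementation: the function names, the toy sequences, and the choice of k = 3 are all hypothetical, and a real pipeline would also have to handle details such as reverse complements and sequencing errors.

```python
from typing import Dict, List, Set, Tuple


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_matrix(
    genomes: Dict[str, str], k: int
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary matrix: one row per genome, one column per k-mer.

    No reference genome or alignment is used; genomes are compared
    only through the k-mers they contain.
    """
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Columns are the union of k-mers seen in any genome, in sorted order.
    all_kmers = sorted(set().union(*per_genome.values()))
    names = list(genomes)
    matrix = [
        [int(kmer in per_genome[name]) for kmer in all_kmers]
        for name in names
    ]
    return names, all_kmers, matrix


# Hypothetical toy isolates (not data from the study).
toy_genomes = {"isolate_A": "ACGTAC", "isolate_B": "ACGTTC"}
names, columns, matrix = presence_matrix(toy_genomes, k=3)

for name, row in zip(names, matrix):
    print(name, dict(zip(columns, row)))
```

In this toy case the two isolates share some k-mers and differ on others. Columns of the second kind are what a learning algorithm can use to separate resistant from sensitive isolates. With m in the hundreds and tens of millions of k-mers, a dense list-of-lists like this one would not be practical, which is why a scalable implementation matters.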

