Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm.

Azé J, Sola C, Zhang J, Lafosse-Marin F, Yasmin M, Siddiqui R, Kremer K, van Soolingen D, Refrégier G - PLoS ONE (2015)

Bottom Line: All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, the classification scheme was able to suggest new sublineages such as pseudo-Beijing. Additional developments using SNPs will help stabilize the taxonomy.


Affiliation: LIRMM UM CNRS, UMR 5506, 860 rue de St Priest, 34095 Montpellier cedex 5, France.

ABSTRACT
Infra-species taxonomy is a prerequisite for comparing features such as virulence across pathogen lineages. Mycobacterium tuberculosis complex taxonomy has evolved rapidly in the last 20 years through intensive clinical isolation, advances in sequencing and the description of fast-evolving loci (CRISPR and MIRU-VNTR). On-line tools to describe new isolates have been set up based on known diversity, either on CRISPRs (also known as spoligotypes) or on MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an on-line tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all the isolates from the database were subsequently fed into an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, the scheme was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, which works on 15-MIRU, 24-VNTR and spoligotype data, is available through the web interface "TBminer." Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance.
Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensus taxonomy for the human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.
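
The idea of reconciling several tools' assignations into one consensus label can be illustrated with a small sketch. This is not the authors' Weka-based ensemble; it is a plain majority vote, shown only to make the notion of "recurrent concordance" concrete. The mapping table, the consensus label names and the `min_agreement` threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical mapping from (tool, tool-specific name) to a shared label.
# Illustrative entries only; this is not the paper's correspondence table.
TO_CONSENSUS = {
    ("TB-Lineage", "East Asian"): "Lineage2",
    ("MIRU-VNTRPlus", "Beijing"): "Lineage2",
    ("SITVITWEB", "Beijing"): "Lineage2",
    ("TB-Lineage", "Indo-Oceanic"): "Lineage1",
    ("MIRU-VNTRPlus", "EAI"): "Lineage1",
    ("SITVITWEB", "EAI"): "Lineage1",
}


def consensus_label(assignments, min_agreement=2):
    """Return the label most tools agree on, or None if there is no clear winner.

    assignments: dict mapping tool name -> the name that tool gave the isolate.
    min_agreement: assumed minimum number of agreeing tools (hypothetical).
    """
    votes = Counter()
    for tool, name in assignments.items():
        label = TO_CONSENSUS.get((tool, name))
        if label is not None:  # names missing from the table cast no vote
            votes[label] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]
    if top_count < min_agreement:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between two labels: leave the isolate unassigned
    return top_label


if __name__ == "__main__":
    isolate = {
        "TB-Lineage": "East Asian",
        "MIRU-VNTRPlus": "Beijing",
        "SITVITWEB": "Beijing",
    }
    print(consensus_label(isolate))
```

Returning `None` for ties and weak agreement keeps discordant isolates visible for manual review rather than forcing a label on them.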





pone.0130912.g004: TBminer Prediction tool performance on Miru-VntrPlus database. A. Concordance between TBminer Pred2_Miru-Vntr and Miru-VntrPlus assignations. B. Concordance between Pred6 and manual expert assignation accounting for original labels.

Mentions: We observed that the prediction of MIRU-VNTRPlus assignations (Pred2 tool) had an accuracy of 100% when assigning human isolates from lineages 1 to 6 (Fig 4A). It failed to predict animal sublineages other than M. bovis, as expected given the absence of such isolates in the training dataset. Most of these animal isolates (n = 14 out of 21; 67%) were assigned to the closely related lineage Lineage6_Afri1(WestAfrican2). Altogether, the picture of diversity for the whole sample provided by the Pred2 tool in the Lineage Prediction module of TBminer was very similar to that of the MIRU-VNTRPlus tool: it only overestimated the prevalence of the West African 2 lineage and was unable to identify the less common animal isolates or M. canettii (Fig 4A).
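
Concordance figures of this kind can be computed from two parallel lists of labels. The sketch below shows one way to obtain overall agreement and per-lineage sensitivity and specificity; it is an illustration under stated assumptions, not the authors' evaluation code. The function names and the six-isolate toy dataset are made up for the example and do not reproduce the paper's data.

```python
from collections import Counter


def concordance(reference, predicted):
    """Return (fraction of agreeing labels, Counter of (reference, predicted) pairs)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = Counter(zip(reference, predicted))
    agree = sum(n for (ref, pred), n in pairs.items() if ref == pred)
    return agree / len(reference), pairs


def sensitivity_specificity(reference, predicted, label):
    """One-vs-rest sensitivity and specificity for a single label."""
    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref == label and pred == label:
            tp += 1
        elif ref == label:
            fn += 1
        elif pred == label:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Toy data (invented): animal isolates absent from training get pulled
    # into a neighbouring lineage, which lowers that lineage's specificity.
    reference = ["Beijing", "Beijing", "WestAfrican2", "Animal", "Animal", "Animal"]
    predicted = ["Beijing", "Beijing", "WestAfrican2", "WestAfrican2", "WestAfrican2", "Bovis"]
    accuracy, pairs = concordance(reference, predicted)
    print(accuracy)
    print(sensitivity_specificity(reference, predicted, "WestAfrican2"))
```

In the toy data the West African 2 label keeps full sensitivity but loses specificity, which mirrors the kind of overestimation described above for isolates whose true class was missing from the training set.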

