A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens.

Farhat MR, Shapiro BJ, Sheppard SK, Colijn C, Murray M - Genome Med (2014)

Bottom Line: Here we consider general methodological questions related to sampling and analysis, focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.

Affiliation: Department of Pulmonary and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Department of Global Health and Social Medicine, Harvard Medical School, 641 Huntington Avenue Suite 4A, Boston, MA 02115, USA.

ABSTRACT
Whole-genome sequencing is increasingly used to study phenotypic variation among infectious pathogens and to evaluate their relative transmissibility, virulence, and immunogenicity. To date, relatively little has been published on how, and how many, pathogen strains should be selected for studies associating phenotype and genotype. There are specific challenges in identifying genetic associations in bacteria, which often comprise highly structured populations. Here we consider general methodological questions related to sampling and analysis, focusing on clonal to moderately recombining pathogens. We propose that a matched sampling scheme constitutes an efficient study design, and provide a power calculator based on phylogenetic convergence. We demonstrate this approach by applying it to genomic datasets for two microbial pathogens: Mycobacterium tuberculosis and Campylobacter species.

Figure 2: Demonstration of the selection strategy. (A) Example initial MIRU-VNTR phylogeny constructed for selecting strains for sequencing and analysis. Grey circles represent strains with the phenotype of interest (ph+ strains); white circles represent strains without it (ph- strains). The table columns L1-L5 give the number of tandem repeats at each variable-number tandem repeat locus. (B) Example of the selection methodology: for each ph+ strain (grey circle), a neighboring ph- strain is selected such that the distance between the two strains in the phylogeny is minimized. Each control or study strain is sampled only once. The resulting tree of selected strains consists of matched study and control strains.
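To make the MIRU-VNTR table in panel A concrete, the sketch below turns repeat-count profiles at loci L1-L5 into a pairwise distance matrix, the kind of input a typing-based tree is built from. The profiles are invented for illustration, and the distance used here (the fraction of loci whose repeat counts differ) is an assumed, simple choice, not necessarily the measure used in the study. The names `profiles`, `vntr_distance`, and `distance_matrix` are hypothetical.

```python
from itertools import combinations

# Hypothetical MIRU-VNTR profiles: repeat counts at loci L1-L5.
# Values are illustrative only, not data from the study.
profiles = {
    "S1": (2, 4, 3, 2, 5),
    "S2": (2, 4, 3, 2, 6),
    "S3": (3, 4, 2, 2, 5),
    "S4": (1, 5, 3, 3, 4),
}

def vntr_distance(a, b):
    """Assumed categorical distance: fraction of loci whose repeat counts differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(profiles):
    """Symmetric pairwise distances, stored in a dict keyed by (strain, strain)."""
    dist = {(s, s): 0.0 for s in profiles}
    for s, t in combinations(profiles, 2):
        d = vntr_distance(profiles[s], profiles[t])
        dist[(s, t)] = dist[(t, s)] = d
    return dist

dist = distance_matrix(profiles)
print(dist[("S1", "S2")], dist[("S1", "S4")])
```

A tree-building method (for example neighbor-joining or UPGMA) would then be applied to such a matrix to obtain a phylogeny like the one in panel A; that step is omitted from this sketch.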

Mentions: Assuming a clearly defined binary phenotype of interest, we propose matching strains using data from traditional strain typing, such as pulsed-field gel electrophoresis and multi-locus sequence typing, which are often already available for banked strains, especially those collected under public health surveillance. From these lower-resolution typing data, a phylogenetic tree can be constructed, accounting for recombination as needed with methods such as ClonalFrame [16,25]. Figure 2A displays a hypothetical tree topology for a sample of 16 M. tuberculosis (MTB) clinical strains, constructed from their MIRU-VNTR patterns [52]. Figure 2B demonstrates the matched sampling strategy. For each phenotype-positive (ph+) strain, a neighboring phenotype-negative (ph-) strain is selected such that the phylogenetic distance between the pair is minimized. Only one ph- and one ph+ strain is sampled per clade. If more than one strain is equidistant, one is selected at random. The larger phylogenetic tree is thus reduced to a set of matched ph+ and ph- pairs.
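The selection rule described above can be sketched as a greedy procedure. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a precomputed distance lookup `dist` keyed by `(ph+ strain, ph- strain)`, processes ph+ strains in the order given, breaks ties at random as the text describes, and does not model the one-pair-per-clade constraint. The function name `matched_sample` and the toy strain labels are hypothetical.

```python
import random

def matched_sample(ph_pos, ph_neg, dist, seed=None):
    """Hypothetical greedy matched sampling.

    For each ph+ strain (in the order given), pick the closest ph- strain
    that has not been used yet; ties are broken at random. Each control
    is used at most once. Returns (case, control, distance) triples.
    """
    rng = random.Random(seed)
    available = set(ph_neg)
    pairs = []
    for case in ph_pos:
        if not available:
            break  # no unmatched ph- strains remain
        best = min(dist[(case, c)] for c in available)
        # Sort tied candidates so results are reproducible for a given seed.
        tied = sorted(c for c in available if dist[(case, c)] == best)
        control = rng.choice(tied)
        available.remove(control)
        pairs.append((case, control, best))
    return pairs

# Toy distances from two ph+ strains (A, B) to three ph- strains (x, y, z).
toy = {
    ("A", "x"): 1, ("A", "y"): 3, ("A", "z"): 3,
    ("B", "x"): 1, ("B", "y"): 2, ("B", "z"): 2,
}
pairs = matched_sample(["A", "B"], ["x", "y", "z"], toy, seed=1)
print(pairs)
```

Here A is matched to x, which leaves B with a random choice between the equidistant y and z. Because the greedy loop depends on the order of ph+ strains, an early match can claim a control that a later strain would have preferred; if minimizing the total distance over all pairs matters more, an assignment solver such as SciPy's `linear_sum_assignment` is one alternative, at the cost of the random tie-breaking.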

