Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains.

Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD - Genome Biol. (2007)

Bottom Line: This relatively large number of small rearrangements among strains is in keeping with what is known about the transformation mechanisms in this naturally competent pathogen. A finite supragenome model was developed to explain the distribution of genes among strains. The model predicts that the NTHi supragenome contains between 4,425 and 6,052 genes with most uncertainty regarding the number of rare genes, those that have a frequency of <0.1 among strains; collectively, these results support the DGH.


Affiliation: Allegheny General Hospital, Allegheny-Singer Research Institute, Center for Genomic Sciences, Pittsburgh, Pennsylvania 15212, USA.

ABSTRACT

Background: The distributed genome hypothesis (DGH) posits that chronic bacterial pathogens utilize polyclonal infection and reassortment of genic characters to ensure persistence in the face of adaptive host defenses. Studies based on random sequencing of multiple strain libraries suggested that free-living bacterial species possess a supragenome that is much larger than the genome of any single bacterium.

Results: We derived high depth genomic coverage of nine nontypeable Haemophilus influenzae (NTHi) clinical isolates, bringing to 13 the number of sequenced NTHi genomes. Clustering identified 2,786 genes, of which 1,461 were common to all strains, with each of the remaining 1,328 found in a subset of strains; the number of clusters ranged from 1,686 to 1,878 per strain. Genic differences of between 96 and 585 were identified per strain pair. Comparisons of each of the NTHi strains with the Rd strain revealed between 107 and 158 insertions and 100 and 213 deletions per genome. The mean insertion and deletion sizes were 1,356 and 1,020 base-pairs, respectively, with mean maximum insertions and deletions of 26,977 and 37,299 base-pairs. This relatively large number of small rearrangements among strains is in keeping with what is known about the transformation mechanisms in this naturally competent pathogen.

Conclusion: A finite supragenome model was developed to explain the distribution of genes among strains. The model predicts that the NTHi supragenome contains between 4,425 and 6,052 genes with most uncertainty regarding the number of rare genes, those that have a frequency of <0.1 among strains; collectively, these results support the DGH.



Figure 13: A plate diagram of the H. influenzae supragenome model. Each node in the diagram represents a random variable, and the arrows indicate dependence between the variables. Independent, identically distributed (IID) nodes appear in boxes with an index listed in the corner.

Mentions: The complete model is depicted in plate notation in Figure 13. 'Z' is the hidden class variable in which z_n corresponds to the class of gene n. 'X' is the observed gene variable, where x_{n,s} corresponds to the presence or absence of gene n in strain s. The outer plate represents the supragenome, while the inner plate represents instances of specific genomes. The model requires 2 × K + 2 parameters: N, K, a mixture coefficient π_k for each class, and a Bernoulli probability μ_k for each class. The number of gene classes, K, and their associated Bernoulli probabilities, μ_k, are fixed in advance. Care must be taken to choose classes that represent low and high population frequencies. Seven classes were selected for this study (K = 7) with associated probabilities μ = <0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0>. The class with probability 1.0 represents 'core' genes that appear in all strains.
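
The mixture structure described above can be illustrated with a minimal Python sketch. It is not the authors' fitting procedure. It only evaluates the forward model: if a gene belongs to class k with probability π_k and is then present in each strain independently with probability μ_k, the chance of seeing it in exactly j of S strains is a π-weighted sum of binomial terms. The μ values are those listed in the text. The mixture coefficients `PI_EXAMPLE` and the supragenome size `N_EXAMPLE` are made-up illustrative values, not estimates from the paper, and the function names are hypothetical.

```python
from math import comb

# Bernoulli probabilities for the K = 7 gene classes, as listed in the text.
# The final class (1.0) is the 'core' class.
MU = [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]


def prob_present_in(j, num_strains, pi, mu=MU):
    """Probability that one gene is present in exactly j of num_strains strains.

    Marginalizes over the hidden class z_n: sum_k pi_k * Binomial(j; S, mu_k).
    """
    total = 0.0
    for pi_k, mu_k in zip(pi, mu):
        binom = comb(num_strains, j) * mu_k**j * (1.0 - mu_k) ** (num_strains - j)
        total += pi_k * binom
    return total


def expected_counts(n_genes, num_strains, pi, mu=MU):
    """Expected number of genes present in exactly j strains, for j = 0..S."""
    return [n_genes * prob_present_in(j, num_strains, pi, mu)
            for j in range(num_strains + 1)]


# ASSUMPTION: illustrative values only, not the fitted parameters from the paper.
PI_EXAMPLE = [0.40, 0.10, 0.05, 0.05, 0.05, 0.05, 0.30]
N_EXAMPLE = 5000

counts = expected_counts(N_EXAMPLE, 13, PI_EXAMPLE)
for j, c in enumerate(counts):
    print(f"present in {j:2d} of 13 strains: {c:8.1f} genes expected")

# Parameter count stated in the text: N, K, plus one pi_k and one mu_k per class.
print("parameters:", 2 * len(MU) + 2)
```

In this sketch, `counts[0]` is the expected number of genes that belong to the supragenome but appear in none of the 13 sampled strains, which is why a model of this kind can place the supragenome size N above the number of gene clusters actually observed.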

