Complexity of the MSG gene family of Pneumocystis carinii.

Keely SP, Stringer JR - BMC Genomics (2009)

Bottom Line: Accounting for error reduced the number of truly distinct sequences observed to 158, roughly twice the number expected if the gene family contains 80 members. A set of sequences that represents most if not all of the members of the P. carinii MSG gene family was obtained. The protein-changing nature of the variation among these sequences suggests that the family has been shaped by selection for protein variation, which is consistent with the hypothesis that the MSG gene family functions to enhance phenotypic variation among the members of a population of P. carinii.

Affiliation: Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati College of Medicine, Cincinnati, Ohio, 45220, USA. keelysp@ucmail.uc.edu

ABSTRACT

Background: The relationship between the parasitic fungus Pneumocystis carinii and its host, the laboratory rat, presumably involves features that allow the fungus to circumvent attacks by the immune system. It is hypothesized that the major surface glycoprotein (MSG) gene family endows Pneumocystis with the capacity to vary its surface. This gene family comprises approximately 80 genes, each approximately 3 kb long. Expression of the MSG gene family is regulated by a cis-dependent mechanism that involves a unique telomeric site in the genome called the expression site. Only the MSG gene adjacent to the expression site is represented by messenger RNA. Several P. carinii MSG genes have been sequenced, showing that genes in the family can encode distinct isoforms of MSG. The vast majority of family members have not been characterized at the sequence level.

Results: The first 300 base pairs of MSG genes were analyzed. Analysis of 581 MSG sequence reads from P. carinii genomic DNA yielded 281 different sequences. However, many of the sequence reads differed from others at only one site, a degree of variation consistent with that expected to be caused by error. Accounting for error reduced the number of truly distinct sequences observed to 158, roughly twice the number expected if the gene family contains 80 members. The size of the gene family was verified by PCR. The excess of distinct sequences appeared to be due to allelic variation. Discounting alleles, 73 different MSG genes were observed. The 73 genes differed by 19% on average. Variable regions were rich in nucleotide differences that changed the encoded protein. The genes shared three regions in which at least 16 consecutive base pairs were invariant. There were numerous cases where two different genes were identical within a region that was variable among family members as a whole, suggesting recombination among family members.
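
The idea of treating reads that differ at only one site as the same underlying sequence can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not describe. It assumes aligned reads of equal length, uses single-linkage grouping with a hypothetical `max_diff` threshold of one mismatch, and runs on made-up toy reads. The helper `mean_pairwise_difference` shows one plausible way an average divergence such as the reported 19% could be computed.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def collapse_reads(reads, max_diff=1):
    """Group reads so that any two reads within max_diff mismatches share a group.

    Single-linkage grouping via union-find; max_diff=1 is an assumed threshold.
    """
    unique = sorted(set(reads))
    parent = list(range(len(unique)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(unique)), 2):
        if hamming(unique[i], unique[j]) <= max_diff:
            parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(unique):
        groups.setdefault(find(i), []).append(seq)
    return sorted(groups.values())


def mean_pairwise_difference(seqs):
    """Average fraction of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) / len(a) for a, b in pairs) / len(pairs)


# Toy reads (invented for illustration, not data from the study).
toy_reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTGTACCC", "TTGTACCG", "GGGGAAAA"]
groups = collapse_reads(toy_reads)
representatives = [g[0] for g in groups]
print(len(set(toy_reads)), "distinct reads ->", len(groups), "groups")
print("mean pairwise difference:", mean_pairwise_difference(representatives))
```

Single-linkage grouping can chain together reads that are individually close but collectively divergent, so a real analysis would likely also weigh read counts per sequence before deciding which variants reflect error.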

Conclusion: A set of sequences that represents most if not all of the members of the P. carinii MSG gene family was obtained. The protein-changing nature of the variation among these sequences suggests that the family has been shaped by selection for protein variation, which is consistent with the hypothesis that the MSG gene family functions to enhance phenotypic variation among the members of a population of P. carinii.


Figure 6: Location and types of variation exhibited by closely related sequence reads. Data derived from the top three groups of sequence reads (Table 1) are shown. An open bar represents a group of aligned sequences. The black boxes within an open bar indicate the locations where variable bases were observed among the reads in the group. The sequences observed are shown above each black box. Dots represent identity.

Mentions: Five hypervariable regions are depicted in Figure 5. All of the hypervariable regions exhibited a relatively high frequency of base substitution. Some hypervariable regions also exhibited frequent and extensive insertions and deletions (indels). To illustrate, hypervariable region 1 (HV1) began at site 28, where indels were very common. After the indel region, 15 of the next 20 nucleotide sites exhibited very frequent substitution. Table 2 shows the types and locations of the substitutions in HV1, listing the 31 different HV1 sequences that were observed in the 42 groups. Figure 6 shows that most of the nucleotide variation in groups 1, 2 and 3 occurred in HV1. In addition, in nearly half of the groups with 10 or more reads, HV1 variation was observed among the sequence reads in the group (Table 1).
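
A count such as "15 of the next 20 sites" can be reproduced in principle by scanning a window of an alignment for columns in which more than one base occurs. The sketch below is only an illustration under stated assumptions: the alignment is a toy example, positions are 0-based, and `min_minor_fraction` is a hypothetical cutoff, since the text does not say what frequency counts as "very frequent" substitution.

```python
from collections import Counter


def variable_sites(alignment, start, length, min_minor_fraction=0.0):
    """Return 0-based positions in [start, start + length) that vary.

    A site is reported when the fraction of sequences carrying a
    non-majority character exceeds min_minor_fraction (an assumed cutoff).
    Gap characters, if present, are treated like any other character.
    """
    n = len(alignment)
    sites = []
    for pos in range(start, start + length):
        counts = Counter(seq[pos] for seq in alignment)
        majority_count = counts.most_common(1)[0][1]
        if (n - majority_count) / n > min_minor_fraction:
            sites.append(pos)
    return sites


# Toy alignment (invented for illustration, not data from the study).
toy_alignment = [
    "AAACGTAC",
    "AAATGTAC",
    "AAACGAAC",
    "AAAGGTTC",
]
hits = variable_sites(toy_alignment, start=3, length=5)
frequent = variable_sites(toy_alignment, start=3, length=5, min_minor_fraction=0.3)
print(f"{len(hits)} of 5 sites vary: {hits}")
print(f"sites above the assumed cutoff: {frequent}")
```

Raising `min_minor_fraction` separates sites where substitution is common across the family from sites where a single sequence differs, which is the distinction the description of HV1 appears to rely on.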

