Chromosome level assembly of the hybrid Trypanosoma cruzi genome.

Weatherly DB, Boehlke C, Tarleton RL - BMC Genomics (2009)

Bottom Line: The approach was substantiated through the use of Southern blot analysis to confirm the mapping of BAC clones using as probes the genes they are predicted to contain, and each chromosome construction was visually validated to ensure sufficient evidence was present to support the organization. Now assembled, these chromosomes bring T. cruzi to the same level of organization as its kinetoplastid relatives and have been used as the basis for the T. cruzi genome in TriTrypDB, a trypanosome database of EuPathDB. In addition, they will provide the foundation for analyses such as reverse genetics, where the location of genes and their alleles and/or paralogues is necessary and comparative genome hybridization analyses (CGH), where a chromosome-level view of the genome is ideal.


Affiliation: Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA. dbrentw@uga.edu

ABSTRACT

Background: In contrast to the essentially fully assembled genome sequences of the kinetoplastid pathogens Leishmania major and Trypanosoma brucei, the assembly of the Trypanosoma cruzi genome has been hindered by its repetitive nature and by the fact that the reference strain (CL Brener) is a hybrid of two distinct lineages. In this work, the majority of the contigs and scaffolds were assembled into pairs of homologous chromosomes based on predicted parental haplotype, inference from TriTryp synteny maps, and the use of end sequences from T. cruzi BAC libraries.

Results: Ultimately, 41 pairs of chromosomes were assembled using this approach, a number in agreement with the number of T. cruzi chromosomes predicted from pulsed-field gel analysis, with over 90% (21,133 of 23,216) of the genes annotated in the genome represented. The approach was substantiated by Southern blot analysis, which confirmed the mapping of BAC clones using as probes the genes they are predicted to contain, and each chromosome construction was visually validated to ensure that sufficient evidence was present to support the organization. While many members of large gene families are incorporated into the chromosome assemblies, the majority of genes excluded from the chromosomes belong to gene families, as these genes are frequently impossible to position accurately.

Conclusion: Now assembled, these chromosomes bring T. cruzi to the same level of organization as its kinetoplastid relatives and have been used as the basis for the T. cruzi genome in TriTrypDB, a trypanosome database of EuPathDB. In addition, they will provide the foundation for analyses such as reverse genetics, where the location of genes and their alleles and/or paralogues is necessary, and comparative genome hybridization (CGH) analyses, where a chromosome-level view of the genome is ideal.


Figure 4: Differentiation of alleles from paralogues. a) The solid boxes highlight assembly issues with the current genome. The solid gray boxes (locus "A") show a helicase gene where 1 allele is fully sequenced on the P chromosome and the other is in 2 pieces on S. The solid brown boxes (locus "B") show a hypothetical protein with no fully sequenced copy: the 2 genes on the P chromosome and the 2 genes on the S chromosome all reside at contig boundaries and should be merged. Finally, the solid blue boxes (locus "C") show a case of 2 copies (at least partially sequenced) on the P chromosome and 1 presumed fully sequenced copy on S of the same hypothetical protein. In all cases, the dotted-line boxes indicate the predicted correct coding sequences. Note that the circled genes show other cases of gene truncations at contig boundaries. b) Alignment of 3 annotated genes in the A locus (a helicase gene). The A1 allele is full length, while A2 and A3 are small pieces that lie at the ends of contigs and are not fully sequenced. Note that A2 has near-perfect identity with the N-terminus of A1, while A3 has perfect identity with the C-terminus, suggesting that these are pieces of the same gene.

Figure 4 illustrates how the assembled T. cruzi chromosomes can help clarify gene assembly issues. In the region of TcChr39 shown (Figure 4a), the genes flanking those labeled A through C show strong synteny between the assembled P and S chromosomes. However, genes A and C each have one copy on one of the homologous chromosomes but two annotated genes on the corresponding chromosome, while B has 2 differently sized copies on each chromosome. In all cases, it appears that the "2 copies" are actually truncated pieces of the full-length gene, as shown by sequence alignment for gene A, which encodes a helicase protein (Figure 4b). These truncations all occur at contig boundaries, and indeed there are at least 6 additional sets of truncations apparent in this one section of chromosome. Careful comparisons of the assembled chromosomes should allow many of these events to be corrected.
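The reasoning behind Figure 4b (one fragment matching the N-terminus of the full-length allele and the other matching its C-terminus) can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes a simple ungapped comparison in place of a real sequence alignment, and the sequences, the function names and the 0.9 identity threshold are all hypothetical, chosen only for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def classify_fragment(full: str, frag: str, min_identity: float = 0.9) -> str:
    """Label a fragment as matching the N-terminus, the C-terminus, or neither.

    Ungapped comparison against the start and the end of `full`
    (an assumption; a real analysis would use a proper alignment).
    """
    if not frag or len(frag) > len(full):
        return "none"
    if identity(full[:len(frag)], frag) >= min_identity:
        return "N-terminal"
    if identity(full[-len(frag):], frag) >= min_identity:
        return "C-terminal"
    return "none"


def looks_like_split_gene(full: str, frag_a: str, frag_b: str,
                          min_identity: float = 0.9) -> bool:
    """True when one fragment matches each end of the full-length copy."""
    labels = {
        classify_fragment(full, frag_a, min_identity),
        classify_fragment(full, frag_b, min_identity),
    }
    return labels == {"N-terminal", "C-terminal"}


# Hypothetical toy protein sequences standing in for A1, A2 and A3.
a1 = "MSTKLVDEAHRGQPLNWYFICKTESVRA"   # full-length allele
a2 = a1[:10] + "K" + a1[11:12]        # N-terminal piece with one mismatch
a3 = a1[-10:]                         # C-terminal piece, exact match

print(classify_fragment(a1, a2))
print(classify_fragment(a1, a3))
print(looks_like_split_gene(a1, a2, a3))
```

A pair of annotated "genes" that passes this kind of check and sits at contig boundaries is a candidate for merging into a single coding sequence, which is the correction the text describes; a pair that fails it is more likely to represent genuine paralogues.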

