Long-Read Single Molecule Sequencing to Resolve Tandem Gene Copies: The Mst77Y Region on the Drosophila melanogaster Y Chromosome.

Krsticevic FJ, Schrago CG, Carvalho AB - G3 (Bethesda) (2015)

Bottom Line: We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes.


Affiliation: Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas, CONICET, Ocampo y Esmeralda, S2000EZP Rosario, Argentina.


General view of the Mst77Y region (MHAP assembly). All 18 Mst77Y genes are located in a single contig (JSAE01000257). Gene names were abridged (Mst77Y-1 as “Y1,” Mst77Y-17ψ as “Y17,” and so forth). All genes have the same orientation (not visible at this scale). The red tick near 110 kb marks the unmatched k-mer found in this region (caused by a C/T substitution in an intergenic region). The pseudogenes of Pka-R1 and CG3618, which flank each Mst77Y gene, were omitted for the sake of clarity. Repeats (mostly retrotransposons) occupy 48% of the sequence.

Mentions: As summarized in Figure 1, the Mst77Y genes are located in tandem over 96 kb, with the same orientation. Some genes are present in multiple identical copies: Mst77Y-4 and Mst77Y-12 have three copies each, whereas Mst77Y-6ψ and Mst77Y-7 have two copies each. As Krsticevic et al. (2010) noted, the “gene sequence variant counting” method they used could not detect identical copies, so the discovery of these copies is somewhat expected. On the other hand, we could not find six genes described in Krsticevic et al. (2010): Mst77Y-2, Mst77Y-5ψ, Mst77Y-8, Mst77Y-9, Mst77Y-11ψ, and Mst77Y-14ψ. The absence of these genes may be due to a misassembly in MHAP or to an experimental artifact in Krsticevic et al. (2010). Two lines of evidence strongly suggest that the second hypothesis is true. First, these six genes are also missing from the other assemblies listed in Table 1. Second, even if they had been misassembled in MHAP, they would still have to be present in the PacBio reads, given the reads' high coverage of the genome (∼90× for the autosomes, 45× for the sex chromosomes). Therefore, we aligned these raw reads with bwa to the 18 Mst77Y genes described by Krsticevic et al. (2010), plus the autosomal Mst77F, and measured the coverage of each gene. The result (Figure 2) is a stunning confirmation of the findings reported above: the six missing genes are absent from the reads (their coverage is essentially zero). Furthermore, the multiple-copy Mst77Y genes have a much greater coverage, similar to that of the autosomal (hence, diploid) Mst77F, whereas the remaining Mst77Y genes have the lower coverage expected for single-copy Y-linked (hence, haploid) genes. We have not carried out an analogous test using Illumina reads because they are too short to be unambiguously mapped to each Mst77Y gene.
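The coverage argument above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy gene names, and the input layout (tab-separated name, position, depth records, similar to what per-base depth tools report) are assumptions made for illustration. Only the 45× haploid coverage figure comes from the text. The idea is that a gene's mean read depth, divided by the haploid coverage, gives a rough copy-number estimate, and a gene absent from the reads comes out at zero.

```python
from collections import defaultdict

# Approximate sex-chromosome (haploid) coverage stated in the text.
HAPLOID_COVERAGE = 45.0


def mean_depth_per_gene(depth_lines, gene_lengths):
    """Average depth per gene from 'name<TAB>position<TAB>depth' records.

    Positions missing from the input count as zero depth, so each sum is
    divided by the full gene length rather than by the number of records.
    The record layout is an assumption, not taken from the paper.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        name, _position, depth = line.split("\t")
        totals[name] += int(depth)
    return {gene: totals[gene] / length for gene, length in gene_lengths.items()}


def estimate_copies(mean_depth, haploid=HAPLOID_COVERAGE):
    """Round mean depth to the nearest whole multiple of haploid coverage."""
    return round(mean_depth / haploid)


# Toy input: hypothetical 4-bp "genes", chosen only to keep the example short.
toy_lengths = {"single_copy": 4, "two_copies": 4, "absent": 4}
toy_depth = [
    "single_copy\t1\t45", "single_copy\t2\t44",
    "single_copy\t3\t46", "single_copy\t4\t45",
    "two_copies\t1\t90", "two_copies\t2\t91",
    "two_copies\t3\t89", "two_copies\t4\t90",
    # "absent" has no records at all, so its depth is zero everywhere.
]

toy_means = mean_depth_per_gene(toy_depth, toy_lengths)
toy_copies = {gene: estimate_copies(depth) for gene, depth in toy_means.items()}
```

Under these assumptions, a two-copy Y-linked gene and a diploid autosomal gene both land near 90×, which is why the multiple-copy Mst77Y genes resemble Mst77F in coverage. Real long-read depth is much noisier than this toy input, so rounding to an integer should be read as a rough guide rather than a copy-number call.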

