Characterization of repetitive DNA landscape in wheat homeologous group 4 chromosomes.

Garbus I, Romero JR, Valarik M, Vanžurová H, Karafiátová M, Cáccamo M, Doležel J, Tranquilli G, Helguera M, Echenique V - BMC Genomics (2015)

Bottom Line: SSR frequency was one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. The physical distribution of the newly identified LTR retrotransposon families in the wheat genome was analyzed by fluorescence in situ hybridization (FISH), and one of them, the Carmen retrotransposon, was found to be specific to the centromeric regions of all wheat chromosomes. This work is the first in-depth report of wheat repetitive sequences analyzed at the chromosome arm level and gives the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.


Affiliation: CERZOS (CCT - CONICET Bahía Blanca) and Universidad Nacional del Sur, Bahía Blanca, Argentina. igarbus@criba.edu.ar.

ABSTRACT

Background: The number and complexity of repetitive elements vary between species, and repeats are generally most abundant in species with larger genomes. Combining the flow-sorted chromosome arm approach to genome analysis with second-generation DNA sequencing technologies provides a unique opportunity to study the repetitive portion of each chromosome and to compare chromosomes with one another. Additionally, different sequencing approaches may give different depths of insight into repeatome content and structure. In this work we analyze and characterize the repetitive sequences of Triticum aestivum cv. Chinese Spring homeologous group 4 chromosome arms, obtained through Roche 454 and Illumina sequencing technologies, hereinafter marked by the suffixes 454 and I, respectively. Repetitive sequences were identified with the RepeatMasker software using the interspersed repeat database mips-REdat_v9.0p. The input sequences consisted of our 4DS454 and 4DL454 scaffolds and the 4ASI, 4ALI, 4BSI, 4BLI, 4DSI and 4DLI contigs, downloaded from the International Wheat Genome Sequencing Consortium (IWGSC).

Results: Repetitive sequence content varied from 55% to 63% for all chromosome arm assemblies except 4DLI, in which the repeat content was 38%. Transposable elements, small RNA, satellites, simple repeats and low complexity sequences were analyzed. SSR frequency was one per 24 to 27 kb for all chromosome assemblies except 4DLI, where it was three times higher. Dinucleotides and trinucleotides were the most abundant SSR repeat units. (GA)n/(TC)n was the most abundant SSR except in 4DLI, where the most frequently identified SSR was (CCG/CGG)n. Retrotransposons followed by DNA transposons were the most highly represented sequence repeats, mainly composed of the Gypsy and CACTA/En-Spm superfamilies, respectively. This whole-chromosome sequence analysis allowed identification of three new LTR retrotransposon families belonging to the Copia superfamily, one belonging to the Gypsy superfamily and two TRIM retrotransposon families. Their physical distribution in the wheat genome was analyzed by fluorescence in situ hybridization (FISH), and one of them, the Carmen retrotransposon, was found to be specific to the centromeric regions of all wheat chromosomes.

Conclusion: This work is the first in-depth report of wheat repetitive sequences analyzed at the chromosome arm level and gives the first insight into the repeatome of T. aestivum chromosomes of homeologous group 4.
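
The SSR frequencies quoted in the Results (one SSR per 24 to 27 kb) are a count of microsatellites divided into the assembly length. The following is a minimal Python sketch of that arithmetic, not the authors' method: the abstract states that repeats were identified with RepeatMasker, whereas this sketch uses a plain regular expression for perfect di- and trinucleotide repeats. The function names `find_ssrs` and `kb_per_ssr` and the minimum copy numbers (6 for dinucleotides, 5 for trinucleotides) are assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for perfect di- and trinucleotide SSRs.

    min_repeats maps unit length to the minimum number of tandem copies.
    The defaults are hypothetical thresholds, not values from the paper.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One copy of the unit followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAAAA
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return hits


def kb_per_ssr(n_ssrs, assembly_length_bp):
    """Average spacing between SSRs, in kb per SSR."""
    return assembly_length_bp / 1000 / n_ssrs


if __name__ == "__main__":
    toy = "TTT" + "GA" * 8 + "CCTA" + "CCG" * 5 + "AT"
    print(find_ssrs(toy))
    print(kb_per_ssr(2, 50_000))
```

On this scale, a frequency three times higher than one per 24 to 27 kb, as reported for 4DLI, corresponds to roughly one SSR per 8 to 9 kb.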




Figure 3: Annotation of novel LTR retrotransposons. a) The scheme depicts the steps followed for the identification of novel TEs, starting from the assembled 4DS and 4DL scaffolds, according to the criteria proposed by Wicker et al. [6]. b) Graphic representation of the structural features identified in each LTR retrotransposon, drawn to scale. GAG: capsid proteins; AP: aspartic proteinase; INT: integrase; RT: reverse transcriptase; RH: RNase H; ChrD: chromodomain.

Mentions: The whole 4DS and 4DL scaffolds were further scanned for LTR retrotransposons using the bioinformatics tools LTR_FINDER [46] and LTR_STRUC [47]. These scaffolds were chosen for novel LTR identification because they are larger than the Illumina contigs, as revealed by size frequency histograms (Additional file 4: Figure S1). The LTR_FINDER and LTR_STRUC outputs led to 234 candidate sequences (Figure 3a), which were clustered using the CD-HIT interface [48], resulting in 214 unique LTR retrotransposon candidates. After a manual search for previously defined elements against the MIPS database following the criteria of [6], 171 putative retrotransposons were excluded (Figure 3a). The remaining 43 candidate elements were analyzed for the presence of LTR retrotransposon features using BLASTX searches at NCBI and GyDB [49], reducing the number of candidates for newly identified retrotransposons to six (Table 4). The BLASTX analysis also revealed that likely complete transposon-related proteins were present in four of the six candidates (JROL01007197, JROL01007734, JROL01000922 and JROL01008273), as judged by the coverage of the alignments with reported proteins, whereas the other two, JROL01006440 and JROL01007833, showed only small protein fragments and thus no coding capacity. Two of the four retrotransposon protein-coding regions lack stop codons and the other two contain only one, which indicates that these candidates could encode functional proteins. Note that the presence of a few stop codons in a given copy does not necessarily imply that a TE family is non-functional, since only recently inserted elements have escaped mutation and can be taken as functional. The identity and coverage of the alignments demonstrate that the novel LTR retrotransposons are members of known superfamilies but constitute novel LTR retrotransposon families (Table 5).
Their classification was carried out following the currently proposed system [6], revealing that three of the newly identified LTR retrotransposons belonged to the Copia superfamily, one was Gypsy and the other two were non-autonomous terminal repeat retrotransposons in miniature (TRIMs); designations were then assigned to the six new families (Tables 5 and 6, Figure 3b). The insertion time of the six newly identified LTR retrotransposons was estimated on the assumption that the two LTRs were identical at the time of integration and have accumulated point mutations independently since then. The nucleotide divergence between the two LTRs therefore reflects the time elapsed since the insertion event, and the insertion times were estimated to range from 0.27 × 10^6 to 6.11 × 10^6 years (Table 4).
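
The insertion-time reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the excerpt does not say which distance correction or substitution rate was used. The sketch assumes the two LTRs are already aligned to equal length, applies a Jukes-Cantor correction to the observed mismatch fraction, and uses a rate of 1.3e-8 substitutions per site per year, a value often used for grass LTR retrotransposons. The function names `ltr_divergence` and `insertion_time_years` are hypothetical.

```python
import math


def ltr_divergence(ltr5, ltr3):
    """Jukes-Cantor corrected divergence K between two pre-aligned LTRs.

    Columns containing a gap ('-') or an ambiguous base ('N') are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_years(k, rate=1.3e-8):
    """T = K / (2r): both LTRs accumulate substitutions independently.

    rate is an assumed substitution rate per site per year, not a value
    taken from the excerpt.
    """
    return k / (2 * rate)


if __name__ == "__main__":
    left = "ACGT" * 25                 # toy 100 bp LTR
    right = "TCGT" + "ACGT" * 24       # same LTR with one substitution
    k = ltr_divergence(left, right)
    print(k, insertion_time_years(k))
```

The factor of 2 in the denominator reflects the assumption stated in the text: the two LTRs were identical at integration and have each diverged independently since then, so the distance between them accumulates at twice the per-lineage rate.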

