Independent, rapid and targeted loss of highly repetitive DNA in natural and synthetic allopolyploids of Nicotiana tabacum.

Renny-Byfield S, Kovařík A, Chester M, Nichols RA, Macas J, Novák P, Leitch AR - PLoS ONE (2012)

Bottom Line: Allopolyploidy (interspecific hybridisation and polyploidy) has played a significant role in the evolutionary history of angiosperms and can result in genomic, epigenetic and transcriptomic perturbations. We examine the immediate effects of allopolyploidy on repetitive DNA by comparing the genomes of synthetic and natural Nicotiana tabacum with diploid progenitors N. tomentosiformis (paternal progenitor) and N. sylvestris (maternal progenitor). Abundance estimates, based on sequencing depth, indicate NicCL3 is almost absent in N. sylvestris and has been dramatically reduced in copy number in the allopolyploid N. tabacum.


Affiliation: School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom.

ABSTRACT
Allopolyploidy (interspecific hybridisation and polyploidy) has played a significant role in the evolutionary history of angiosperms and can result in genomic, epigenetic and transcriptomic perturbations. We examine the immediate effects of allopolyploidy on repetitive DNA by comparing the genomes of synthetic and natural Nicotiana tabacum with those of its diploid progenitors, N. tomentosiformis (paternal progenitor) and N. sylvestris (maternal progenitor). Using next-generation sequencing, a recently developed graph-based repeat identification pipeline, Southern blot and fluorescence in situ hybridisation (FISH), we characterise two highly repetitive DNA sequences (NicCL3 and NicCL7/30). Analysis of two independent high-throughput DNA sequencing datasets indicates that NicCL3 forms 1.6-1.9% of the genome in N. tomentosiformis, where it occurs in multiple, discontinuous tandem arrays scattered over several chromosomes. Abundance estimates, based on sequencing depth, indicate that NicCL3 is almost absent in N. sylvestris and has been dramatically reduced in copy number in the allopolyploid N. tabacum. Surprisingly, elimination of NicCL3 is repeated in some synthetic lines of N. tabacum in their fourth generation. The retroelement NicCL7/30, which occurs interspersed with NicCL3, is also under-represented but to a much lesser degree, revealing targeted elimination of NicCL3. Analysis of paired-end sequencing data indicates that the tandem component of NicCL3 has been preferentially removed in natural N. tabacum, increasing the proportion of the dispersed component. This loss occurs across multiple blocks of discontinuous repeats and, based on the distribution of nucleotide similarity among NicCL3 units, was concurrent with rounds of sequence homogenisation.
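
Abundance estimates based on sequencing depth rest on the idea that, with random low-coverage sequencing, the share of reads assigned to a repeat approximates the share of the genome that the repeat occupies. The following is a minimal sketch of that arithmetic, not the authors' code. The function names, the read counts, the genome size and the monomer length are all illustrative assumptions and are not values from the study.

```python
def genome_proportion(reads_in_cluster: int, total_reads: int) -> float:
    """Estimate the fraction of the genome occupied by a repeat as the
    fraction of randomly sampled reads assigned to that repeat."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_cluster <= total_reads:
        raise ValueError("reads_in_cluster must lie between 0 and total_reads")
    return reads_in_cluster / total_reads


def copy_number(proportion: float, genome_size_bp: float, monomer_length_bp: float) -> float:
    """Convert a genome proportion into an approximate number of monomer
    copies, given a genome size and a monomer length (both in base pairs)."""
    if monomer_length_bp <= 0:
        raise ValueError("monomer_length_bp must be positive")
    return proportion * genome_size_bp / monomer_length_bp


# Hypothetical numbers, chosen only to show the calculation.
proportion = genome_proportion(reads_in_cluster=1_800, total_reads=100_000)
copies = copy_number(proportion, genome_size_bp=2_000_000_000, monomer_length_bp=2_000)
print(f"genome proportion: {proportion:.3%}")
print(f"approximate copies: {copies:,.0f}")
```

Comparing the proportion obtained for the same repeat in each species is what allows a statement such as "dramatically reduced in copy number" to be made from read counts alone.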

Figure 1. Structure and copy number of NicCL3. (a) Graphical 2D projection of a three-dimensional network in which each node represents a single 454 sequence within NicCL3. Nodes are placed according to sequence similarity: similar sequences are placed close together and more distantly related sequences further apart. Sequence similarity is indicated by edges (connecting lines). Red nodes represent sequence reads originating from N. tomentosiformis and blue nodes represent reads originating from N. tabacum. (b) A diagrammatic representation of the consensus sequence of the most abundant contig (contig 8) of CL3, here called NicCL3. The line (top) indicates the NicCL3 monomer; the greyed regions represent those regions of the contig that are repeated because it contains part of a second monomer. Copy-number estimates (estimated by 454 read-depth) for allopolyploid N. tabacum and the progenitor diploids are shown. The approximate positions of primer set 1 (black arrows) and primer set 2 (open arrows) are shown (see Experimental Procedures). Regions in NicCL3 matching the d and j-locus found flanking an endogenous pararetrovirus (NtoEPRV) described in [42] are highlighted in black. (c) Paired-end reads were used to determine the occurrence of dispersed NicCL3 sequence and/or insertion of other sequences within NicCL3. The proportion of solo HSPs (NicCL3 sequences whose paired read does not match NicCL3) is shown mapped along the monomer of NicCL3 contig 8 for N. tabacum and N. tomentosiformis. Note that there are regions along the monomer that are more likely to be associated with sequences other than NicCL3 (solo HSPs) and that the proportion of solo HSPs is considerably higher in N. tabacum.
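
The solo-HSP measure in panel (c) can be illustrated with a short sketch: for each window along the monomer, count the reads that hit NicCL3 and report the share whose paired read does not. This is an illustration under stated assumptions rather than the authors' implementation; the input layout (a position on the monomer plus a flag for the mate), the window size and the toy data are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def solo_hsp_proportion(
    hits: Iterable[Tuple[int, bool]],
    monomer_length: int,
    bin_size: int = 100,
) -> List[Optional[float]]:
    """Return, for each window along the monomer, the proportion of hits
    whose paired read does NOT match the repeat (solo HSPs).

    hits: (position, mate_matches_repeat) pairs, with 0-based positions.
    Windows that contain no hits are reported as None.
    """
    if monomer_length <= 0 or bin_size <= 0:
        raise ValueError("monomer_length and bin_size must be positive")
    n_bins = -(-monomer_length // bin_size)  # ceiling division
    totals = [0] * n_bins
    solos = [0] * n_bins
    for position, mate_matches_repeat in hits:
        if not 0 <= position < monomer_length:
            raise ValueError(f"position {position} lies outside the monomer")
        window = position // bin_size
        totals[window] += 1
        if not mate_matches_repeat:
            solos[window] += 1
    return [solos[i] / totals[i] if totals[i] else None for i in range(n_bins)]


# Toy data (hypothetical): four hits on a 300 bp monomer, 100 bp windows.
toy_hits = [(10, True), (50, False), (150, False), (160, False)]
print(solo_hsp_proportion(toy_hits, monomer_length=300, bin_size=100))
```

A higher proportion in a window means that this part of the monomer is more often found next to sequence other than NicCL3, which is the pattern the caption describes for N. tabacum.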

Mentions: A graph-based clustering approach was used to identify and reconstruct, in silico, the major repeat types present in the genomes of N. tabacum, N. sylvestris and N. tomentosiformis, as described in Renny-Byfield et al. [15]. A combined dataset of 454 sequence reads from all three species was used to generate clusters and contigs representing repetitive DNA sequences. Mutual similarities between reads can then be visualised in graph form (Fig. 1a and Fig. 2b), in which nodes correspond to sequence reads and a Fruchterman-Reingold algorithm is used to position the nodes. Reads that are most similar are placed closest together, whilst those that are less closely related are placed further apart (described in detail in Novák et al. [31]). Contigs are assembled from the reads of each cluster and are named NicCLX, where Nic designates Nicotiana and X is the number of the cluster from which they derive. Each cluster typically generates multiple contigs, each of which is assigned a number (Y), giving the format NicCLX contigY. All contigs assembled in this work are available via our websites: http://webspace.qmul.ac.uk/sbyfield/Simon_Renny-Byfield/Data.html and http://webspace.qmul.ac.uk/arleitch/Site/Home.html.
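
The idea of grouping reads through a similarity graph and then naming the resulting contigs can be sketched as follows. This is a simplified stand-in and not the published pipeline: clusters are taken here to be the connected components of the graph, which is the simplest possible partition and may differ from the method described in [31]. The read identifiers, the function names and the rule of numbering clusters from largest to smallest are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def cluster_reads(similar_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group reads into clusters, taken here as the connected components of
    a graph whose nodes are reads and whose edges join similar reads."""
    adjacency = defaultdict(set)
    for read_a, read_b in similar_pairs:
        adjacency[read_a].add(read_b)
        adjacency[read_b].add(read_a)

    seen = set()
    clusters = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every read reachable from `start`.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))

    # Assumption: clusters are ordered from largest to smallest.
    clusters.sort(key=lambda members: (-len(members), members[0]))
    return clusters


def contig_name(cluster_number: int, contig_number: int) -> str:
    """Build a name in the NicCLX contigY format described in the text."""
    return f"NicCL{cluster_number} contig{contig_number}"


# Hypothetical read identifiers and similarity hits.
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
for number, members in enumerate(cluster_reads(pairs), start=1):
    print(f"NicCL{number}: {members}")
print(contig_name(3, 8))
```

In a real analysis the edges would come from pairwise sequence comparison of the reads, and a force-directed layout such as Fruchterman-Reingold would be applied only afterwards, to draw each cluster.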

