Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing.

Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Buell CR - PLoS ONE (2012)

Bottom Line: However, the transcribed gene set among the 21 lines varied, with 48.7% expressed in all of the lines, 27.9% expressed in one to 20 lines, and 23.4% expressed in none of the lines. De novo assembly of RNA-seq reads that did not map to the reference B73 genome sequence revealed 1,321 high confidence novel transcripts, of which, 564 loci were present in all 21 lines, including B73, and 757 loci were restricted to a subset of the lines. RT-PCR validation demonstrated 87.5% concordance with the computational prediction of these expressed novel transcripts.


Affiliation: Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America.

ABSTRACT
Maize is rich in genetic and phenotypic diversity. Understanding the sequence, structural, and expression variation that contributes to phenotypic diversity would facilitate more efficient varietal improvement. RNA-based sequencing (RNA-seq) is a powerful approach for transcriptional analysis, assessing sequence variation, and identifying novel transcript sequences, particularly in large, complex, repetitive genomes such as maize. In this study, we sequenced RNA from whole seedlings of 21 maize inbred lines representing diverse North American and exotic germplasm. Single nucleotide polymorphism (SNP) detection identified 351,710 polymorphic loci distributed throughout the genome covering 22,830 annotated genes. Tight clustering of two distinct heterotic groups and exotic lines was evident using these SNPs as genetic markers. Transcript abundance analysis revealed minimal variation in the total number of genes expressed across these 21 lines (57.1% to 66.0%). However, the transcribed gene set among the 21 lines varied, with 48.7% expressed in all of the lines, 27.9% expressed in one to 20 lines, and 23.4% expressed in none of the lines. De novo assembly of RNA-seq reads that did not map to the reference B73 genome sequence revealed 1,321 high confidence novel transcripts, of which, 564 loci were present in all 21 lines, including B73, and 757 loci were restricted to a subset of the lines. RT-PCR validation demonstrated 87.5% concordance with the computational prediction of these expressed novel transcripts. Intriguingly, 145 of the novel de novo assembled loci were present in lines from only one of the two heterotic groups, consistent with the hypothesis that, in addition to sequence polymorphisms and transcript abundance, transcript presence/absence variation is present and, thereby, may be a mechanism contributing to the genetic basis of heterosis.
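The three-way split reported in the abstract (expressed in all lines, in one to 20 lines, or in none) can be illustrated with a short Python sketch. This is not the authors' code: the function name `partition_genes`, the input layout (a dict of per-line booleans), and the toy gene IDs are all assumptions made for illustration.

```python
def partition_genes(expressed):
    """Split genes by how many inbred lines express them.

    expressed -- dict mapping a gene ID to a list of booleans,
                 one per inbred line (hypothetical input layout).
    """
    groups = {"all_lines": [], "some_lines": [], "no_lines": []}
    for gene, flags in expressed.items():
        n_expressed = sum(flags)  # True counts as 1
        if n_expressed == 0:
            groups["no_lines"].append(gene)
        elif n_expressed == len(flags):
            groups["all_lines"].append(gene)
        else:
            groups["some_lines"].append(gene)
    return groups


# Toy input with 21 lines per gene; the gene IDs are made up.
toy = {
    "geneA": [True] * 21,
    "geneB": [True] * 4 + [False] * 17,
    "geneC": [False] * 21,
}
groups = partition_genes(toy)
```

Dividing the size of each group by the total number of annotated genes would give percentages of the kind quoted above.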


Figure 3 (pone-0033071-g003): Distribution of genes in the maize seedling core and dispensable transcriptomes determined using a semi-qualitative approach. Reads were mapped to the 5b pseudomolecules (http://ftp.maizesequence.org/) using Bowtie version 0.12.7 [50] and TopHat version 1.2.0 [51], and fragments per kilobase of exon model per million fragments mapped (FPKM) were determined with Cufflinks version 0.9.3 [56] and the 5b annotation (http://ftp.maizesequence.org/). For each gene, a line was considered not expressed if the low confidence FPKM value was equal to zero, low expressed if the low confidence interval was greater than zero and the FPKM value was less than 5, medium expressed if the low confidence interval was greater than zero and the FPKM value was greater than or equal to 5 and less than or equal to 200, and high expressed if the low confidence interval was greater than zero and the FPKM value was greater than 200.
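The thresholds in this caption can be written as a small function. The Python sketch below is illustrative only and is not taken from the authors' pipeline; the name `classify_expression` and the argument names `fpkm` and `conf_lo` are assumptions. It reads the "not expressed" criterion as the lower bound of the FPKM confidence interval being zero.

```python
def classify_expression(fpkm, conf_lo):
    """Assign one line's expression of one gene to a category.

    fpkm    -- FPKM estimate for the gene in this line
    conf_lo -- lower bound of the FPKM confidence interval
               (assumed interpretation of "low confidence" in the caption)
    """
    # Not expressed: lower confidence bound is zero.
    # "<=" is defensive; the bound should never be negative.
    if conf_lo <= 0:
        return "none"
    if fpkm < 5:
        return "low"
    if fpkm <= 200:  # 5 <= FPKM <= 200
        return "medium"
    return "high"    # FPKM > 200
```

Note that both boundaries of the medium category are inclusive, so an FPKM of exactly 5 or exactly 200 is classified as medium.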

Mentions: Transcriptome profile variation can extend beyond PAV for each transcript. Using a semi-quantitative approach where inbred lines were categorized as having no, low, medium, or high expression for each gene, we also observed variation in transcript abundance between the lines (Figure 3). In this classification approach, a gene could have constitutive expression across all 21 lines within any one of the four categories. Alternatively, a gene could have variable expression, with inbred lines categorized into multiple expression level categories. Using this method, the no, medium, and high expression categories had a similar distribution to that observed in the expressed/not-expressed analysis described above (Figure 3; Figure S3), where a large number of genes (24,378) had constitutive expression across all 21 inbred lines and the remainder of the genes were variable in their expression across the lines. For the low expression category, the majority of the genes had only 5 or fewer lines with low expression, and the other lines predominantly had no or medium expression. A mere 12 genes had low expression in all 21 lines. It is possible that lowly expressed genes are less frequently expressed across all 21 genotypes, or that there is erroneous transcription in a small number of lines. However, this altered distribution is most likely a technical limitation attributable to sampling. While there are a large number of genes with constitutive expression, there are also many genes with variable transcript abundance, both quantitatively and qualitatively, that may contribute to observed phenotypic diversity.
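The distinction between constitutive and variable expression described here can be sketched in Python. This is a hedged illustration rather than the method used in the paper: the function name `summarize_gene` and the category labels ("none", "low", "medium", "high") are assumptions, and the input is taken to be one category label per inbred line for a single gene.

```python
from collections import Counter


def summarize_gene(categories):
    """Label a gene as constitutive or variable across inbred lines.

    categories -- list with one expression category per line,
                  e.g. "none", "low", "medium", or "high" (assumed labels).
    """
    if not categories:
        raise ValueError("at least one line is required")
    counts = Counter(categories)
    if len(counts) == 1:
        # Every line falls in the same category.
        (label,) = counts
        return ("constitutive", label)
    # Lines fall in two or more categories; report lines per category.
    return ("variable", dict(counts))


# Toy examples with 21 lines each.
constitutive = summarize_gene(["medium"] * 21)
variable = summarize_gene(["low"] * 3 + ["none"] * 18)
```

Under this sketch, a gene with low expression in 5 or fewer lines and no expression in the rest would be reported as variable, matching the pattern the paragraph describes for most genes in the low category.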

