Efficient targeted transcript discovery via array-based normalization of RACE libraries.

Djebali S, Kapranov P, Foissac S, Lagarde J, Reymond A, Ucla C, Wyss C, Drenkow J, Dumais E, Murray RR, Lin C, Szeto D, Denoeud F, Calvo M, Frankish A, Harrow J, Makrythanasis P, Vidal M, Salehi-Ashtiani K, Antonarakis SE, Gingeras TR, Guigó R - Nat. Methods (2008)

Bottom Line: Random clone selection from the RACE mixture is an ineffective sampling strategy if the dynamic range of transcript abundances is large. RACEarray is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy does indeed lead to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.


Affiliation: Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain.

ABSTRACT
Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve the sampling efficiency for human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy does indeed lead to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.
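The dynamic-range argument can be made concrete with a standard occupancy calculation. This is a minimal sketch, not the authors' theoretical analysis: it assumes each randomly picked clone is an independent draw with probability proportional to transcript abundance, so a transcript with relative frequency p_i is missed by all n clones with probability (1 - p_i)^n. The abundance values and the clone budget below are hypothetical.

```python
def expected_distinct(abundances, n_clones):
    """Expected number of distinct transcripts seen among n_clones random clones.

    Assumes independent draws (sampling with replacement) with probability
    proportional to abundance: E[distinct] = sum_i [1 - (1 - p_i) ** n].
    """
    total = sum(abundances)
    return sum(1.0 - (1.0 - a / total) ** n_clones for a in abundances)


# Hypothetical pool: one dominant transcript plus nine rare ones,
# about three orders of magnitude less abundant.
skewed = [1000] + [1] * 9
# Idealized normalized pool: the same ten transcripts at equal abundance.
normalized = [1] * 10

n = 16  # illustrative clone budget
print(f"skewed pool:     {expected_distinct(skewed, n):.2f} of 10 transcripts")
print(f"normalized pool: {expected_distinct(normalized, n):.2f} of 10 transcripts")
```

With these made-up numbers the skewed pool yields only about one distinct transcript (essentially the dominant one), whereas the equal-abundance pool yields about eight, which is the intuition behind normalizing before random clone selection.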


Figure 2: Examples of novel RACEfrags verified by RT-PCR, cloning and sequencing. (a) RACEarray interrogation of the MECP2 locus. Probe intensity values for RACE products originating from annotated MECP2 exons and hybridized onto the ENCODE tiling array that covers the region containing the MECP2 locus. Two isoforms are known for this gene. Fifteen new transcript sequences were discovered through RACEarray normalization (cloned RT-PCR products). (b) RACEarray interrogation of the CHAF1B locus (UCSC Genome Browser screenshot). RACEfrags are depicted in orange, RT-PCR primer pairs in cyan, sequenced RT-PCR products in purple, and index genes (“Q3 target genes” track) in blue. A 5’ RACE reaction originating from an exon of gene CHAF1B (a.k.a. NM_005441) produced the RACEfrags shown in orange. The most distal RACEfrag, which overlaps an exon of the upstream gene ZCWCC3 (a.k.a. NM_015358, on the same strand as CHAF1B), was chosen for RT-PCR verification. RT-PCR products were cloned, and sixteen clones were selected at random and sequenced. Eight different sequences were obtained (under “RT-PCR products”; the two end sequences from one clone could not be assembled and are not shown here). All the novel sequences connect the two loci through a variety of novel exon combinations; no previous evidence supported these transcripts. For reference, various UCSC annotation tracks are also shown at the bottom of each screenshot. Some tracks (“Q3 RxFRAGS”, “Human mRNAs + ESTs”) were collapsed (“Dense” mode) for clarity (a more detailed figure and other examples are available at http://genome.imim.es/datasets/racearrays2007/).

Mentions: As a proof of concept, we used RACEarray normalization to interrogate a single gene, MECP2. Mutations in this gene cause Rett syndrome. MECP2 has two known transcript variants: the longer form has four exons, and the shorter form skips exon 2 (Figure 2a). We performed 3’ and 5’ RACE from exon 3 in 16 different tissues (see Methods), and additionally performed 5’ RACE from exons 2, 3 and 4 in fetal brain. RACE reactions were hybridized separately onto the ENCODE arrays containing region ENm006, in which the MECP2 gene resides. The raw hybridization data are shown in Figure 2a. Seventy-one RACEfrags were detected, and eight 5’ RACEfrags were selected for RT-PCR verification. All eight gave at least one RT-PCR product, which was either cloned or sequenced directly. In total, 15 novel isoforms including 14 novel exons were discovered in this way. Most of these isoforms are partial, since many of the novel RACEfrags interrogated are likely to correspond to internal exons. The majority use canonical splice sites, and a few could encode proteins. Thus, through a limited exploration using the RACEarray normalization strategy, we discovered many novel isoforms of an important disease gene.
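The segregation step can be sketched on a toy locus loosely modeled on the MECP2 design: every RACE product contains the index exon (here E3), the two known isoforms are E1-E2-E3-E4 and E1-E3-E4, and three novel isoforms carry novel exons N1 to N3. The novel isoforms, all abundances, exon labels and clone numbers are hypothetical. The sketch further assumes that an RT-PCR between the index exon and a RACEfrag amplifies only transcripts containing that RACEfrag, without changing their relative abundances, and that sub-pools are sampled independently.

```python
def detection_random(transcripts, n_clones):
    """P(each transcript is seen at least once) when n_clones are picked
    at random from the unsegregated RACE pool."""
    total = sum(a for a, _ in transcripts.values())
    return {name: 1.0 - (1.0 - a / total) ** n_clones
            for name, (a, _) in transcripts.items()}


def detection_segregated(transcripts, fragments, clones_per_pool):
    """Same probability when the pool is split into one RT-PCR sub-pool per
    RACEfrag and clones_per_pool clones are picked from each sub-pool."""
    miss = {name: 1.0 for name in transcripts}
    for frag in fragments:
        # Sub-pool: only transcripts that contain this RACEfrag (assumption).
        members = {name: a for name, (a, exons) in transcripts.items()
                   if frag in exons}
        total = sum(members.values())
        for name, a in members.items():
            # Probability that every clone from this sub-pool misses it.
            miss[name] *= (1.0 - a / total) ** clones_per_pool
    return {name: 1.0 - m for name, m in miss.items()}


# Hypothetical locus: name -> (relative abundance, exons).
locus = {
    "long":  (1000, {"E1", "E2", "E3", "E4"}),
    "short": (500,  {"E1", "E3", "E4"}),
    "nov1":  (2,    {"N1", "E3"}),
    "nov2":  (1,    {"N2", "E3"}),
    "nov3":  (1,    {"N1", "N3", "E3"}),
}
novel = ["nov1", "nov2", "nov3"]
racefrags = ["N1", "N2", "N3"]  # hypothetical novel RACEfrags chosen for RT-PCR

rnd = detection_random(locus, n_clones=12)
seg = detection_segregated(locus, racefrags, clones_per_pool=4)
print(f"random, 12 clones:        {sum(rnd[n] for n in novel):.2f} novel isoforms expected")
print(f"segregated, 3 x 4 clones: {sum(seg[n] for n in novel):.2f} novel isoforms expected")
```

Under these assumptions, the same budget of 12 clones recovers almost all three novel isoforms after segregation but only a small fraction of one isoform when the unsegregated pool is sampled, because the known isoforms dominate it. The segregated design also never samples the known isoforms, which mirrors the claim that RACEarray specifically targets new transcripts.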

