Integrating heterogeneous sequence information for transcriptome-wide microarray design; a Zebrafish example.

Rauwerda H, de Jong M, de Leeuw WC, Spaink HP, Breit TM - BMC Res Notes (2010)

Bottom Line: If a transcript is much smaller than a TC to which it is highly similar, it will be annotated as a subsequence of that TC and is used for probe design only if the probe designed for the TC does not query the subsequence. With our strategy and the software developed, it is possible to use a set of heterogeneous transcript resources for microarray design, reduce the number of candidate target sequences on which the design is based and reduce redundancy. The annotation of the microarray is carried out simultaneously with the design.


Affiliation: Microarray Department & Integrative Bioinformatics Unit, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands. t.m.breit@uva.nl.

ABSTRACT

Background: A complete gene-expression microarray should preferably detect all genomic sequences that can be expressed as RNA in an organism, i.e. the transcriptome. However, our knowledge of the transcriptome of any organism is still incomplete, and transcriptome information is continuously being updated. Here, we present a strategy to integrate heterogeneous sequence information that can be used as input for an up-to-date microarray design.

Findings: Our algorithm consists of four steps. In the first step, transcripts from different resources are grouped into Transcription Clusters (TCs) on the basis of the similarity of all transcripts. TCs are groups of transcripts with a similar length. If a transcript is much smaller than a TC to which it is highly similar, it is annotated as a subsequence of that TC and is used for probe design only if the probe designed for the TC does not query the subsequence. Secondly, all TCs are mapped to a genome assembly and gene information is added to the design. Thirdly, TC members are ranked according to their trustworthiness and the most reliable sequence is used for the probe design. The last step is the actual array design. We have used this strategy to build an up-to-date zebrafish microarray.
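The first and third steps (grouping transcripts into TCs, flagging much shorter transcripts as subsequences, and picking the most trustworthy TC member) can be illustrated with a minimal Python sketch. This is not the authors' software: the `similarity` function is a toy stand-in for a real sequence-alignment score, and the names `Transcript`, `Cluster`, `build_clusters`, `best_member`, `SOURCE_RANK`, as well as the thresholds `min_similarity` and `min_length_ratio` and the resource labels, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical trust ranking of transcript resources (lower = more reliable).
SOURCE_RANK = {"curated": 0, "predicted": 1, "est": 2}

@dataclass
class Transcript:
    name: str
    source: str
    seq: str

@dataclass
class Cluster:
    members: list = field(default_factory=list)       # transcripts of similar length
    subsequences: list = field(default_factory=list)  # much shorter, highly similar

def similarity(a: str, b: str) -> float:
    """Toy stand-in for an alignment score (assumption): 1.0 if the
    shorter sequence occurs verbatim in the longer one, else 0.0."""
    short, long_ = sorted((a, b), key=len)
    return 1.0 if short in long_ else 0.0

def build_clusters(transcripts, min_similarity=0.9, min_length_ratio=0.8):
    """Greedy grouping; both thresholds are illustrative parameters."""
    clusters = []
    # Longest first, so the first member of each cluster is its longest sequence.
    for t in sorted(transcripts, key=lambda t: len(t.seq), reverse=True):
        for c in clusters:
            rep = c.members[0]
            if similarity(t.seq, rep.seq) >= min_similarity:
                ratio = len(t.seq) / len(rep.seq)
                target = c.members if ratio >= min_length_ratio else c.subsequences
                target.append(t)
                break
        else:
            # No sufficiently similar cluster found: start a new one.
            clusters.append(Cluster(members=[t]))
    return clusters

def best_member(cluster):
    """Pick the member from the most trusted resource for probe design."""
    return min(cluster.members, key=lambda t: SOURCE_RANK[t.source])

transcripts = [
    Transcript("t1", "est", "ACGTACGTAC"),
    Transcript("t2", "curated", "ACGTACGTA"),
    Transcript("t3", "est", "CGTA"),
    Transcript("t4", "predicted", "TTTTGGGG"),
]
clusters = build_clusters(transcripts)
```

In this toy input, `t2` joins the cluster of `t1` because it is highly similar and of comparable length, `t3` is recorded as a subsequence of that cluster, and `t4` starts its own cluster. Raising or lowering the two thresholds changes how many clusters, and therefore how many candidate sequences, result.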

Conclusions: With our strategy and the software developed, it is possible to use a set of heterogeneous transcript resources for microarray design, reduce the number of candidate target sequences on which the design is based and reduce redundancy. By changing the parameters in the procedure, it is possible to control the similarity within the TCs and thus the number of candidate sequences for the design. The annotation of the microarray is carried out simultaneously with the design.




Figure 4: Barplot of cross-hybridizing probes. Barplot of TC-based uni-directional cross-hybridizing probes. Shown are the probes that cross-hybridize to 30 sequences or fewer; the number of probes is displayed in or on top of each bar.

Mentions: We have organized the design information in several additional files. Additional file 2 tabulates all non-cross-hybridizing probes together with their sequences, the characteristics of the transcript each probe is designed on, the other sequences in the TC, and the Ensembl genes and Ensembl transcripts mapped onto that TC. Additional file 3 lists all TCs that are queried by non-cross-hybridizing probes, along with the identifiers of the transcripts and the probe(s). Additional file 4 tabulates all cross-hybridizing probes together with their sequences, the characteristics of the transcript each probe is designed on, the other sequences in the TC, the Ensembl genes and Ensembl transcripts mapped onto that TC, and the TCs to which the probes cross-hybridize. For a number of TC pairs, no probe could be designed that distinguishes between the members of the pair; Additional file 5 tabulates 10,757 probes that query two or more of such TCs or subsequences. To indicate the extent of cross-hybridization, we summarized the number of sequences to which probes cross-hybridize in Figure 4. Of the uni-directional cross-hybridizing TC-based probes, 38% (15,296) cross-hybridize to just one sequence. Of all cross-hybridizing probes, 55% (29,797) have only perfect hits to the sequences they cross-hybridize with (Additional file 6). In total, we have designed 126,632 probes in this whole-transcriptome Zebrafish Microarray Design.
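A tally of the kind summarized in Figure 4 can be sketched as follows. This is an illustrative sketch only: the function name `cross_hyb_histogram`, the input layout (a mapping from probe identifier to the sequences it cross-hybridizes to) and the example identifiers are assumptions, not the format of the additional files.

```python
from collections import Counter

def cross_hyb_histogram(hits, max_targets=30):
    """Count probes by the number of distinct sequences they cross-hybridize to.

    `hits` maps a probe id to a list of off-target sequence ids (hypothetical
    layout). Probes without off-targets are ignored, and only bins up to
    `max_targets` are returned, mirroring the 30-sequence cut-off of the figure.
    """
    counts = Counter(len(set(targets)) for targets in hits.values() if targets)
    return {n: counts[n] for n in sorted(counts) if n <= max_targets}

# Made-up example data, not taken from the design.
hits = {
    "probe1": ["TC2"],
    "probe2": ["TC3", "TC4"],
    "probe3": ["TC5"],
    "probe4": [],  # non-cross-hybridizing probe
}
histogram = cross_hyb_histogram(hits)
```

Each key of `histogram` is a number of cross-hybridization targets and each value is the number of probes in that bin, which is the quantity displayed on the bars of such a barplot.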

