A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina.

Bidard F, Imbeaud S, Reymond N, Lespinet O, Silar P, Clavé C, Delacroix H, Berteaux-Lecellier V, Debuchy R - BMC Res Notes (2010)

Bottom Line: The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Intron position is a critical selection criterion when exon-intron gene structure predictions for intron-rich genes are inaccurate.

Affiliation: Univ Paris-Sud 11, Institut de Génétique et Microbiologie UMR8621, F- 91405 Orsay, France. robert.debuchy@igmors.u-psud.fr.

ABSTRACT

Background: The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. A probe design strategy should cope with the limited accuracy of de novo gene prediction programs and with annotation updates. We present a novel in silico procedure that addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify the optimal probes among the in silico candidates.

Findings: We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to the end of the coding sequence, and intron position. This last criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a subset of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied to the development of an Agilent gene expression platform for the filamentous fungus Podospora anserina and to the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS.

Conclusions: A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.
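The experimental selection step summarized in the Findings can be sketched as follows. This is a minimal illustration, not the authors' procedure: the `Probe` fields, the function name and the signal-to-noise threshold are hypothetical, the 0.75 CV cut-off is inferred from the validation text, and the third criterion (representative signal intensities) is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical thresholds for illustration only; the actual cut-offs
# are described in the article's Additional file 1.
MIN_SNR = 3.0   # assumed minimum signal-to-noise ratio
MAX_CV = 0.75   # CV cut-off inferred from the validation text


@dataclass
class Probe:
    probe_id: str
    cds_id: str
    snr: float  # signal-to-noise ratio
    cv: float   # coefficient of variation across replicates


def best_probe_per_cds(probes: List[Probe]) -> Dict[str, Optional[Probe]]:
    """Keep, for each CDS, the qualified probe with the lowest CV.

    A CDS whose probes are all rejected maps to None (probe-deficient).
    """
    best: Dict[str, Optional[Probe]] = {}
    for p in probes:
        best.setdefault(p.cds_id, None)
        if p.snr < MIN_SNR or p.cv > MAX_CV:
            continue  # probe fails a criterion and is rejected
        current = best[p.cds_id]
        if current is None or p.cv < current.cv:
            best[p.cds_id] = p
    return best
```

A CDS that maps to `None` in the result corresponds to what the article calls a probe-deficient CDS, for which a probe was chosen by supervised selection.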

Figure 3: Distribution of probe intensity CV in the five conditions used for the experimental validation of probes. The distributions of probe intensity CV are presented as five box-and-whisker plots, in which each box spans the interquartile range. Hybridizations were performed on microarray v.2 with the cRNAs prepared from the five conditions (M24h, M48h, M96h, C24h, C48h) and labeled with Cy3. Each condition consisted of 4 biological replicates. The CVs were computed as indicated in Additional file 1. The median CV is 0.13, 0.10, 0.19, 0.12 and 0.11 for M24h, M48h, M96h, C24h and C48h, respectively.
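The per-probe CV and the per-condition median reported in the caption can be illustrated with a short sketch. It assumes the conventional definition of the coefficient of variation (sample standard deviation divided by the mean); the exact computation used by the authors is given in Additional file 1 and may differ. The intensity values below are invented for illustration.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean of the replicate intensities."""
    return stdev(intensities) / mean(intensities)


# Invented intensities: probe -> four biological replicates in one condition.
replicates = {
    "probe_1": [100.0, 110.0, 90.0, 100.0],
    "probe_2": [200.0, 260.0, 180.0, 240.0],
    "probe_3": [50.0, 52.0, 48.0, 50.0],
}

# One CV per probe, then the median CV over all probes in the condition.
cvs = {probe_id: coefficient_of_variation(values)
       for probe_id, values in replicates.items()}
median_cv = median(cvs.values())
```

Repeating this for each of the five conditions would give one median per condition, which is the quantity the caption reports.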

Mentions: Time courses of vegetative growth (24 h [M24h], 48 h [M48h] and 96 h [M96h]) and sexual crossing (24 h [C24h], 48 h [C48h] and 96 h [C96h] after fertilization) were used for extraction of RNA, but only the M24h, M48h, M96h, C24h and C48h conditions were used for subsequent probe selection. Each condition had four biological replicates, including mat+ and mat- strains [20], which were isogenic except at the mating-type locus. The common reference RNA pool was created by mixing equal amounts of RNA extracted from M48h, M96h, C24h, C48h and C96h. The materials and methods used for strains, cultures, nucleic acid extractions, RNA pool preparation and microarray analyses are described in Additional file 1. The numbers of outlier probes and probe-deficient CDS identified by experimental validation are shown in Table 2. As a result of low signal-to-noise ratio, 123 CDS had all of their probes rejected. These probes may correspond either to genes that were not expressed under the experimental conditions, or to false-positive genes resulting from over-annotation. The distribution of CV in the five experimental conditions is shown in Figure 3. Most of the probes (92%) rejected by the CV criterion belong to M96h. Large transcriptional differences between mat+ and mat- strains at M96h were characterized for some genes ([21] and unpublished observations); these differences are expected to persist in the C24h and C48h conditions. Therefore, 27 probes (9 CDS) with a CV > 0.75 in two of these three conditions were retained, as the high CV is likely biologically relevant. Mprobe and Marray scores proved to be the most selective measures, with 60% of probes being rejected after this analysis. At the end of the experimental validation, 9,822 CDS had at least one qualified probe. A total of 717 CDS were probe-deficient, because either one criterion, or a combination of criteria, was sufficient to eliminate all probes targeted to a given CDS (Table 2). For these CDS, one probe was chosen by supervised selection. The final array design contained 10,539 probes for nuclear CDS (Microarray v.3). As P. anserina is used as a model system for mitochondrial metabolism [22], 17 mitochondrial CDS probes were added to the final array. These probes underwent only the computational screening. Each array contained four replicates of each probe to improve the statistical significance of results. The improvement in microarray v.3 was confirmed by its signal CV, which was lower than that obtained with v.2 upon self-to-self hybridization with the common reference cRNA pool (Figure 4). The median CV of microarray v.3 is similar to those obtained in the MAQC study with the commercial Agilent human microarray platform [23].
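The probe counts quoted in this paragraph are mutually consistent, as the following arithmetic check shows. The variable names are illustrative. The comparison with the array capacity takes "44K" at face value as roughly 44,000 features, which is an assumption rather than a figure from the article.

```python
# Counts taken from the text.
qualified_cds = 9_822        # CDS with at least one qualified probe
probe_deficient_cds = 717    # CDS whose probe was chosen by supervised selection
mitochondrial_cds = 17       # mitochondrial CDS probes added to the final array
replicates_per_probe = 4     # spot replicates of each probe

nuclear_probes = qualified_cds + probe_deficient_cds
total_probes = nuclear_probes + mitochondrial_cds
features_used = total_probes * replicates_per_probe

assert nuclear_probes == 10_539   # probes for nuclear CDS (Microarray v.3)
assert total_probes == 10_556     # total P. anserina CDS in the abstract
print(features_used)              # features occupied by replicated probes
```

Four replicates of 10,556 probes occupy 42,224 features, which is compatible with a single array of the 44K format under the stated assumption.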

