Re-Annotator: Annotation Pipeline for Microarray Probe Sequences.

Arloth J, Bader DM, Röh S, Altmann A - PLoS ONE (2015)

Bottom Line: Microarray technologies are established approaches for high-throughput gene expression, methylation and genotyping analysis. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. The Re-Annotator pipeline is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays.


Affiliation: Translational Research Department, Max Planck Institute of Psychiatry, Kraepelinstrasse 2-10, 80804, Munich, Germany.

ABSTRACT
Microarray technologies are established approaches for high-throughput gene expression, methylation and genotyping analysis. An accurate mapping of the array probes is essential to generate reliable biological findings. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. Here, we present the Re-Annotator, a re-annotation pipeline for microarray probe sequences. It is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays. The Re-Annotator uses a custom-built mRNA reference database to identify the positions of gene expression array probe sequences. We applied Re-Annotator to the Illumina HumanHT-12 v4 microarray platform and found that about one quarter (25%) of the probes differed from the manufacturer's annotation. In further computational experiments on experimental gene expression data, we compared Re-Annotator to another probe re-annotation tool, ReMOAT, and found that Re-Annotator provided an improved re-annotation of microarray probes. A thorough re-annotation of probe information is crucial to any microarray analysis. The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for the Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.
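
One way to quantify how many probes differ between the manufacturer's annotation and a re-annotation is a per-probe comparison of the two tables. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function name `fraction_differing`, the dictionary layout (probe ID mapped to a gene symbol, or `None` when unannotated), and the probe and gene names are all hypothetical, and the paper may define "differed" in another way.

```python
from typing import Dict, Optional

Annotation = Dict[str, Optional[str]]  # probe ID -> gene symbol (None = unannotated)


def fraction_differing(manufacturer: Annotation, reannotated: Annotation) -> float:
    """Return the share of probes whose annotation differs between two tables.

    Assumption: a probe missing from one table is treated as unannotated (None)
    in that table, so a newly annotated or newly dropped probe counts as a change.
    """
    probes = manufacturer.keys() | reannotated.keys()
    if not probes:
        return 0.0
    differing = sum(
        1 for probe in probes if manufacturer.get(probe) != reannotated.get(probe)
    )
    return differing / len(probes)


# Toy example with hypothetical probe IDs and gene symbols.
manufacturer_table = {"p1": "GENE_A", "p2": "GENE_B", "p3": None, "p4": "GENE_D"}
reannotated_table = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_C", "p4": "GENE_D"}

print(fraction_differing(manufacturer_table, reannotated_table))  # 0.25
```

In the toy data only `p3` changes (from unannotated to `GENE_C`), so one of four probes differs.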



Fig 2 (pone.0139516.g002): Results of the re-annotation of Illumina HumanHT-12 v4 probe sequences. (A) Barplot of the alignment basis. The left bar represents the array probe sequences that could be aligned to the in silico mRNA reference database. The middle bar represents sequences that were aligned to the whole-genome reference, subdivided into genic and intergenic alignments. The right bar represents unaligned sequences. Histograms showing (B) the number of mismatches between probe sequence and reference, (C) the number of equally top-scoring best hits per probe sequence and (D) the number of SNPs (in the 1000 Genomes data) within an aligned probe sequence. (E) Histogram of the annotation of probes that had no annotation according to the manufacturer and have now been rescued and reliably annotated. (F) Histogram showing the changes in annotation from the manufacturer's annotation to our re-annotation (Manufacturer / Re-Annotator).


Mentions: We analyzed the Illumina HumanHT-12 v4 probe sequences using the Re-Annotator pipeline. Of all 47,230 probe sequences, 95% (n = 44,938; Fig 2A; Table 1) were aligned either to our custom-built mRNA sequence database (n = 34,277; first alignment step) or, if no hit was found there, to the reference genome (n = 10,661; second alignment step). A large fraction of the latter probe sequences aligned to genomic locations without any known transcribed gene (n = 7,493). After the post-processing filter, 77.7% of all aligned probe sequences (see Table 1) mapped to a distinct region in the genome (defined as a maximum distance of 25 bp between multiple hits for the same sequence) and were included in the final annotation file for the HumanHT-12 v4 BeadChip array. This set of 34,936 probes is referred to as “reliable” array probes in the following. The majority (93.8%) of those reliable probes (Fig 2B; Table 1) were aligned without mismatches. The number of hits per probe to a region ranged from 1 to 32; 67.7% of probes had exactly one hit and 96% (n = 33,539) had fewer than five hits (Fig 2C). The vast majority of reliable probes (92.1%; Fig 2D; Table 1) resided in regions without known SNPs in the Caucasian population (based on the 1000 Genomes Project). It is conceivable that SNPs within the probe sequence may be a source of apparent “differential” expression via altered hybridization efficiency. However, Schurmann et al. [4] reported no consistent effects of SNPs located in array probe sequences on hybridization efficiency. Thus, one has to test individually whether these SNPs are associated with altered expression signal intensity.
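
The post-processing criterion described above, under which a probe is kept only if its multiple hits lie within 25 bp of each other, can be illustrated with a short sketch. This is not the Re-Annotator implementation: the `Hit` record, the function names, and the probe IDs are hypothetical, and measuring "distance" between alignment start positions on the same chromosome is an assumption made here for illustration.

```python
from typing import Dict, NamedTuple, Sequence, Set

MAX_DISTANCE_BP = 25  # threshold quoted in the text


class Hit(NamedTuple):
    """One alignment of a probe sequence (hypothetical record layout)."""
    chrom: str
    start: int  # start coordinate of the alignment


def is_distinct_region(hits: Sequence[Hit], max_distance: int = MAX_DISTANCE_BP) -> bool:
    """Return True if all hits of one probe fall into a single region.

    Assumptions: distance is measured between alignment start positions,
    and hits on different chromosomes never form one region.
    """
    if not hits:
        return False  # unaligned probes cannot be reliable
    if len({hit.chrom for hit in hits}) > 1:
        return False
    starts = [hit.start for hit in hits]
    return max(starts) - min(starts) <= max_distance


def reliable_probes(hits_by_probe: Dict[str, Sequence[Hit]]) -> Set[str]:
    """Collect the IDs of probes that map to one distinct region."""
    return {pid for pid, hits in hits_by_probe.items() if is_distinct_region(hits)}


# Toy example with hypothetical probe IDs and coordinates.
example = {
    "probe_a": [Hit("chr1", 1000)],                     # single hit
    "probe_b": [Hit("chr1", 1000), Hit("chr1", 1020)],  # two hits, 20 bp apart
    "probe_c": [Hit("chr1", 1000), Hit("chr7", 1000)],  # different chromosomes
    "probe_d": [Hit("chr2", 500), Hit("chr2", 600)],    # 100 bp apart
}

print(sorted(reliable_probes(example)))  # ['probe_a', 'probe_b']
```

In this toy data, `probe_c` and `probe_d` are rejected because their hits are spread over two chromosomes or over more than 25 bp, respectively.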

