Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy.

Hu G, Llinás M, Li J, Preiser PR, Bozdech Z - BMC Bioinformatics (2007)

Bottom Line: This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays which does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in context of each other. The weighted rank-sum strategy utilized by this algorithm provides high flexibility of oligonucleotide selection which accommodates extreme variability of DNA sequence properties along genomes of many organisms.


Affiliation: School of Biological Sciences, Nanyang Technological University, No. 60 Nanyang Drive, 637551, Singapore. hu0002an@ntu.edu.sg

ABSTRACT

Background: The design of long oligonucleotides for spotted DNA microarrays requires detailed attention to ensure their optimal performance in the hybridization process. The main challenge is to select, for each genetic locus/gene in the genome, an optimal oligonucleotide element that is unique, devoid of internal structures and repetitive sequences, and has a Tm uniform with all other elements on the microarray. Currently, all of the publicly available programs for long oligonucleotide DNA microarray selection use various combinations of cutoffs in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme distribution of GC content, a large proportion of repetitive sequences, or large gene families with highly homologous members.

Results: Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested using three microbial genomes: Escherichia coli, Saccharomyces cerevisiae and the human malaria parasite Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations of GC content and abundant gene duplications.

Conclusion: OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays which does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in context of each other. The weighted rank-sum strategy utilized by this algorithm provides high flexibility of oligonucleotide selection which accommodates extreme variability of DNA sequence properties along genomes of many organisms.
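The weighted rank-sum idea summarized in the abstract can be illustrated with a minimal sketch. This is not the OligoRankPick implementation: the parameter names (`blast`, `gc_dev`, `structure`), the integer weights, the penalty values and the tie handling are all assumptions made for illustration, since the abstract does not give them. Each candidate oligonucleotide for a gene is ranked separately on every parameter, the ranks are combined with integer weights, and the candidate with the lowest weighted sum is picked, so no candidate is discarded by a fixed cutoff.

```python
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[int]:
    """Dense ascending ranks: the lowest value gets rank 1, ties share a rank."""
    position = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


def pick_oligo(
    candidates: List[Dict[str, float]], weights: Dict[str, int]
) -> Tuple[int, List[int]]:
    """Return (index of best candidate, weighted rank sums).

    Every parameter is treated as a penalty (lower is better); this is an
    assumption of the sketch, not something stated in the source.
    """
    per_param = {p: ranks([c[p] for c in candidates]) for p in weights}
    scores = [
        sum(weights[p] * per_param[p][i] for p in weights)
        for i in range(len(candidates))
    ]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Hypothetical candidates for a single gene (values are made up).
candidates = [
    {"blast": 40, "gc_dev": 3, "structure": 10},
    {"blast": 56, "gc_dev": 0, "structure": 5},
    {"blast": 30, "gc_dev": 1, "structure": 20},
]
weights = {"blast": 3, "gc_dev": 2, "structure": 1}  # hypothetical integer weights

best, scores = pick_oligo(candidates, weights)
print(best, scores)
```

Because the ranks are computed among the candidates of one gene only, a gene whose sequence is poor on every parameter still receives its relatively best oligonucleotide, which is the behaviour the abstract contrasts with cutoff-based filtering.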

Figure 4: Oligonucleotide parameter distributions in the newly designed P. falciparum DNA microarray. A total of 10166 oligonucleotides were designed for the P. falciparum DNA microarray. The relative abundance of the oligonucleotides is plotted along the uniqueness scores (BLAST score of the second-best target in the genome) (A) and along the GC content (B). The dotted lines indicate the quality control criteria (see text): a BLAST score of 56, which corresponds to > 40% continuous match cross-hybridization, and the 31.4% ± 5% interval of GC content, which corresponds to the targeted range. Percentages of oligonucleotides that fall within the targeted values are indicated.

Mentions: In the final step we applied OligoRankPick to design a gene-specific DNA microarray for the P. falciparum genome (5363 coding sequences, CDS) that can be used for functional genomic studies of this important human pathogen. For this design we wished to increase the oligonucleotide coverage for longer open reading frames, and thus we fragmented each coding sequence using the fragmentation.pl script as follows: sequences smaller than 1 kb were kept as one fragment; sequences between 1 kb and 2 kb were split evenly into two fragments; sequences larger than 2 kb were split into n fragments (n ≥ 2) when (2n-2) kb < gene size < 2n kb. The fragmentation step generated 10166 Microarray Element Fragments (MEFs) from 5363 CDS. A single oligonucleotide was designed for each MEF, which resulted in one oligonucleotide per 1198 bp on average for all P. falciparum coding sequences. Although the median GC content of all 70 nt oligonucleotide windows in the P. falciparum coding sequences is 24.3% (displayed by the optional GC_dis.pl module), for higher specificity and efficiency of microarray hybridization we selected oligonucleotides with a GC content of 31.4% (22 GCs out of 70 nt). OligoRankPick successfully designed 10166 oligonucleotides representing all predicted P. falciparum genes, with an average of 1.9 oligonucleotides per protein coding sequence (see Additional file 3). Figure 4B summarizes the GC content distribution, suggesting that OligoRankPick can identify optimal oligonucleotide elements with a GC content significantly distant from the average GC content in the genome. Astonishingly, 70.5% of the designed oligonucleotides had the desired GC content of 31.4% (Figure 4B).
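The fragmentation rule and the GC arithmetic in this passage can be checked with a short sketch. It is not the fragmentation.pl script, whose code is not shown here. The function names are hypothetical, and two choices are assumptions: lengths that fall exactly on a boundary (1 kb, or a multiple of 2 kb) are assigned as noted in the comments, and sequences longer than 2 kb are split evenly, which the text states only for the 1 kb to 2 kb case.

```python
import math
from typing import List

KB = 1000  # base pairs per kb


def fragment_count(cds_length_bp: int) -> int:
    """Number of fragments for a coding sequence of the given length."""
    if cds_length_bp < 1 * KB:
        return 1  # smaller than 1 kb: kept as one fragment
    if cds_length_bp <= 2 * KB:
        return 2  # 1 kb to 2 kb: two fragments (exactly 1 kb is an assumption)
    # Longer sequences: n such that (2n-2) kb < length <= 2n kb.
    # Treating the upper bound as inclusive is an assumption.
    return math.ceil(cds_length_bp / (2 * KB))


def split_evenly(seq: str, n: int) -> List[str]:
    """Split seq into n contiguous fragments whose lengths differ by at most 1."""
    base, extra = divmod(len(seq), n)
    fragments, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        fragments.append(seq[start:end])
        start = end
    return fragments


def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    s = oligo.upper()
    return (s.count("G") + s.count("C")) / len(s)


# A made-up 70 nt sequence with 22 G/C bases, matching the targeted composition.
example_oligo = "GC" * 11 + "AT" * 24
print(fragment_count(3000), round(100 * gc_fraction(example_oligo), 1))
```

The reported figures are consistent with each other: 22 of 70 nt is 31.4% GC, and 10166 fragments over 5363 coding sequences is about 1.9 oligonucleotides per sequence.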

