Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy.

Hu G, Llinás M, Li J, Preiser PR, Bozdech Z - BMC Bioinformatics (2007)

Bottom Line: This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays which does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in context of each other. The weighted rank-sum strategy utilized by this algorithm provides high flexibility of oligonucleotide selection which accommodates extreme variability of DNA sequence properties along genomes of many organisms.


Affiliation: School of Biological Sciences, Nanyang Technological University, No, 60 Nanyang Drive, 637551, Singapore. hu0002an@ntu.edu.sg

ABSTRACT

Background: The design of long oligonucleotides for spotted DNA microarrays requires detailed attention to ensure their optimal performance in the hybridization process. The main challenge is to select, for each genetic locus/gene in the genome, an optimal oligonucleotide element that is unique, devoid of internal structures and repetitive sequences, and has a Tm uniform with all other elements on the microarray. Currently, all of the publicly available programs for long oligonucleotide DNA microarray selection use various combinations of cutoffs in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme distribution of GC content, a large proportion of repetitive sequences or large gene families with highly homologous members.

Results: Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested on three microbial genomes: Escherichia coli, Saccharomyces cerevisiae and the human malaria parasite Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations of GC content and abundant gene duplications.

Conclusion: OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays that does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in the context of each other. The weighted rank-sum strategy used by this algorithm provides high flexibility of oligonucleotide selection, which accommodates the extreme variability of DNA sequence properties along the genomes of many organisms.



Figure 1: The flowchart of OligoRankPick. All possible oligonucleotides are extracted from the input sequence and stored. Subsequently, four parameters are calculated for every oligonucleotide: the BLAST score to a second genomic target (uniqueness), the GC content (Tm), the Reverse Smith-Waterman score (self-binding) and the LZ compression score (sequence complexity). In the rank transformation step, the oligonucleotides are ranked on each parameter independently and each oligonucleotide receives an ordinal rank number for each parameter. Finally, the weighted rank-sum (RS(x)) is calculated for all oligonucleotides, with uniqueness weights (WBLAST), GC content weights (WGC), self-binding weights (WSW) and sequence complexity weights (WLZ), and with RBLAST, RGC, RSW and RLZ representing the ranks in the corresponding parameter rankings. Multiple RS(x) values are determined by the gene-specific optimization using multiple weight sets (not indicated), and the lowest value is the one finally considered. The optimal candidate is the oligonucleotide with the lowest RS(x) amongst all oligonucleotides in the locus.
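The first two steps of the flowchart (window extraction and per-oligonucleotide scoring) can be illustrated with a minimal Python sketch. This is not the OligoRankPick implementation: the function names are hypothetical, zlib's DEFLATE (an LZ77-based compressor) is swapped in as a rough stand-in for the LZ compression score, and the BLAST and Reverse Smith-Waterman scores are omitted because they require external tools or a longer alignment routine.

```python
import zlib


def windows(seq, length):
    """Yield (start, oligo) for every window of `length` bases in `seq`."""
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def gc_content(oligo):
    """Fraction of G and C bases, used here as a simple proxy for Tm."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def compression_score(oligo):
    """Compressed size in bytes; low values suggest a repetitive, low-complexity oligo.

    Assumption: zlib stands in for the LZ compression score named in the caption.
    """
    return len(zlib.compress(oligo.encode("ascii")))


# Hypothetical locus and oligo length, chosen only for illustration.
locus = "ATGCGTACGTTAGCCTAGGATCCGATTACGCAGTCATGGACTTACGGATCAGCTTGACCA"
for start, oligo in windows(locus, 50):
    print(start, round(gc_content(oligo), 2), compression_score(oligo))
```

Each candidate ends up with one raw score per parameter; those raw scores are what the rank transformation step then converts into ordinal ranks.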

Mentions: Figure 1 gives an overview of the OligoRankPick algorithm. Essentially, all possible oligonucleotide windows from a gene/locus are extracted and scored on four parameters: uniqueness (BLAST score to second target), GC content (Tm), self-binding (Reverse Smith-Waterman, SW) and sequence complexity (Lempel-Ziv compression score). Subsequently, each score is transformed into a rank, and a weighted rank-sum is calculated for each oligonucleotide using the weighted optimization strategy (see below). The final oligonucleotide is the one with the smallest rank-sum value.
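The rank transformation and weighted rank-sum can be sketched as follows. This is a hedged illustration rather than the published code: the raw scores and the two weight sets are invented, every score is assumed to be expressed as a penalty (lower is better), and taking the minimum rank-sum over weight sets is a literal reading of the Figure 1 caption. The hypothetical weight sets share the same total so that their rank-sums stay comparable.

```python
def ordinal_ranks(values):
    """Return 1-based ordinal ranks; the lowest (best) value gets rank 1.

    Ties are broken by position, so every candidate gets a distinct rank.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def pick_oligo(scores, weight_sets):
    """Pick the candidate with the lowest weighted rank-sum.

    scores: parameter name -> list of penalty scores, one per candidate.
    weight_sets: list of dicts mapping parameter name -> integer weight.
    Returns (index of the winner, best rank-sum per candidate).
    """
    ranks = {name: ordinal_ranks(vals) for name, vals in scores.items()}
    n_candidates = len(next(iter(scores.values())))
    best = []
    for i in range(n_candidates):
        # RS(x) under each weight set; keep the lowest value for this candidate.
        rank_sums = [
            sum(weights[name] * ranks[name][i] for name in ranks)
            for weights in weight_sets
        ]
        best.append(min(rank_sums))
    winner = min(range(n_candidates), key=lambda i: best[i])
    return winner, best


# Invented penalty scores for three candidate oligonucleotides in one locus.
scores = {
    "blast": [30, 12, 18],       # cross-hybridization score to a second target
    "gc": [0.02, 0.10, 0.05],    # absolute deviation from a target GC content
    "sw": [8, 5, 9],             # self-binding score
    "lz": [3, 1, 2],             # low-complexity penalty
}
weight_sets = [
    {"blast": 4, "gc": 2, "sw": 1, "lz": 1},
    {"blast": 2, "gc": 4, "sw": 1, "lz": 1},
]
winner, best = pick_oligo(scores, weight_sets)
print(winner, best)  # 1 [15, 12, 17]
```

No candidate is discarded by a cutoff here: candidate 1 has the worst GC rank yet still wins, because its ranks on the other three parameters outweigh that under the first weight set.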

