Shared probe design and existing microarray reanalysis using PICKY.

Chou HH - BMC Bioinformatics (2010)

Bottom Line: This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile to microarray users.


Affiliation: Department of Genetics, Development and Cell Biology, and Department of Computer Science, Iowa State University, Ames, IA, 50011-3223, USA. hhchou@iastate.edu

ABSTRACT

Background: Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Since gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine probes that are still useful with respect to the updated annotations.

Results: PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest PICKY 2.1 includes the new capability to reanalyze existing microarray probes against updated gene sets to determine which probes are still valid to use. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile for microarray users.

Conclusions: Shared probes allow expressed gene family members to be detected; this capability is generally more desirable than not knowing anything about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate multiple species identification in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at a lower buffer salt concentration, and the probe reanalysis function improves existing microarray result interpretations.


Figure 3: Example implementation to traverse all host sequences, track their stacking groups and process the groups for shared probe design. r points to each overlap group on a host sequence; each group contains four data fields used in this algorithm: Pos, the start of the group on the host sequence; End, the end of the group on the host sequence; Span, the span value of the group; and Next, a pointer to the next group. host is the host sequence currently being scanned; pqc counts the total number of distinct groups on a host sequence; pqs is a collection of priority queues, one for each associated group; start and end indicate the range of the current group being processed; span records its span value; and next_s is the start position of the next group. Each stack entry in st contains a pair of values: the first is the r pointer to a region as described above, and the second is the pqi index into pqs for storing shared probes designed for a group.

Mentions: In step 3.2, if the next group overlaps the group currently on top of the stack by at least the maximum nontarget match length, then the right end of the next group cannot extend beyond the right end of the stack-top group. In this case we say that the next group covers the stack-top group. To prove that this always holds, assume the opposite: a next group extends beyond the right end of the stack-top group but overlaps it by at least the maximum nontarget match length. The suffixes of the stack-top group cannot all be members of the next group as well, because that would extend the stack-top group to the right end of the next group and contradict our assumption. Therefore, the suffixes of the stack-top group that are not members of the next group must overlap suffixes of the next group by at least the maximum nontarget match length, which prevents the next group from being added to the lookup table at all under step 2.2 of the first algorithm. Consequently, a group that covers another group must lie entirely within the region of the covered group and must have more members (i.e., a larger span value) than the covered group.

A priority queue and stack combination can be used to keep track of the various groups on a host sequence: a priority queue is associated with each group and is used to store and prioritize its best probe candidates, while a stack is used to keep track of all stacking groups. For example, host sequence D in Figure 1 contains three stacking groups. When the group representing region 6 is being processed, the groups for regions 3 and 5 are pushed down the stack, as indicated by the underlines. Processing on the host sequence never backtracks; thus, after group 6 has been processed, only the remaining region of group 5 will be processed. Group 3 will later be skipped because its remaining region becomes zero once group 5 has also been processed. Figure 3 presents an example implementation of this algorithm in C++.
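The stack-based, non-backtracking scan described above can be illustrated with a minimal C++ sketch. This is not the code of Figure 3: it keeps the caption's names (Pos, End, Span, Next, r, st, pqc, pqi, next_s) but makes several simplifying assumptions. Regions are treated as half-open intervals, a next group is taken to cover the stack-top group whenever it starts inside it (the maximum nontarget match length test of step 3.2 is omitted), and the priority queues in pqs are replaced by a list of the stretches that would be handed to probe design. Segment, scan_host, demo_host_d and all coordinates are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <stack>
#include <utility>
#include <vector>

// Overlap group on a host sequence; field names follow the Figure 3 caption.
// Assumption: a region is the half-open interval [Pos, End).
struct Group {
    int Pos;      // start of the group on the host sequence
    int End;      // end of the group on the host sequence
    int Span;     // span value (number of member genes)
    Group* Next;  // next group on the same host, ordered by Pos
};

// Hypothetical record of one stretch that would be passed to probe design.
struct Segment {
    int start;
    int end;
    int span;
    int pqi;  // index of the priority queue that would receive the probes
};

// Scan one host sequence from left to right without backtracking.
std::vector<Segment> scan_host(const Group* r) {
    std::vector<Segment> out;
    std::stack<std::pair<const Group*, int>> st;  // (group, pqi) pairs
    int pqc = 0;  // number of distinct groups seen so far
    int pos = 0;  // scan position; it only moves to the right
    while (r != nullptr || !st.empty()) {
        const int next_s = r ? r->Pos : INT_MAX;  // start of the next group
        if (!st.empty()) {
            const Group* top = st.top().first;
            const int pqi = st.top().second;
            // Process the stack-top group up to the next group or its own end.
            const int start = std::max(pos, top->Pos);
            const int end = std::min(top->End, next_s);
            if (start < end) {  // skip groups with no remaining region
                out.push_back({start, end, top->Span, pqi});
                pos = end;
            }
            if (next_s >= top->End) {  // stack-top group is exhausted
                st.pop();
                continue;
            }
        }
        // The next group starts inside the stack-top group (or the stack is
        // empty): push it so that it is processed before the groups below it.
        assert(r != nullptr);
        st.push({r, pqc++});
        r = r->Next;
    }
    return out;
}

// Made-up coordinates loosely modeled on host sequence D: three stacking
// groups, where groups 3 and 5 share a right end and group 6 lies inside both.
std::vector<Segment> demo_host_d() {
    Group g6{40, 70, 4, nullptr};
    Group g5{20, 100, 3, &g6};
    Group g3{0, 100, 2, &g5};
    return scan_host(&g3);
}
```

With these made-up coordinates, demo_host_d() returns four stretches: [0, 20) for group 3, [20, 40) for group 5, [40, 70) for group 6, and then [70, 100) for the remaining region of group 5. Group 3 yields nothing further because its remaining region is empty once group 5 has been processed, which mirrors the behavior described for host sequence D. In a fuller implementation, each stretch would instead feed probe candidates into the priority queue pqs[pqi].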

