Shared probe design and existing microarray reanalysis using PICKY.

Chou HH - BMC Bioinformatics (2010)

Bottom Line: This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile for microarray users.

Affiliation: Department of Genetics, Development and Cell Biology, and Department of Computer Science, Iowa State University, Ames, IA, 50011-3223, USA. hhchou@iastate.edu

ABSTRACT

Background: Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Since gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine probes that are still useful with respect to the updated annotations.

Results: PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest version, PICKY 2.1, adds the capability to reanalyze existing microarray probes against updated gene sets to determine which probes are still valid. In addition, more precise nonlinear salt effect estimates and other improvements are added, making PICKY 2.1 more versatile for microarray users.

Conclusions: Shared probes allow expressed gene family members to be detected; this capability is generally more desirable than not knowing anything about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate multiple species identification in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at a lower buffer salt concentration, and the probe reanalysis function improves existing microarray result interpretations.

Figure 1: An example of overlapping gene family sequences. Five sequences A-E can overlap each other in six regions as indicated by the gray colors; darker grays indicate more sequences that overlap. These common regions are represented by suffix groups, which are found on the suffix array and hosted by the sequences with black underlines (i.e., sequences B, C and D). The underlines also indicate the stacking of the suffix groups when a host sequence is being processed.

Although it is straightforward to detect common regions between any two sequences, these regions often overlap one another in arbitrary ways, which makes it difficult to decide where a shared probe should be placed. An example illustrates the complexity. In Figure 1, five sequences A-E share six common regions. Region 1 is shared by A and B, region 2 by B and C, region 3 by C and D, region 4 by A-C, region 5 by A-D, and region 6 by all five sequences. The common regions overlap each other; e.g., region 4 overlaps regions 1 and 2, and region 6 overlaps all other regions. A common region may therefore be implicitly divided into subregions, and a probe targeting the region should not cross any of the dividing boundaries set by the other regions. For example, probes designed to target the early part of region 1 are shared only by sequences A and B, but probes designed to target region 5, which overlaps region 1, are shared by sequences A-D. A probe that targets the boundary area between regions 1 and 5 is not a good probe for either region. This example demonstrates that simply finding a long common region is not sufficient to design good shared probes.
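
The boundary problem described above can be illustrated with a small interval sketch. This is not PICKY's suffix-array algorithm; it is a simplified model that assumes all regions are expressed as half-open coordinate ranges on one host sequence. The coordinates, the Region type, and the helper names split_into_segments and probe_target are hypothetical and chosen only for illustration; they are not taken from Figure 1 or from PICKY.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int            # inclusive, in host-sequence coordinates (assumed)
    end: int              # exclusive
    sharers: frozenset    # sequences that contain this common region


def split_into_segments(regions):
    """Cut the host at every region boundary and label each covered piece
    with the union of the sequences sharing it. Adjacent pieces with the
    same label are merged, since no real boundary separates them."""
    cuts = sorted({r.start for r in regions} | {r.end for r in regions})
    segments = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [r for r in regions if r.start <= lo and hi <= r.end]
        if not covering:
            continue  # gap between regions: nothing is shared here
        sharers = frozenset().union(*(r.sharers for r in covering))
        if segments and segments[-1][1] == lo and segments[-1][2] == sharers:
            segments[-1] = (segments[-1][0], hi, sharers)
        else:
            segments.append((lo, hi, sharers))
    return segments


def probe_target(segments, start: int, length: int) -> Optional[frozenset]:
    """Return the set of sequences a probe would be shared by, or None if
    the probe crosses a dividing boundary (or leaves the common regions)."""
    end = start + length
    for lo, hi, sharers in segments:
        if lo <= start and end <= hi:
            return sharers
    return None


# Hypothetical layout: region 1 (A, B) overlapped by region 5 (A-D).
regions = [
    Region("region 1", 0, 120, frozenset("AB")),
    Region("region 5", 80, 200, frozenset("ABCD")),
]
segments = split_into_segments(regions)

early = probe_target(segments, 10, 50)      # inside the early part of region 1
straddle = probe_target(segments, 60, 50)   # crosses the boundary at 80
late = probe_target(segments, 100, 50)      # inside region 5
```

Under these assumed coordinates, the early probe is shared only by A and B, the late probe is shared by A-D, and the straddling probe is rejected because it crosses the boundary that region 5 imposes on region 1.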

