Large-scale identification of polymorphic microsatellites using an in silico approach.

Tang J, Baldwin SJ, Jacobs JM, Linden CG, Voorrips RE, Leunissen JA, van Eck H, Vosman B - BMC Bioinformatics (2008)

Bottom Line: PolySSR is a very effective tool to identify polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. This greatly improves the efficiency of marker development, especially in species where there are low levels of polymorphism, like tomato.


Affiliation: Laboratory of Bioinformatics, Wageningen University, PO Box 8128, 6700 ET Wageningen, the Netherlands. jifeng.tang@gmail.com

ABSTRACT

Background: Simple Sequence Repeat (SSR) or microsatellite markers are valuable for genetic research. Experimental methods to develop SSR markers are laborious, time-consuming and expensive. In silico approaches have become a practicable and relatively inexpensive alternative during the last decade, although testing putative SSR markers is still time-consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). In EST databases a large redundancy of sequences is present, which may contain information on length polymorphisms in the SSRs they contain, and whether they have been derived from heterozygotes or from different genotypes. Up to now, although a number of programs have been developed to identify SSRs in EST sequences, no software can detect putatively polymorphic SSRs.

Results: We have developed PolySSR, a new pipeline to identify polymorphic SSRs rather than just SSRs. Sequence information is obtained from public EST databases derived from heterozygous individuals and/or at least two different genotypes. The pipeline includes PCR-primer design for the putatively polymorphic SSR markers, taking into account Single Nucleotide Polymorphisms (SNPs) in the flanking regions, thereby improving the success rate of the potential markers. A large number of polymorphic SSRs were identified using publicly available EST sequences of potato, tomato, rice, Arabidopsis, Brassica and chicken. The SSRs obtained were divided into long and short based on the number of times the motif was repeated. Surprisingly, the frequency of polymorphic SSRs was much higher in the short SSRs.

Conclusion: PolySSR is a very effective tool to identify polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. Validation experiments showed that almost all markers that were indicated as putatively polymorphic by PolySSR were indeed polymorphic. This greatly improves the efficiency of marker development, especially in species where there are low levels of polymorphism, like tomato. When combined with the new sequencing technologies, PolySSR will have a big impact on the development of polymorphic SSRs in any species. PolySSR and the polymorphic SSR marker database are available from http://www.bioinformatics.nl/tools/polyssr/.



Figure 1: An example of unreliable polymorphic SSRs. Since the repeat chain in EST 3 and 4 does not extend to the end it is not clear whether these two ESTs represent a different (shorter) allele of the SSR or not. For that reason a minimum length for the flanking sequence used must be specified to reliably detect polymorphic SSRs.
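The artifact described in Figure 1 can be illustrated with a minimal Python sketch. It is not PolySSR's implementation: the function names, the regular-expression repeat finder, the minimum of three repeats, and the toy sequences are all assumptions made for illustration. Only the 25-nucleotide flank requirement comes from the text.

```python
import re

MIN_FLANK = 25  # minimum flank length on each side of the SSR (value from the text)


def find_ssr(seq, motif, min_repeats=3):
    """Return (start, end, repeat_count) for the longest run of `motif`, or None.

    `min_repeats` is an illustrative threshold, not a PolySSR setting.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_repeats))
    best = None
    for match in pattern.finditer(seq):
        if best is None or match.end() - match.start() > best[1] - best[0]:
            best = (match.start(), match.end())
    if best is None:
        return None
    start, end = best
    return start, end, (end - start) // len(motif)


def reliable_repeat_counts(ests, motif, min_flank=MIN_FLANK):
    """Keep repeat counts only for ESTs with enough sequence on both sides of the SSR."""
    counts = {}
    for name, seq in ests.items():
        hit = find_ssr(seq, motif)
        if hit is None:
            continue
        start, end, repeats = hit
        left_flank = start
        right_flank = len(seq) - end
        if left_flank >= min_flank and right_flank >= min_flank:
            counts[name] = repeats
    return counts


def is_putatively_polymorphic(counts):
    """An SSR is flagged when the retained ESTs disagree on the repeat count."""
    return len(set(counts.values())) > 1


# Toy data: EST3 and EST4 break off inside or just after the repeat.
left = "ACGTTGCA" * 4   # 32 nt of flanking sequence
right = "TGCATGCC" * 4  # 32 nt of flanking sequence
ests = {
    "EST1": left + "AG" * 10 + right,
    "EST2": left + "AG" * 10 + right,
    "EST3": left + "AG" * 6,              # repeat runs to the end of the read
    "EST4": left + "AG" * 7 + right[:5],  # only 5 nt of right flank
}

filtered = reliable_repeat_counts(ests, "AG")
unfiltered = reliable_repeat_counts(ests, "AG", min_flank=0)
print(filtered, is_putatively_polymorphic(filtered))
print(unfiltered, is_putatively_polymorphic(unfiltered))
```

Without the flank requirement, the truncated reads EST3 and EST4 look like shorter alleles and the SSR is wrongly flagged as polymorphic. With it, only EST1 and EST2 are compared.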

Mentions: The second feature takes into account a criterion that is important for the reliability of PCR amplification: the quality of flanking sequences [15]. This is of particular importance for EST-SSRs, since EST sequences are usually of poor quality, especially at the beginning and end of the sequence. It is also important that flanking sequences are of sufficient length to reduce possible artifacts (like ESTs 3 and 4 in Figure 1; see also Materials and Methods). PolySSR requires at least 25 nucleotides on both sides of the SSR, which filters out SSRs with low-quality flanking sequences. Furthermore, potential single nucleotide polymorphisms (SNPs) identified by PolySSR are taken into account when designing primers. This is accomplished by changing the SNPs in the consensus sequence of a contig into N's. Primer3 [24] excludes these positions as suitable positions for primers. Primer sequences of potato SSRs, as provided by TIGR, do not take into account potential SNPs around the SSR. Some of these primers are in regions where SNPs are present and therefore may produce unreliable amplicons in some genotypes. PCR primers that fail to anneal to the DNA template will result in null alleles, which are difficult to deal with in genetic experiments. It is also possible that an SSR predicted to be polymorphic becomes monomorphic because the primers amplify one allele only. Using the improved strategy we were able to design reliable primers for more than 93% of the polymorphic SSRs in Arabidopsis, Brassica, rice, potato, chicken and tomato (Table 1).
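The SNP-masking step can be sketched as follows. This is a hedged illustration, not PolySSR's code: the function name `mask_snps`, the input format (equal-length, pre-aligned reads of one contig), and the rule that a column counts as a SNP when the second most common base occurs at least twice are all assumptions, since the text does not say how SNPs are called.

```python
from collections import Counter


def mask_snps(aligned_reads, min_minor_count=2):
    """Build a consensus from aligned reads, writing 'N' at putative SNP columns.

    A column is treated as a SNP when the second most common base is seen
    at least `min_minor_count` times (an illustrative criterion). Characters
    other than A, C, G and T, such as gaps, are ignored when counting.
    """
    consensus = []
    for column in zip(*aligned_reads):
        bases = Counter(base for base in column if base in "ACGT")
        if not bases:
            consensus.append("N")  # no usable base in this column
            continue
        ranked = bases.most_common()
        is_snp = len(ranked) > 1 and ranked[1][1] >= min_minor_count
        consensus.append("N" if is_snp else ranked[0][0])
    return "".join(consensus)


# Toy contig: column 3 is a T/A SNP supported by several reads;
# the single C in column 6 of the last read is treated as a sequencing error.
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
]
print(mask_snps(reads))  # ACGNACGTAC
```

The masked consensus would then serve as the template for primer design, so that no primer is placed over a position that varies between genotypes.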

