PSR: polymorphic SSR retrieval.

Cantarella C, D'Agostino N - BMC Res Notes (2015)

Bottom Line: The first study aims at the identification of polymorphic SSRs in a set of de novo assembled transcripts defined by RNA-sequencing of two different plant genotypes. PSR has been specifically developed from the need to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limits related to the existing and time-consuming efforts based on tools developed in the pre-NGS era.

Affiliation: Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per l'orticoltura, Via Cavalleggeri 25, 84098, Pontecagnano Faiano, Italy. concita.cantarella@gmail.com.

ABSTRACT

Background: With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few of the tools implemented so far for genotyping microsatellite repeats can manage huge amounts of sequence data and handle the SAM/BAM file format. Most of them were developed for and tested on human or model organisms with high-quality reference genomes.

Results: In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites by exploiting next generation sequencing (NGS) data. PSR was developed with plant non-model species in mind, for which a de novo transcriptome assembly is generally the first sequence resource available for SSR mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. The user can define two threshold values to call a length polymorphism and reduce the number of false positives: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines whether the microsatellite-containing sequence is processed at all, while the second is decisive for the identification of minor alleles. PSR was tested on two different case studies. The first aims at the identification of polymorphic SSRs in a set of de novo assembled transcripts defined by RNA-sequencing of two different plant genotypes. The second investigates sequence variations within a collection of newly sequenced chloroplast genomes. In both cases, PSR results are in agreement with those obtained by capillary gel separation.

Conclusion: PSR was specifically developed to meet the need to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limitations of existing, time-consuming approaches based on tools developed in the pre-NGS era.
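
The role of the two user-defined thresholds described in the Results can be illustrated with a minimal Python sketch. This is not the PSR implementation (PSR is written in Perl); the function names, the default values, and the reading of "minimum read depth" as an absolute per-allele read count are assumptions made only for illustration. Here a locus counts as polymorphic when the called allele sets differ between genotypes.

```python
from typing import Dict, Optional, Set

# Hypothetical names and defaults; not taken from the PSR source code.
def call_alleles(length_counts: Dict[int, int],
                 min_reads: int = 10,
                 min_depth: int = 3) -> Optional[Set[int]]:
    """Call allele lengths for one genotype at one microsatellite locus.

    length_counts maps a repeat length to the number of reads that
    span the whole microsatellite with that length.
    Returns None when the locus is skipped for this genotype.
    """
    total = sum(length_counts.values())
    # First threshold: too few spanning reads, so the locus is not processed.
    if total < min_reads:
        return None
    # Second threshold: an allele (including a minor one) is kept only
    # if enough reads support it (assumed to be an absolute count).
    return {length for length, n in length_counts.items() if n >= min_depth}


def is_polymorphic(calls_by_genotype: Dict[str, Optional[Set[int]]]) -> bool:
    """True if at least two genotypes were called and their allele sets differ."""
    called = [c for c in calls_by_genotype.values() if c]
    if len(called) < 2:
        return False
    return any(c != called[0] for c in called[1:])
```

With these assumed defaults, a genotype with 20 reads of length 12 and 2 reads of length 14 is called homozygous for length 12, because the second allele falls below the minimum read depth; raising or lowering that threshold directly changes which minor alleles are reported.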

Fig. 1: Workflow design of the polymorphic SSR retrieval tool, which comprises two modules. PSR_read_retrieval identifies all the reads that cover the full length of perfect microsatellites. N indicates the number of iterations, which must correspond to the number of genotypes under investigation. PSR_poly_finder detects length polymorphisms in microsatellites.
Mentions: The user guide provides general information on software dependencies and the installation procedure, as well as detailed instructions for running the application. The PSR workflow is shown in Fig. 1, and the text that follows describes the key points. PSR_read_retrieval identifies all the reads that align to the reference sequences and cover the full length of perfect microsatellites. We decided to focus on perfect microsatellites only, because the total length of the reads obtained from Illumina sequencing instruments rarely exceeds 100 high-quality nucleotides. These constraints appear to limit polymorphism discovery to short polymorphic SSRs, but, as is evident from the data in Table 1, the maximum length of the microsatellite is always slightly lower than that of the sequenced read. In addition, the number of reads covering each microsatellite does not reflect the total number of reads at each locus. Indeed, when SSRs are located at the ends of a read, the microsatellite-containing sequences are discarded, since they can strongly affect the calling of polymorphic sites (Fig. 2).
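
The rule of keeping only reads that span the whole microsatellite, and discarding reads in which the SSR sits at a read end, can be sketched as follows. This is a simplified Python illustration, not the PSR code: PSR works on reads aligned to reference sequences, whereas this sketch inspects a raw read string, and the function name, the `min_units` and `flank` parameters, and their default values are hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper; parameter names and defaults are assumptions.
def repeat_units_if_spanned(read: str, motif: str,
                            min_units: int = 3,
                            flank: int = 5) -> Optional[int]:
    """Return the number of perfect repeat units if the read spans the SSR.

    The longest uninterrupted run of `motif` (at least `min_units` copies)
    is located in the read. The read is accepted only if at least `flank`
    non-repeat bases are present on both sides of that run; otherwise the
    repeat may be truncated by the read end and None is returned.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}")
    best = None
    for match in pattern.finditer(read):
        if best is None or len(match.group()) > len(best.group()):
            best = match
    if best is None:
        return None  # no perfect microsatellite in this read
    left_flank = best.start()
    right_flank = len(read) - best.end()
    if left_flank < flank or right_flank < flank:
        return None  # SSR touches (or nearly touches) a read end: discard
    return len(best.group()) // len(motif)
```

For example, a read with six `AG` units flanked by eight bases on each side is accepted, whereas the same repeat at the very end of a read is rejected, because the true repeat length in the genotype cannot be determined from that read. This is also why the number of retained reads per microsatellite is lower than the total read count at the locus.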

