MutSpec: a Galaxy toolbox for streamlined analyses of somatic mutation spectra in human and mouse cancer genomes.

Ardin M, Cahais V, Castells X, Bouaoun L, Byrnes G, Herceg Z, Zavadil J, Olivier M - BMC Bioinformatics (2016)

Bottom Line: Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.


Affiliation: Molecular Mechanisms and Biomarkers Group, International Agency for Research on Cancer, F69372, Lyon, France.

ABSTRACT

Background: The nature of somatic mutations observed in human tumors, at the single-gene or genome-wide level, can reveal information on past carcinogenic exposures and on the mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, analysing and interpreting the mutation patterns that may reveal clues about the natural history of cancer are complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed MutSpec, a first-of-its-kind package built within Galaxy, a user-friendly, web-based platform.

Results: MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool.

Conclusions: MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.
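MutSpec compares the signatures it extracts with published ones, such as those in the COSMIC database. The abstract does not name the similarity measure, so the sketch below assumes cosine similarity, a measure commonly used for this kind of comparison. The function names (`cosine_similarity`, `best_matches`) and the toy reference vectors are hypothetical placeholders, not MutSpec code or COSMIC data.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-negative signature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_matches(signature, references):
    """Rank reference signatures (name -> vector) by similarity to `signature`.

    Assumption: every vector uses the same channel order (e.g. 96 SBS channels).
    """
    scores = {name: cosine_similarity(signature, ref) for name, ref in references.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy 4-channel example; the reference names are hypothetical.
refs = {"ref_A": [0.7, 0.1, 0.1, 0.1], "ref_B": [0.1, 0.1, 0.1, 0.7]}
print(best_matches([0.6, 0.2, 0.1, 0.1], refs))
```

A score close to 1 indicates nearly proportional profiles; what threshold counts as a match is a separate judgment that this sketch does not settle.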



Fig. 2: Mutation spectra in OSCC from Indian patients. Results for the pool of 106 samples are shown. a Distribution of variants (N = 13059) according to their functional impact on protein sequences. b Stranded analysis of the six SBS types, showing counts for SBS with transcript annotations (N = 12789). c Distribution of SBS according to their trinucleotide sequence context (SBS counts are indicated in parentheses). d Plots of the cophenetic coefficient and residual sum of squares (RSS) across a range of factorisation ranks (2 to 8). Solid lines show the results obtained with the original data; dotted lines show the results obtained with randomized (shuffled) data.
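Panels b and c use the usual convention of reporting each SBS from the pyrimidine (C or T) base of the mutated base pair, which yields 6 substitution types and, with the flanking 5' and 3' bases, 96 trinucleotide channels. The sketch below is not MutSpec-Stat's code. It assumes each variant is already given as its + strand reference trinucleotide plus alternate base, and the helper names (`sbs_channel`, `mutation_profile`, `tcn_fraction`) are hypothetical. `tcn_fraction` only reports what share of C > G and C > T substitutions fall in 5’-TCN-3’ motifs; it is a descriptive summary, not a formal APOBEC enrichment test.

```python
from collections import Counter
from itertools import product

# Translation table for base complements (A<->T, C<->G)
_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]

def sbs_channel(context, alt):
    """Map one SBS to a 96-channel label such as 'T[C>T]A'.

    `context` is the + strand reference trinucleotide (5' base, mutated base,
    3' base) and `alt` is the + strand alternate base.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":  # purine reference: report from the pyrimidine strand
        context, alt = revcomp(context), alt.translate(_COMP)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

# 6 pyrimidine-based substitution types x 16 flanking-base pairs = 96 channels
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]

def mutation_profile(variants):
    """Count (context, alt) pairs into a 96-length list ordered like CHANNELS."""
    counts = Counter(sbs_channel(ctx, alt) for ctx, alt in variants)
    return [counts.get(ch, 0) for ch in CHANNELS]

def tcn_fraction(profile):
    """Share of C>G plus C>T substitutions whose 5' neighbour is T (5'-TCN-3')."""
    idx = [i for i, ch in enumerate(CHANNELS) if ch[2:5] in ("C>G", "C>T")]
    total = sum(profile[i] for i in idx)
    in_tcn = sum(profile[i] for i in idx if CHANNELS[i][0] == "T")
    return in_tcn / total if total else 0.0

# Toy variants: TGA with G>A on the + strand is reported as T[C>T]A
profile = mutation_profile([("TCA", "T"), ("TGA", "A"), ("ACG", "T")])
print(sum(profile), tcn_fraction(profile))
```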

Mentions: The dataset collection created by MutSpec-Split was then used as input for MutSpec-Stat to generate mutation-spectrum statistics for each sample and to compute the mutation matrix used for extracting mutation signatures. We ran the tool with the “pool sample” option to obtain statistics for the pooled samples. The reference genome should be specified again at this step. Finally, we also selected the option that calculates statistics for estimating the number of signatures present in the dataset. A summary of the results for the sample pool is shown in Fig. 2 (see detailed results in Additional file 1). The overall mutation pattern shows that the majority of variants are non-synonymous SBS (Fig. 2a) and that the most frequent SBS types are C:G > A:T followed by C:G > T:A (Fig. 2b). The trinucleotide sequence context distribution of these mutations shows specific patterns, with C > A occurring preferentially within 5’-GCN-3’ motifs and C > T within CpG sites (Fig. 2c). The third most frequent SBS type is C > G. Both C > G and C > T occur preferentially within 5’-TCN-3’ motifs, suggesting the presence of APOBEC-induced mutations [23]. Based on the cophenetic and RSS statistics calculated for estimating the NMF factorization rank, 4 signatures may be present in this dataset: 4 is the first value at which the cophenetic coefficient starts decreasing and at which the RSS curve shows an inflection point (Fig. 2d).
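The rank-selection step described above can be sketched as follows. MutSpec integrates publicly available R packages for its statistics; this Python sketch does not reproduce them. It swaps in a basic Frobenius-norm NMF with Lee-Seung multiplicative updates, builds a consensus matrix from repeated runs, and computes the cophenetic correlation coefficient with SciPy's average-linkage clustering alongside the RSS. The randomized baseline (permuting each column of the matrix independently), the run and iteration counts, and all function names are assumptions for illustration; the RSS inflection point is left to visual inspection.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor V (channels x samples) as W @ H via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rank_stats(V, k, n_runs=10):
    """Cophenetic correlation of the consensus matrix, and best RSS, for rank k."""
    n_samples = V.shape[1]
    consensus = np.zeros((n_samples, n_samples))
    best_rss = np.inf
    for run in range(n_runs):
        W, H = nmf(V, k, seed=run)
        labels = H.argmax(axis=0)  # each sample -> its dominant signature
        consensus += labels[:, None] == labels[None, :]
        best_rss = min(best_rss, float(((V - W @ H) ** 2).sum()))
    consensus /= n_runs
    dist = squareform(1.0 - consensus, checks=False)  # condensed distance vector
    coph, _ = cophenet(linkage(dist, method="average"), dist)
    return float(coph), best_rss

def shuffle_columns(V, seed=0):
    """Randomized baseline (assumed): permute the entries of each column independently."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.permutation(col) for col in V.T])

def first_drop(ranks, cophs):
    """First rank after which the cophenetic coefficient starts decreasing."""
    for i in range(len(ranks) - 1):
        if cophs[i + 1] < cophs[i]:
            return ranks[i]
    return ranks[-1]
```

In use, one would call `rank_stats` for each rank from 2 to 8 on both the mutation matrix and `shuffle_columns` of it, plot both curves as in Fig. 2d, and apply `first_drop` to the cophenetic values from the original data.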

