Quality Control for RNA-Seq (QuaCRS): An Integrated Quality Control Pipeline.

Kroll KW, Mokaram NE, Pelletier AR, Frankhouser DE, Westphal MS, Stump PA, Stump CL, Bundschuh R, Blachly JS, Yan P - Cancer Inform (2014)

Bottom Line: Combining these three tools into one wrapper provides increased ease of use and provides a much more complete view of sample data quality than any individual tool. Second is the QC database, which displays the resulting metrics in a user-friendly web interface. The structure of the QuaCRS database is designed to enable expansion with additional tools and metrics in the future.


Affiliation: Department of Internal Medicine, Division of Hematology, Ohio State University Comprehensive Cancer Center, Columbus, OH, USA.

ABSTRACT
QuaCRS (Quality Control for RNA-Seq) is an integrated, simplified quality control (QC) system for RNA-seq data that allows easy execution of several open-source QC tools, aggregation of their output, and the ability to quickly identify quality issues by performing meta-analyses on QC metrics across large numbers of samples in different studies. It comprises two main sections. First is the QC Pack wrapper, which executes three QC tools: FastQC, RNA-SeQC, and selected functions from RSeQC. Combining these three tools into one wrapper provides increased ease of use and provides a much more complete view of sample data quality than any individual tool. Second is the QC database, which displays the resulting metrics in a user-friendly web interface. It was designed to allow users with less computational experience to easily generate and view QC information for their data, to investigate individual samples and aggregate reports of sample groups, and to sort and search samples based on quality. The structure of the QuaCRS database is designed to enable expansion with additional tools and metrics in the future. The source code for not-for-profit use and a fully functional sample user interface with mock data are available at http://bioserv.mps.ohio-state.edu/QuaCRS/.



Directory structure produced and used by the QC Pack program. QC Pack runs on one sample at a time with a configuration text file passed as an argument. Before execution, this configuration file must be populated with the required metadata for the sample. In addition, a separate configuration file must be populated with the locations of the necessary tools for QuaCRS to run. These two configuration files, and the QC Pack program itself, are shown in yellow to indicate that they must exist before QC Pack is executed. Upon execution, the sample is given a unique identifier (ID) composed of several pieces of metadata. The workflow then creates the directories, shown in green, and populates them with data, shown in blue. It also produces a composite table of all QC metrics and images for the sample to be parsed into the database.
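The caption above can be made concrete with a minimal sketch of the two setup steps it describes: deriving a sample ID from metadata and creating the directories. This is not the QC Pack source. The metadata field names (`study`, `sample`, `lane`), the underscore separator, and the per-tool directory names are assumptions chosen for illustration; the caption says only that the ID is composed of several pieces of metadata.

```python
from pathlib import Path

# Directory names are hypothetical; the source names the tools but not the folders.
TOOLS = ("FastQC", "RNA-SeQC", "RSeQC")


def make_sample_id(metadata, fields=("study", "sample", "lane")):
    """Join selected metadata values into one identifier (assumed scheme)."""
    return "_".join(str(metadata[field]) for field in fields)


def create_sample_dirs(root, sample_id):
    """Create one directory per tool, each holding a subdirectory for the sample."""
    dirs = {}
    for tool in TOOLS:
        sample_dir = Path(root) / tool / sample_id
        sample_dir.mkdir(parents=True, exist_ok=True)
        dirs[tool] = sample_dir
    return dirs
```

Calling `create_sample_dirs(root, make_sample_id(metadata))` twice is harmless because `exist_ok=True` leaves existing directories in place.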


Mentions: QuaCRS processes RNA-seq data sequentially through the three QC tools: FastQC, RNA-SeQC, and RSeQC. The output files are then ready for query and report generation, as shown in Figure 2. The output file from one tool triggers the launch of the subsequent tool until all steps in the workflow are completed. At each step, QuaCRS checks whether the output from that tool already exists and moves on to the next step if an output file is found. This design allows efficient updating and management of ongoing studies in QuaCRS, because it avoids rerunning QC tools unnecessarily when new samples are uploaded to an existing study. In some scenarios one might need to bypass this behavior, e.g., when replacing improperly generated QC data from a previous run. In this case, an additional argument can be passed to force the program to rerun all QC tools, regardless of whether QC output files for that sample exist. At the completion of a run, a directory for each QC tool will appear. Each of these directories contains a subdirectory for each processed sample, and each sample subdirectory contains a text file that is parsed and copied into the database. QuaCRS provides the flexibility to upload samples to the database one at a time as their QC analyses finish, or to upload the whole directory once all samples are processed.
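The skip-if-output-exists behavior and the force option described above can be sketched as follows. This is an illustration under stated assumptions, not the QuaCRS implementation. The tool runners are stubs that only write a placeholder file, and the names `build_steps`, `run_qc`, `force`, and `metrics.txt` are hypothetical.

```python
from pathlib import Path

# Tool order follows the text; directory and file names are assumptions.
TOOLS = ("FastQC", "RNA-SeQC", "RSeQC")


def make_stub(tool):
    """Return a stand-in runner that writes a placeholder output file."""
    def run(out_path):
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(f"{tool} metrics placeholder\n")
    return run


def build_steps(root, sample_id):
    """List (tool, expected output path, runner) in execution order."""
    return [
        (tool, Path(root) / tool / sample_id / "metrics.txt", make_stub(tool))
        for tool in TOOLS
    ]


def run_qc(steps, force=False):
    """Run each step in order, skipping steps whose output already exists.

    With force=True every step is rerun regardless of existing output.
    Returns the names of the tools that were actually executed.
    """
    executed = []
    for tool, out_path, run in steps:
        if out_path.exists() and not force:
            continue  # output found: move on to the next step
        run(out_path)
        executed.append(tool)
    return executed
```

In this sketch, a second call to `run_qc` on the same sample executes nothing, while `run_qc(steps, force=True)` regenerates every output, which matches the case of replacing improperly generated QC data from a previous run.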

