Limits...
Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data

View Article: PubMed Central - PubMed

ABSTRACT

Background: Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments ― for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length ― analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows.

Results: Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays.

Results: Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid’s data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort.

Conclusions: Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io.

No MeSH data available.


SegmentChains automate many common tasks. a. SegmentChain and Transcript objects automatically convert coordinates between genomic and transcript-relative spaces. b. SegmentChains and Transcripts can therefore convert read alignments or quantitative data aligned to the genome to arrays of values at each position in the chain. c. Subsections (green, pink) of chains can be fetched using start and end points relative to the parental chains. SegmentChains automatically generate the corresponding genomic coordinates. d. Regions of a chain can be masked from computations without altering the chain coordinates
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC5120557&req=5

Fig3: SegmentChains automate many common tasks. a. SegmentChain and Transcript objects automatically convert coordinates between genomic and transcript-relative spaces. b. SegmentChains and Transcripts can therefore convert read alignments or quantitative data aligned to the genome to arrays of values at each position in the chain. c. Subsections (green, pink) of chains can be fetched using start and end points relative to the parental chains. SegmentChains automatically generate the corresponding genomic coordinates. d. Regions of a chain can be masked from computations without altering the chain coordinates

Mentions: Second, Plastid introduces a novel data model, the SegmentChain, to describe multi-exon transcripts and other discontinuous features. SegmentChains are aware of their own discontinuity and use this awareness to encapsulate many nucleotide-wise operations that are absent from other toolkits, such as conversion of coordinates or vectorized data between genomic and transcript-centric spaces. SegmentChains automatically account for splicing and complementing, and thus reduce user error during many tasks common in position-wise analysis (Fig. 3).Fig. 3


Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data
SegmentChains automate many common tasks. a. SegmentChain and Transcript objects automatically convert coordinates between genomic and transcript-relative spaces. b. SegmentChains and Transcripts can therefore convert read alignments or quantitative data aligned to the genome to arrays of values at each position in the chain. c. Subsections (green, pink) of chains can be fetched using start and end points relative to the parental chains. SegmentChains automatically generate the corresponding genomic coordinates. d. Regions of a chain can be masked from computations without altering the chain coordinates
© Copyright Policy - OpenAccess
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC5120557&req=5

Fig3: SegmentChains automate many common tasks. a. SegmentChain and Transcript objects automatically convert coordinates between genomic and transcript-relative spaces. b. SegmentChains and Transcripts can therefore convert read alignments or quantitative data aligned to the genome to arrays of values at each position in the chain. c. Subsections (green, pink) of chains can be fetched using start and end points relative to the parental chains. SegmentChains automatically generate the corresponding genomic coordinates. d. Regions of a chain can be masked from computations without altering the chain coordinates
Mentions: Second, Plastid introduces a novel data model, the SegmentChain, to describe multi-exon transcripts and other discontinuous features. SegmentChains are aware of their own discontinuity and use this awareness to encapsulate many nucleotide-wise operations that are absent from other toolkits, such as conversion of coordinates or vectorized data between genomic and transcript-centric spaces. SegmentChains automatically account for splicing and complementing, and thus reduce user error during many tasks common in position-wise analysis (Fig. 3).Fig. 3

View Article: PubMed Central - PubMed

ABSTRACT

Background: Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments ― for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length ― analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows.

Results: Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays.

Results: Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid’s data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort.

Conclusions: Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io.

No MeSH data available.