Exploiting single-cell expression to characterize co-expression replicability.

Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J - Genome Biol. (2016)

Bottom Line: Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.


Affiliation: Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA.

ABSTRACT

Background: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks.

Results: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that also occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. We show this occurs generally through re-analysis of the BrainSpan RNA-seq data.

Conclusions: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.



Fig. 1: What lies beneath: co-expression can reflect different combinations of cell-state or compositional variation. Each panel shows a different scenario in which cell state and composition affect the expression of two genes (A and B), yielding different types of co-expression. Two cell types are colored in red and blue. In the top panel, both cell types have state-dependent variation that causes co-expression within each (r ~ 0.75). In addition, there is co-expression due to compositional variation (r ~ 0.75). In the bottom left panel only compositional variation is apparent (r ~ 0.65); there is no relationship between gene A and gene B within the cell types (r ~ 0). The bottom right panel shows the opposite: there is only variation within the cell types (r ~ 0.95) and no compositional effect across cell types (r ~ 0). The exact values of the compositional correlations would vary in real data, since combinations of the underlying cell types would fill in intermediate points, but the three cases would still occur as described. Other outcomes due to noise or more complex scenarios (e.g. the Yule-Simpson effect) are also possible.

Mentions: Biology has increasingly looked to relationships between genes to explain phenotypic variability. One way to determine these functional groupings is from transcriptional data; genes with similar expression patterns are thought to be involved in the same cellular pathway or function [1]. Networks derived from expression data have become an important resource in the interpretation of gene function [2] and disease [3]. Co-expression networks are built from an assessment of similarity, often correlation, between gene pairs across sources of variation (see Box 1 for more detail). For bulk RNA sequencing (RNA-seq) and microarray data, the sources of variation are manifold, and pinpointing driving factors has been challenging. For example, co-expression signals may be interpreted as reflecting compositional differences, such as varying proportions of underlying cell types within a tissue, or cell-state differences, like the circadian rhythm, or some combination of both, with data quality and technical variation further complicating interpretation (see Fig. 1).
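
The distinction between cell-state and compositional co-expression can be illustrated with a small simulation. The sketch below is not the authors' code and does not attempt to reproduce the values in Fig. 1; it assumes two hypothetical cell types whose expression of genes A and B is bivariate normal with unit variance, a chosen within-type correlation (`within_r`), and a mean shift between the types (`offset`) standing in for compositional differences. All function and parameter names are illustrative.

```python
import numpy as np


def simulate_type(rng, n_cells, mean, within_r):
    """Draw (gene A, gene B) expression for one cell type.

    Assumption: unit-variance bivariate normal with correlation within_r.
    """
    cov = np.array([[1.0, within_r], [within_r, 1.0]])
    return rng.multivariate_normal(mean, cov, size=n_cells)


def correlations(rng, within_r, offset, n_cells=2000):
    """Return (r within red, r within blue, r pooled across both types)."""
    red = simulate_type(rng, n_cells, (0.0, 0.0), within_r)
    # The blue type is shifted by `offset` in both genes; pooling the two
    # types then induces correlation even when within_r is zero.
    blue = simulate_type(rng, n_cells, (offset, offset), within_r)
    pooled = np.vstack([red, blue])

    def pearson(cells):
        return np.corrcoef(cells[:, 0], cells[:, 1])[0, 1]

    return pearson(red), pearson(blue), pearson(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scenarios = {
        "state and composition": (0.75, 3.0),
        "composition only": (0.0, 3.0),
        "state only": (0.75, 0.0),
    }
    for name, (within_r, offset) in scenarios.items():
        r_red, r_blue, r_pooled = correlations(rng, within_r, offset)
        print(f"{name}: red={r_red:.2f} blue={r_blue:.2f} pooled={r_pooled:.2f}")
```

In the "composition only" case the pooled correlation is clearly positive although neither cell type shows any relationship between the two genes, which is the situation a bulk co-expression network cannot distinguish from genuine within-cell co-regulation.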

