Exploiting single-cell expression to characterize co-expression replicability.

Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J - Genome Biol. (2016)

Bottom Line: Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.


Affiliation: Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA.

ABSTRACT

Background: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box, with results that are hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks.

Results: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment on 126 cortical interneurons, using an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. Through re-analysis of the BrainSpan RNA-seq data, we show that this occurs generally.
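
Neighbor voting is named above but not spelled out in this summary. The following is a minimal sketch, assuming a dense symmetric co-expression matrix W (genes x genes, non-negative weights, zero diagonal; the signed networks mentioned later are not modeled here), a boolean vector marking the members of one GO group, and 3-fold cross-validation over those members. Each gene's vote is the share of its total edge weight that connects to the known (training) members; held-out members are then ranked against all non-members to give an AUROC. The fold count, the degree normalization and the function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def neighbor_voting_auroc(W, labels, n_folds=3, seed=0):
    """Mean cross-validated AUROC for one gene group on a co-expression network W."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    members = np.flatnonzero(labels)
    if members.size < n_folds:
        raise ValueError("gene group needs at least n_folds members")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(members), n_folds)
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated genes: avoid division by zero
    aucs = []
    for held_out in folds:
        train = labels.copy()
        train[held_out] = False  # hide this fold's members
        # vote: share of each gene's edge weight that lands on known members
        votes = (W @ train.astype(float)) / degree
        test = np.zeros_like(labels)
        test[held_out] = True
        keep = ~train  # rank held-out members against all non-members
        aucs.append(auroc(votes[keep], test[keep]))
    return float(np.mean(aucs))

# Toy network: genes 0-5 form a tightly co-expressed module on a weak background.
W = np.full((30, 30), 0.1)
W[:6, :6] = 1.0
np.fill_diagonal(W, 0.0)
module = np.zeros(30, dtype=bool)
module[:6] = True
print(neighbor_voting_auroc(W, module))  # 1.0: held-out module genes outrank every other gene
```

Normalizing votes by degree is one common choice; without it, high-degree genes would collect votes for almost any group, which is the kind of generic connectivity the node degree control is meant to expose.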

Conclusions: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.



Fig. 5: Node degree and network performance are predicted by expression level in the single-cell aggregate network. Top: GO slim AUROCs and predicted AUROCs based on node degree are plotted for single-cell and bulk aggregates (163 networks in each). Functional connectivity in both aggregates is dependent on node degree. Bottom: GO slim AUROCs and predicted AUROCs based on median gene expression are plotted. Single-cell aggregate performance is predicted by expression; however, there is no relationship between expression and bulk aggregate performance.

Mentions: Our meta-analytic aggregate networks perform better than any individual network, so, to trace the functional impact of expression level beyond the synaptic gene set, we focused on the aggregate networks. Previous work from our lab has shown that functional connectivity in gene networks can be predicted from the node degree of functional genes, with high-degree genes being good candidates for many functions [24]. We assess this by using node degree as a predictor for each gene function: we make predictions from node degree alone ("Node degree performance") and determine how much of a given GO group's performance within the network could be attributed to this factor. Performance of both the bulk and single-cell aggregate networks showed a characteristic V-shaped dependency on node degree, due to our use of signed networks (Fig. 5, top right panel). Within the scRNA-seq co-expression networks, we can again trace this back to a dependency on expression level (Fig. 5, bottom left panel). However, this is not the case for the bulk RNA-seq aggregate (Fig. 5, bottom right panel), possibly because its higher performance means it is sufficiently powered to overcome single-study (or even single-pipeline) technical artifacts and is therefore robust to weak expression-level variation. Generalizing from this: because individual bulk studies are not as well powered as the aggregate networks, we hypothesized that where performance variation exists in them, it may again derive from simple data features such as expression level. We tested this through a re-analysis of the BrainSpan data, in which both expression-level variation and functional specificity in co-expression have previously been identified.
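
The "Node degree performance" control described above amounts to ranking genes by a single gene-level property and asking how well that ranking alone separates a GO group from all other genes. The following is a minimal sketch, assuming a dense symmetric co-expression matrix W (weighted degree taken as the row sum), a boolean membership vector, and a hypothetical genes x cells expression matrix for the median-expression predictor shown in the bottom panels of Fig. 5. An AUROC of 0.5 means the property carries no information about the group; values far from 0.5 in either direction mean the group is systematically enriched for high- or low-scoring genes. Function names are hypothetical, not the authors' code.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as P(score of a member > score of a non-member), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one member and one non-member")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def node_degree_auroc(W, labels):
    """Predict group membership from node degree alone (weighted degree = row sum)."""
    degree = np.asarray(W, dtype=float).sum(axis=1)
    return auroc(degree, labels)

def median_expression_auroc(expr, labels):
    """Predict group membership from each gene's median expression (expr: genes x cells)."""
    return auroc(np.median(np.asarray(expr, dtype=float), axis=1), labels)

W = np.array([[0, 3, 3], [3, 0, 1], [3, 1, 0]], dtype=float)  # weighted degrees: 6, 4, 4
print(node_degree_auroc(W, [True, False, False]))   # 1.0: the hub gene is the only member
print(node_degree_auroc(W, [False, True, False]))   # 0.25: member ties one gene, loses to the hub
expr = np.array([[1, 2, 3], [0, 0, 1], [5, 5, 5]])  # median expression per gene: 2, 0, 5
print(median_expression_auroc(expr, [False, False, True]))  # 1.0
```

Comparing a GO group's network AUROC with these single-feature AUROCs indicates how much of its apparent functional connectivity could be explained by degree or expression level alone.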

