Reverse causal reasoning: applying qualitative causal knowledge to the interpretation of high-throughput data.

Catlett NL, Bargnesi AJ, Ungerer S, Seagaran T, Ladd W, Elliston KO, Pratt D - BMC Bioinformatics (2013)

Bottom Line: These small directed networks are generated from a knowledge base of literature-curated qualitative biological cause-and-effect relationships expressed as a network. We present the Whistle analyses for three transcriptomic data sets using a publicly available knowledge base. The mechanisms inferred by Whistle are consistent with the expected biology for each data set.


Affiliation: Selventa, One Alewife Center, Cambridge, MA 02140, USA. ncatlett@selventa.com.

ABSTRACT

Background: Gene expression profiling and other genome-scale measurement technologies provide comprehensive information about molecular changes resulting from a chemical or genetic perturbation, or disease state. A critical challenge is the development of methods to interpret these large-scale data sets to identify specific biological mechanisms that can provide experimentally verifiable hypotheses and lead to the understanding of disease and drug action.

Results: We present a detailed description of Reverse Causal Reasoning (RCR), a reverse engineering methodology to infer mechanistic hypotheses from molecular profiling data. This methodology requires prior knowledge in the form of small networks that causally link a key upstream controller node representing a biological mechanism to downstream measurable quantities. These small directed networks are generated from a knowledge base of literature-curated qualitative biological cause-and-effect relationships expressed as a network. The small mechanism networks are evaluated as hypotheses to explain observed differential measurements. We provide a simple implementation of this methodology, Whistle, designed specifically for the analysis of gene expression data and using prior knowledge expressed in Biological Expression Language (BEL). We present the Whistle analyses for three transcriptomic data sets using a publicly available knowledge base. The mechanisms inferred by Whistle are consistent with the expected biology for each data set.
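The step of turning a causal knowledge base into small upstream-controller networks (HYPs) can be sketched as follows. This is a minimal illustration, not Whistle's implementation: it assumes the knowledge base has already been reduced to signed edges `(source, target, sign)`, with +1 for "increases" and -1 for "decreases", and it keeps only direct (one-step) edges to measurable nodes. The function name `build_hyps` and all node names are hypothetical placeholders, not BEL terms.

```python
from collections import defaultdict


def build_hyps(edges, is_measurable):
    """Group signed causal edges into one HYP per upstream node.

    Each HYP maps a downstream measurable node to the sign of its causal
    edge from the upstream controller (+1 increases, -1 decreases).
    Assumption: only direct (one-step) edges are used, and a later edge
    overwrites an earlier one for the same (source, target) pair.
    """
    hyps = defaultdict(dict)
    for source, target, sign in edges:
        if sign not in (+1, -1):
            raise ValueError(f"edge sign must be +1 or -1, got {sign!r}")
        if is_measurable(target):
            hyps[source][target] = sign
    return dict(hyps)


# Hypothetical toy knowledge base.
edges = [
    ("mechanism_A", "rna_1", +1),
    ("mechanism_A", "rna_2", -1),
    ("mechanism_A", "protein_3", +1),  # not an RNA abundance, so skipped
    ("mechanism_B", "rna_1", -1),
]
hyps = build_hyps(edges, is_measurable=lambda node: node.startswith("rna_"))
print(hyps)
# {'mechanism_A': {'rna_1': 1, 'rna_2': -1}, 'mechanism_B': {'rna_1': -1}}
```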

Conclusions: Reverse Causal Reasoning yields mechanistic insights into the interpretation of gene expression profiling data that are distinct from and complementary to the results of analyses using ontology or pathway gene sets. This reverse engineering algorithm provides an evidence-driven approach to the development of models of disease, drug action, and drug toxicity.


Figure 3: Scored HYP example. The HYP with the upstream node bp(GO:“response to endoplasmic reticulum stress”), scored for the E-MEXP-1755 high fat diet data set. This network contains 27 measured RNA abundance nodes (possible), represented as ovals coloured by differential expression (red – significantly increased, green – significantly decreased, grey – no significant change). A total of seven differentially expressed RNAs mapped to the network (observed), including six supporting increased mechanism activity (correct) and one supporting decreased activity (contra, marked with an ‘X’ on the edge).
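The counts named in the caption (possible, observed, correct, contra) can be tallied with a short function. This is a sketch under stated assumptions: a HYP is a mapping from downstream node to edge sign (+1/-1), expression calls are +1 (up), -1 (down), or 0 (no significant change), `correct` counts the majority direction and `contra` the minority, and a tie is labelled "ambiguous" (the caption does not say how ties are handled). The names `score_hyp` and `HypScore` are hypothetical, and the toy input is constructed only to reproduce the counts in Figure 3; it is not the E-MEXP-1755 data.

```python
from dataclasses import dataclass


@dataclass
class HypScore:
    possible: int   # HYP nodes measured in the data set
    observed: int   # of those, significantly changed
    correct: int    # changes supporting the assigned direction
    contra: int     # changes supporting the opposite direction
    direction: str  # "increased", "decreased", or "ambiguous" (assumed tie label)


def score_hyp(hyp, calls):
    """Score a HYP (node -> edge sign, +1/-1) against expression calls
    (node -> +1 up, -1 down, 0 no significant change)."""
    possible = observed = support_up = support_down = 0
    for node, sign in hyp.items():
        if node not in calls:
            continue  # node not measured in this data set
        possible += 1
        change = calls[node]
        if change == 0:
            continue
        observed += 1
        # Edge sign and observed change agree => consistent with increased activity.
        if sign * change > 0:
            support_up += 1
        else:
            support_down += 1
    if support_up > support_down:
        direction = "increased"
    elif support_down > support_up:
        direction = "decreased"
    else:
        direction = "ambiguous"
    return HypScore(possible, observed,
                    max(support_up, support_down),
                    min(support_up, support_down),
                    direction)


# Toy input shaped like Figure 3: 27 positively regulated nodes,
# six increased, one decreased, the rest unchanged.
hyp = {f"rna_{i}": +1 for i in range(27)}
calls = {node: 0 for node in hyp}
for i in range(6):
    calls[f"rna_{i}"] = +1
calls["rna_6"] = -1
print(score_hyp(hyp, calls))
# HypScore(possible=27, observed=7, correct=6, contra=1, direction='increased')
```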

Mentions: Of the 606 HYPs evaluated from the mouse Large Corpus KAM, 13 met the standard significance threshold of 0.1 for both the richness and concordance p-values (Table 3; see Randomized data sets for threshold selection). For example, the HYP bp(GO:“response to endoplasmic reticulum stress”) is inferred to be significantly increased for the high fat diet data set (Figure 3). This mechanism, representing the biological process defined by the GO term ‘response to endoplasmic reticulum stress’, is causally upstream of 27 measured RNA abundances in the mouse-orthologized Large Corpus KAM (possible). Of the 193 RNA abundance nodes significantly increased or decreased by the high fat diet, seven map to the bp(GO:“response to endoplasmic reticulum stress”) HYP (observed), a significant enrichment in endoplasmic reticulum stress-regulated RNA abundance nodes (richness p = 3.5E-6). Of these seven, six change in the direction consistent with increased response to endoplasmic reticulum stress (correct), and one is consistent with decreased response (contra). The bp(GO:“response to endoplasmic reticulum stress”) HYP is therefore assigned the direction ‘increased’. The concordance p-value, which evaluates the observed directions of the downstream nodes against the directions predicted by the HYP, is 6.3E-2; because this is below the 0.1 threshold, it supports the inference of increased bp(GO:“response to endoplasmic reticulum stress”).
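The two p-values in this paragraph can be sketched as one-sided upper-tail tests. Assuming the concordance p-value is a binomial test with success probability 0.5, six correct out of seven observed gives (7 + 1) / 2^7 = 0.0625, which matches the reported 6.3E-2. For richness, the sketch assumes a one-sided hypergeometric test, a common choice for enrichment that is labelled here as an assumption. It needs the size of the measured population, which this excerpt does not state, so the reported 3.5E-6 is not reproduced. The function names and the `alpha` comparison rule are hypothetical.

```python
from math import comb


def richness_p(observed, possible, total_changed, population):
    """Assumed hypergeometric upper tail P(X >= observed).

    X counts changed nodes among `possible` HYP nodes drawn without
    replacement from `population` measured nodes, of which
    `total_changed` are significantly changed.
    """
    top = min(possible, total_changed)
    hits = sum(
        comb(total_changed, k) * comb(population - total_changed, possible - k)
        for k in range(observed, top + 1)
    )
    return hits / comb(population, possible)


def concordance_p(correct, observed):
    """Assumed binomial upper tail P(X >= correct), X ~ Binomial(observed, 0.5)."""
    hits = sum(comb(observed, k) for k in range(correct, observed + 1))
    return hits / 2 ** observed


def is_significant(richness, concordance, alpha=0.1):
    # Assumption: "meets the threshold" means p <= alpha for both p-values.
    return richness <= alpha and concordance <= alpha


print(concordance_p(6, 7))              # 0.0625, i.e. the reported 6.3E-2
print(is_significant(3.5e-6, 6.3e-2))   # True
```

Computing both tails with exact integer binomial coefficients avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.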
