THINK Back: KNowledge-based Interpretation of High Throughput data.

Farfán F, Ma J, Sartor MA, Michailidis G, Jagadish HV - BMC Bioinformatics (2012)

Bottom Line: The use of such techniques should lead to qualitatively superior results. The specific aim of this project is to develop computational techniques to generate a small number of biologically meaningful hypotheses based on observed results from high throughput microarray experiments, gene sequences, and next-generation sequences. Our methods perform a thorough and rigorous analysis of biological pathways, using complex factors such as the topology of the pathway graph and the frequency with which genes appear on different pathways, to provide more meaningful hypotheses to describe the biological phenomena captured by high throughput experiments, when compared to other existing methods that only consider partial information captured by biological pathways.


Affiliation: Computer Science and Engineering Department, University of Michigan, Ann Arbor, MI, USA. ffarfan@umich.edu

ABSTRACT
Results of high throughput experiments can be challenging to interpret. Current approaches have relied on bulk processing the set of expression levels, in conjunction with easily obtained external evidence, such as co-occurrence. While such techniques can be used to reason probabilistically, they are not designed to shed light on what any individual gene, or a network of genes acting together, may be doing. Our belief is that today we have the information extraction ability and the computational power to perform more sophisticated analyses that consider the individual situation of each gene. The use of such techniques should lead to qualitatively superior results. The specific aim of this project is to develop computational techniques to generate a small number of biologically meaningful hypotheses based on observed results from high throughput microarray experiments, gene sequences, and next-generation sequences. Through the use of relevant known biomedical knowledge, as represented in published literature and public databases, we can generate meaningful hypotheses that will aid biologists in interpreting their experimental data. We are currently developing novel approaches that exploit the rich information encapsulated in biological pathway graphs. Our methods perform a thorough and rigorous analysis of biological pathways, using complex factors such as the topology of the pathway graph and the frequency with which genes appear on different pathways, to provide more meaningful hypotheses to describe the biological phenomena captured by high throughput experiments, when compared to other existing methods that only consider partial information captured by biological pathways.

Figure 1: Example of density analysis on biological pathways. Two example pathways with differentially expressed genes appearing in different configurations. A pathway with differentially expressed genes appearing tightly-clustered in one portion of the graph is more significant than a pathway in which the differentially expressed genes appear spread out.

Mentions: Our assumption for the method proposed here is that a pathway with a closely connected cluster of differentially expressed genes is more likely to be informative and relevant than a pathway that has the same number of differentially expressed genes spread out uniformly or randomly across the pathway. Figure 1 illustrates this idea intuitively: it presents two different configurations for an example pathway. Figure 1(a) shows differentially expressed genes spread out uniformly across the pathway; in contrast, Figure 1(b) shows the same number of differentially expressed genes, but clustered in one portion of the pathway, creating a tight cluster of connected genes. The pathway is more clearly activated in Figure 1(b) than in Figure 1(a). We justify this assumption by observing that, since pathways are often activated via sub-paths, one does not expect the expression levels of all genes to change in an activated pathway. This is partly because the activity level of some genes may change through a different mechanism, but also because some canonical pathways are defined in ways that involve more than one function. For example, the KEGG pathway for "Apoptosis" involves a sub-path leading to apoptosis and a sub-path leading to cell survival.
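The contrast between the two configurations can be made concrete with a small sketch. The code below is not the scoring used in the article, which this excerpt does not specify. It is an illustrative stand-in that assumes the pathway can be treated as an undirected graph and uses the mean pairwise hop distance between differentially expressed genes as a tightness measure (lower means more tightly clustered). The toy nine-gene linear pathway, the gene names `g1` to `g9`, and the function names are all hypothetical.

```python
from collections import deque
from itertools import combinations


def path_graph(n):
    """Build a toy linear pathway g1 - g2 - ... - gn as an adjacency dict.
    (Hypothetical example data, not a real KEGG pathway.)"""
    names = [f"g{i}" for i in range(1, n + 1)]
    graph = {name: set() for name in names}
    for a, b in zip(names, names[1:]):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def mean_pairwise_distance(graph, de_genes):
    """Average hop distance over all pairs of differentially expressed genes.
    Assumed tightness measure: lower values mean a tighter cluster.
    Returns infinity if any pair is disconnected, 0.0 for fewer than two genes."""
    total, pairs = 0, 0
    for a, b in combinations(sorted(de_genes), 2):
        d = hop_distances(graph, a).get(b)
        if d is None:
            return float("inf")
        total += d
        pairs += 1
    return total / pairs if pairs else 0.0


pathway = path_graph(9)
spread = {"g1", "g5", "g9"}      # analogous to Figure 1(a): genes spread out
clustered = {"g4", "g5", "g6"}   # analogous to Figure 1(b): genes adjacent

spread_score = mean_pairwise_distance(pathway, spread)
clustered_score = mean_pairwise_distance(pathway, clustered)
print(f"spread: {spread_score:.2f}, clustered: {clustered_score:.2f}")
```

Both configurations contain three differentially expressed genes, so a method that only counts such genes would rate them equally; a topology-aware measure like this one separates them. A real analysis would also need to handle edge direction and assess significance against a null model, neither of which is attempted here.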

