Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types.

Cornish AJ, Filippis I, David A, Sternberg MJ - Genome Med (2015)

Bottom Line: Each cell type found within the human body performs a diverse and unique set of functions, the disruption of which can lead to disease. The data set produced in this study represents the first large-scale mapping of diseases to the cell types in which they are manifested and will therefore be useful in the study of disease systems. Overall, we demonstrate that our approach links disease-associated genes to the phenotypes they produce, a key goal within systems medicine.


Affiliation: Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2AZ, UK. a.cornish12@imperial.ac.uk.

ABSTRACT

Background: Each cell type found within the human body performs a diverse and unique set of functions, the disruption of which can lead to disease. However, there currently exists no systematic mapping between cell types and the diseases they can cause.

Methods: In this study, we integrate protein-protein interaction data with high-quality cell-type-specific gene expression data from the FANTOM5 project to build the largest collection of cell-type-specific interactomes created to date. We develop a novel method, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes across 73 cell-type-specific interactomes to map genes associated with 196 diseases to the cell types they affect. We conduct text-mining of the PubMed database to produce an independent resource of disease-associated cell types, which we use to validate our method.

Results: The GSC method successfully identifies known disease-cell-type associations, as well as highlighting associations that warrant further study. This includes mast cells and multiple sclerosis, a cell population currently being targeted in a multiple sclerosis phase 2 clinical trial. Furthermore, we build a cell-type-based diseasome using the cell types identified as manifesting each disease, offering insight into diseases linked through etiology.

Conclusions: The data set produced in this study represents the first large-scale mapping of diseases to the cell types in which they are manifested and will therefore be useful in the study of disease systems. Overall, we demonstrate that our approach links disease-associated genes to the phenotypes they produce, a key goal within systems medicine.


Fig. 2: Overview of the GSC, text-mining and GSO methods. (A) For the GSC method, percentile-normalized relative gene expression scores are integrated with PPI data to create interactomes. Permuted interactomes are created by permuting the expression scores. The compactness score of the disease-associated gene set is computed for each observed and permuted interactome, and empirical P values are produced by counting the proportion of permuted compactness scores less than the observed compactness score. (B) To complete the text-mining, diseases from DisGeNET and cell types from the FANTOM5 project were mapped to MeSH terms using a number of controlled vocabularies. These MeSH terms were then used to query PubMed and count the number of articles mentioning each term individually and the number co-mentioning both terms. Fisher’s exact test was used to determine whether the number of co-mentioning articles is greater than expected by chance. (C) For the GSO method, percentile-normalized gene expression scores are used to create observed and permuted expression profiles. The mean expression score of the disease-associated gene set is then computed for the observed and permuted expression profiles. Empirical P values are computed by counting the number of permuted scores greater than each observed score.
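
The two statistical steps named in the caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `empirical_p` and `fisher_enrichment_p` are hypothetical, the empirical P value is taken literally as the proportion described in the caption (with no pseudocount), and Fisher's exact test is written as a one-sided hypergeometric tail over a 2x2 table of article counts.

```python
from math import comb


def empirical_p(observed, permuted, tail):
    """Proportion of permuted scores more extreme than the observed score.

    tail="lower": count permuted scores < observed (GSC: smaller = more compact).
    tail="upper": count permuted scores > observed (GSO: larger = more expressed).
    """
    if not permuted:
        raise ValueError("need at least one permuted score")
    if tail == "lower":
        extreme = sum(1 for s in permuted if s < observed)
    elif tail == "upper":
        extreme = sum(1 for s in permuted if s > observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return extreme / len(permuted)


def fisher_enrichment_p(both, disease_only, cell_only, neither):
    """One-sided Fisher's exact test on a 2x2 table of article counts.

    Returns P(co-mentioning articles >= both) when disease and cell-type
    mentions are independent (hypergeometric upper tail).
    """
    n_disease = both + disease_only          # articles mentioning the disease
    n_cell = both + cell_only                # articles mentioning the cell type
    total = both + disease_only + cell_only + neither
    upper = min(n_disease, n_cell)           # largest possible overlap
    tail_count = sum(
        comb(n_disease, k) * comb(total - n_disease, n_cell - k)
        for k in range(both, upper + 1)
    )
    return tail_count / comb(total, n_cell)


# Toy numbers, invented purely for illustration.
p_gsc = empirical_p(2.0, [1.0, 2.5, 3.0, 3.5], tail="lower")
p_gso = empirical_p(2.0, [1.0, 2.5, 3.0, 3.5], tail="upper")
p_text = fisher_enrichment_p(both=3, disease_only=1, cell_only=1, neither=3)
```

Taking the raw proportion means that a P value of exactly zero is possible when no permuted score is more extreme than the observed one; whether a correction was applied in the study is not stated in this excerpt.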

Mentions: We introduce the compactness score [29] to identify the cell-type-specific interactomes within which sets of disease-associated genes are significantly more clustered than expected by chance (Fig. 2A) and thereby identify disease-manifesting cell types. The compactness score is defined as the mean distance between pairs of vertices in a set in a graph. The smaller the compactness score of a vertex set, the stronger the interactions between the vertices in the set. If a vertex set interacts more strongly than expected by chance, then the vertex set can be said to cluster [30].
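
As a rough illustration of the definition above, the following Python sketch computes the mean pairwise shortest-path distance of a vertex set. It assumes an unweighted, undirected graph stored as an adjacency dictionary and measures distance in hops; how distances are weighted in the cell-type-specific interactomes is not specified in this excerpt, so the hop count is an assumption. The names `bfs_distances`, `compactness` and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Hop-count distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def compactness(adjacency, gene_set):
    """Mean shortest-path distance over all pairs of vertices in gene_set."""
    genes = sorted(set(gene_set))
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    dist_from = {g: bfs_distances(adjacency, g) for g in genes}
    pair_distances = []
    for a, b in combinations(genes, 2):
        if b not in dist_from[a]:
            raise ValueError(f"{a} and {b} are not connected")
        pair_distances.append(dist_from[a][b])
    return sum(pair_distances) / len(pair_distances)


# Toy graph (assumption): path A - B - C - D, with E attached to B.
toy = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
tight = compactness(toy, ["A", "B", "E"])    # pair distances 1, 2, 1
spread = compactness(toy, ["A", "D", "E"])   # pair distances 3, 2, 3
```

In this toy graph the set {A, B, E} has a smaller score than {A, D, E}, matching the statement that a smaller compactness score indicates a more tightly interacting vertex set.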

