Systems level analysis of systemic sclerosis shows a network of immune and profibrotic pathways connected with genetic polymorphisms.

Mahoney JM, Taroni J, Martyanov V, Wood TA, Greene CS, Pioli PA, Hinchcliff ME, Whitfield ML - PLoS Comput. Biol. (2015)

Bottom Line: Here we identify the genes consistently associated with the intrinsic subsets across three independent cohorts, show the relationship between these genes using a gene-gene interaction network, and place the genetic risk loci in the context of the intrinsic subsets. We created a gene-gene interaction network of the conserved molecular features across the intrinsic subsets and analyzed their connections with SSc-associated genetic polymorphisms. The network also shows connections between these subset-specific genes and 30 SSc-associated polymorphic genes including STAT4, BLK, IRF7, NOTCH4, PLAUR, CSK, IRAK1, and several human leukocyte antigen (HLA) genes.


Affiliation: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America.

ABSTRACT
Systemic sclerosis (SSc) is a rare systemic autoimmune disease characterized by skin and organ fibrosis. The pathogenesis of SSc and its progression are poorly understood. The SSc intrinsic gene expression subsets (inflammatory, fibroproliferative, normal-like, and limited) are observed in multiple clinical cohorts of patients with SSc. Analysis of longitudinal skin biopsies suggests that a patient's subset assignment is stable over 6-12 months. Genetically, SSc is multi-factorial with many genetic risk loci for SSc generally and for specific clinical manifestations. Here we identify the genes consistently associated with the intrinsic subsets across three independent cohorts, show the relationship between these genes using a gene-gene interaction network, and place the genetic risk loci in the context of the intrinsic subsets. To identify gene expression modules common to three independent datasets from three different clinical centers, we developed a consensus clustering procedure based on mutual information of partitions, an information theory concept, and performed a meta-analysis of these genome-wide gene expression datasets. We created a gene-gene interaction network of the conserved molecular features across the intrinsic subsets and analyzed their connections with SSc-associated genetic polymorphisms. The network is composed of distinct, but interconnected, components related to interferon activation, M2 macrophages, adaptive immunity, extracellular matrix remodeling, and cell proliferation. The network shows extensive connections between the inflammatory- and fibroproliferative-specific genes. The network also shows connections between these subset-specific genes and 30 SSc-associated polymorphic genes including STAT4, BLK, IRF7, NOTCH4, PLAUR, CSK, IRAK1, and several human leukocyte antigen (HLA) genes. 
Our analyses suggest that the gene expression changes underlying the SSc subsets may be long-lived, but mechanistically interconnected and related to a patient's underlying genetic risk.


Figure 3 (pcbi-1004005-g003): Information graph and consensus clusters for the MPH cohorts. (A) The information graph of the MPH cohorts is highly modular (cf. S2 Fig.), indicating approximate conservation of gene expression modules across datasets. The information graph is tripartite by construction, so a triangle in the graph necessarily connects modules across all three datasets. The triangles form communities of mutual edge sharing. Colored nodes and edges highlight four of these communities. The purple community contains modules that are up-regulated in the inflammatory subset (cf. panel B). The red community contains modules that are up-regulated in the fibroproliferative subset (cf. panel B). The cyan community contains modules that are enriched for keratinocyte-specific processes. The orange community contains modules that are enriched for fatty acid metabolism genes. The remaining communities (22 in all and not colored to avoid cluttering the display) are enriched primarily for housekeeping processes and are neither skin- nor disease-specific (see Table 3). (B) Modules from the communities were tested for their enrichment in the subsets. Each row corresponds to a triangle in the information graph and each column corresponds to a dataset. The black lines separate communities, e.g. all of the rows in the block marked “1” correspond to triangles in community 1. The cells are colored according to whether the module was significantly differentially expressed in a subset, with dark colors representing up-regulation and light colors representing down-regulation (Bonferroni-corrected Wilcoxon rank sum test, p < 0.05). We assessed statistical significance of modules within each dataset for each of the three diffuse SSc intrinsic subsets, as well as all SSc vs. healthy controls (purple: inflammatory; red: proliferation; green: normal-like; blue: all SSc). Note the inflammatory up community (*) and the fibroproliferative up community (**).
Note also that community 2 is significantly highly expressed in the inflammatory subset and lowly expressed in the proliferative subset in Milano only. Likewise, community 9 appears to be expressed at low levels in the inflammatory subset in Milano, but in none of the other datasets.
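The structure described in panel A can be illustrated with a small sketch. It is not the authors' implementation: it assumes that modules are nodes labeled `(dataset, module_id)`, that edges only join modules from different datasets, and that a "community of mutual edge sharing" can be approximated by grouping triangles that share an edge. The function names `find_triangles` and `triangle_communities` are hypothetical.

```python
from itertools import combinations


def find_triangles(edges):
    """Return all triangles in an undirected graph given as a list of node pairs.

    Nodes are assumed to be (dataset, module_id) tuples. If edges only join
    different datasets (a tripartite graph), every triangle spans all three.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbor of u and v closes a triangle.
        for w in adjacency[u] & adjacency[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles


def triangle_communities(triangles):
    """Group triangles into communities: two triangles sharing an edge are merged.

    This is an assumed stand-in for the paper's community detection step.
    """
    triangles = list(triangles)
    parent = list(range(len(triangles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for i, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            if edge in edge_owner:
                parent[find(i)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = i

    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), []).append(tri)
    return list(groups.values())
```

Under these assumptions, each returned community is a list of triangles, and the union of the genes in its modules would play the role of a consensus cluster.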


Mentions: To identify genes with conserved expression across multiple datasets, we developed a procedure called Mutual Information Consensus Clustering (MICC) that detects significant conservation of a piece of a module and groups these conserved modules into collections called communities, which are sets of modules with considerable mutual overlap between datasets. Each community is associated with a gene set; namely all genes that are annotated to a module in that community for each dataset. We call these gene sets consensus clusters. The basis of MICC is the concept of mutual information from information theory [14]. Specifically, we use mutual information of partitions (MIP), which is an information measure specific for partitions. MIP quantifies the amount of information one partition has about another; i.e. it measures the correlation of cluster labels across datasets. MICC identifies consensus clusters using MIP to build a module similarity network of significant module overlaps, which we call the information graph (Figs. 1B, 3A). Then MICC algorithmically identifies communities in that network (Fig. 1B; Fig. 3A, colored nodes). These communities are collections of modules that have substantial overlap among each other, and they represent nearly all of the mutual information between the genomic partitions. In this way, MICC extracts almost all of the available information present in the separate clusterings of individual datasets and reports the clusters that are conserved across the three cohorts (see Materials and Methods for a detailed description).
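As a rough illustration of the quantity MICC builds on, the sketch below computes the standard mutual information between two cluster labelings of the same gene list. It is only a sketch under stated assumptions: the function name `mutual_information_of_partitions` is hypothetical, and the paper's exact estimator, significance test, and community detection are described in its Materials and Methods and are not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information_of_partitions(labels_a, labels_b):
    """Mutual information, in bits, between two partitions of the same items.

    labels_a[i] and labels_b[i] are the module labels assigned to gene i
    in two different datasets (an assumed input layout).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same genes")
    n = len(labels_a)
    count_a = Counter(labels_a)                  # module sizes in dataset A
    count_b = Counter(labels_b)                  # module sizes in dataset B
    count_ab = Counter(zip(labels_a, labels_b))  # overlap sizes between modules

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi
```

Two labelings that split the genes identically into two equal halves share 1 bit of information, while two labelings that are statistically independent share 0 bits. The per-pair terms in the sum indicate which module overlaps contribute most, which is the kind of overlap the information graph is described as recording.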

