MMP1 bimodal expression and differential response to inflammatory mediators is linked to promoter polymorphisms.

Affara M, Dunmore BJ, Sanders DA, Johnson N, Print CG, Charnock-Jones DS - BMC Genomics (2011)

Bottom Line: Identifying the functional importance of the millions of single nucleotide polymorphisms (SNPs) in the human genome is a difficult challenge. In this study, we used a novel but straightforward method based on agglomerative hierarchical clustering to identify bimodally expressed transcripts in human umbilical vein endothelial cell (HUVEC) microarray data from 15 individuals. We describe a simple method to identify putative bimodally expressed RNAs from transcriptome data that is effective yet easy for non-statisticians to understand and use.


Affiliation: Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK.

ABSTRACT

Background: Identifying the functional importance of the millions of single nucleotide polymorphisms (SNPs) in the human genome is a difficult challenge. Therefore, a reverse strategy, which identifies functionally important SNPs by virtue of the bimodal abundance of the SNP-related mRNAs across the human population, will be useful. Those mRNA transcripts that are expressed at two distinct abundances in proportion to SNP allele frequency may warrant further study. Matrix metalloproteinase 1 (MMP1) is important in both normal development and in numerous pathologies. Although much research has been conducted to investigate the expression of MMP1 in many different cell types and conditions, the regulation of its expression is still not fully understood.

Results: In this study, we used a novel but straightforward method based on agglomerative hierarchical clustering to identify bimodally expressed transcripts in human umbilical vein endothelial cell (HUVEC) microarray data from 15 individuals. We found that MMP1 mRNA abundance was bimodally distributed in untreated HUVECs and showed a bimodal response to inflammatory mediator treatment. RT-PCR and MMP1 activity assays confirmed the bimodal regulation, and DNA sequencing of 69 individuals identified an MMP1 gene promoter polymorphism that segregated precisely with the MMP1 bimodal expression. Chromatin immunoprecipitation (ChIP) experiments indicated that the transcription factors (TFs) ETS1, ETS2 and GATA3 bind to the MMP1 promoter in the region of this polymorphism and may contribute to the bimodal expression.

Conclusions: We describe a simple method to identify putative bimodally expressed RNAs from transcriptome data that is effective yet easy for non-statisticians to understand and use. This method identified bimodal endothelial cell expression of MMP1, which appears to be biologically significant with implications for inflammatory disease.


Figure 1: Flow diagram of method to identify bimodally expressed transcripts from expression data. (A) Transcript abundance is quantified by microarray or RNAseq techniques. (B) On a transcript-by-transcript basis, agglomerative clustering across the dataset is carried out. The algorithm starts by assigning the same number of clusters as individuals (in this example 10 clusters were assigned since there are 10 individuals). The clusters are then progressively merged by combining the two most similar clusters, using Ward's method to calculate the distance between clusters and Euclidean distance to calculate dissimilarities between the individuals. The distances between the merging clusters are recorded by the algorithm as branch "heights". The height values at either side of the dendrogram are removed to exclude transcripts that falsely appear to be bimodally expressed due to a single outlying individual. The maximum remaining branch height value (indicated by the red arrow) is identified for each transcript; it represents the greatest distance between any two clusters of individuals and is used as a surrogate marker for the degree of bimodal expression for that particular transcript. (C) To estimate the probability of transcripts appearing to be bimodally expressed due to chance alone, for each transcript we make a maximum likelihood estimate of the parameters of the distribution of this transcript's abundance across the population from which the individuals being studied have been drawn. We use these parameters to generate 10,000 simulated datasets, each of which is clustered as described in (B) above. (D) Across the 10,000 clusterings of the bootstrapped data sets for this transcript, we identify how commonly the largest distance between clusters is ≥ the largest distance between clusters in the actual data set. This information is shown graphically and is used to generate an empirical p-value as an estimate of the type I error rate.
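Panel (B) can be illustrated with a minimal Python sketch. This is not the authors' R script (Additional File 1); the function names and the min_cluster_size parameter are hypothetical. Two assumptions are made. First, Ward heights use the convention in which two single individuals merge at their Euclidean distance, so absolute heights need not match other implementations. Second, the removal of heights "at either side of the dendrogram" is read here as ignoring any merge in which one of the two clusters is a single individual; the caption does not specify the exact rule.

```python
from itertools import combinations
from math import sqrt


def ward_merge_heights(values):
    """Agglomerate 1-D expression values for one transcript with Ward's criterion.

    Returns one (height, size_a, size_b) tuple per merge, in merge order.
    """
    # Each cluster is tracked as (size, mean); every individual starts alone.
    clusters = [(1, float(v)) for v in values]
    merges = []
    while len(clusters) > 1:
        # Ward distance between clusters a and b:
        # sqrt(2 * n_a * n_b / (n_a + n_b)) * |mean_a - mean_b|
        height, i, j = min(
            (sqrt(2.0 * a[0] * b[0] / (a[0] + b[0])) * abs(a[1] - b[1]), i, j)
            for (i, a), (j, b) in combinations(enumerate(clusters), 2)
        )
        (na, ma), (nb, mb) = clusters[i], clusters[j]
        merges.append((height, na, nb))
        merged = (na + nb, (na * ma + nb * mb) / (na + nb))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def bimodality_score(values, min_cluster_size=2):
    """Largest branch height among merges of two multi-member clusters.

    ASSUMPTION: dropping merges that involve a single individual is one
    reading of the caption's outlier-trimming step, not the confirmed rule.
    """
    heights = [
        h for h, na, nb in ward_merge_heights(values)
        if min(na, nb) >= min_cluster_size
    ]
    return max(heights, default=0.0)


# Made-up log-scale abundances, for illustration only.
two_groups = [5.0, 5.1, 5.2, 8.0, 8.1, 8.2]
one_outlier = [5.0, 5.1, 5.2, 5.3, 5.4, 8.0]
print(bimodality_score(two_groups), bimodality_score(one_outlier))
```

With this reading, a transcript whose individuals split into two groups scores highly, whereas a transcript with one outlying individual does not, because its tallest branch joins a single individual to the rest.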

Mentions: Bimodally or multimodally expressed mRNA transcripts were defined as those transcripts for which two or more distinct populations of expression values were observed among a set of individuals. To identify and visualise bimodally expressed transcripts, we devised a simple algorithm (written as a script in the statistical language 'R'; Additional File 1) based on unsupervised agglomerative hierarchical clustering. The algorithm can be used as a simple 'R' script or, for use in a graphical user interface, it can be supplied as a GenePattern module on request (http://www.broadinstitute.org/cancer/software/genepattern/). It is illustrated schematically in Figure 1 and described in the methods section. Briefly, on a transcript-by-transcript basis, agglomerative hierarchical clustering across the dataset was carried out. The maximum cluster branch height identified for each transcript was approximately proportional to the greatest distance between any two clusters of individuals, and is used here as a surrogate marker for the degree of bimodal expression. To estimate the probability of transcripts appearing to be bimodally expressed due to chance alone, we used a parametric bootstrapping method. Related methods where trees are constructed from re-sampled data have been used previously to assess the reliability of clusters in gene expression data [28]. As is often the case with microarray transcript abundance data, our log-transformed data approximated a normal distribution. Therefore, for each transcript we made a maximum likelihood estimate of the parameters of a normally distributed population from which the sample of individuals being studied may have been drawn. These parameters (mean and standard deviation) were then used to generate 10,000 simulated datasets for the transcript, each of which was clustered as described above. From the 10,000 clustering results we identified how frequently the largest distance between clusters was ≥ the largest distance between clusters in the actual data set. This information was used to generate an empirical p-value as an estimate of the type I error rate.
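The parametric bootstrap described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation: the function names, the seed parameter and the example data are hypothetical, and the scoring function is passed in as an argument. So that the block runs on its own, a simple stand-in score (the widest gap between adjacent sorted values) is used by default in place of the dendrogram branch height; any clustering-based score with the same signature could be substituted.

```python
import random
from statistics import fmean, pstdev


def largest_gap(values):
    """STAND-IN score (not the paper's branch height): widest gap between
    adjacent sorted values."""
    ordered = sorted(values)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def bootstrap_p_value(values, statistic=largest_gap, n_sim=10_000, seed=0):
    """Fraction of simulated normal datasets scoring >= the observed data."""
    # Maximum likelihood estimates for a normal population: the sample mean
    # and the standard deviation with an n (not n - 1) divisor.
    mu, sigma = fmean(values), pstdev(values)
    observed = statistic(values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    hits = 0
    for _ in range(n_sim):
        # Simulate the same number of individuals from the fitted normal.
        simulated = [rng.gauss(mu, sigma) for _ in range(n)]
        if statistic(simulated) >= observed:
            hits += 1
    return hits / n_sim


# Made-up log-scale abundances for 10 individuals, for illustration only.
split_values = [5.0, 5.1, 5.2, 5.0, 5.1, 8.0, 8.1, 8.2, 8.0, 8.1]
spread_values = [5.0, 5.4, 5.9, 6.1, 6.5, 6.8, 7.2, 7.5, 7.9, 8.3]
print(bootstrap_p_value(split_values, n_sim=2000))
print(bootstrap_p_value(spread_values, n_sim=2000))
```

The returned value is the plain proportion of simulated datasets whose score is at least as large as the observed one, following the description in the text; with 10,000 simulations the smallest non-zero p-value this can report is 0.0001.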

