Genetics of single-cell protein abundance variation in large yeast populations.

Albert FW, Treusch S, Shockley AH, Bloom JS, Kruglyak L - Nature (2014)

Bottom Line: The effects of such variants can be detected as expression quantitative trait loci (eQTL). Consequently, many eQTL are probably missed, especially those with smaller effects. We also observed closer correspondence between loci that influence protein abundance and loci that influence mRNA abundance of a given gene.


Affiliation: [1] Department of Human Genetics, University of California, Los Angeles, California 90095, USA [2] Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA.

ABSTRACT
Variation among individuals arises in part from differences in DNA sequences, but the genetic basis for variation in most traits, including common diseases, remains only partly understood. Many DNA variants influence phenotypes by altering the expression level of one or several genes. The effects of such variants can be detected as expression quantitative trait loci (eQTL). Traditional eQTL mapping requires large-scale genotype and gene expression data for each individual in the study sample, which limits sample sizes to hundreds of individuals in both humans and model organisms and reduces statistical power. Consequently, many eQTL are probably missed, especially those with smaller effects. Furthermore, most studies use messenger RNA rather than protein abundance as the measure of gene expression. Studies that have used mass-spectrometry proteomics reported unexpected differences between eQTL and protein QTL (pQTL) for the same genes, but these studies have been even more limited in scope. Here we introduce a powerful method for identifying genetic loci that influence protein expression in the yeast Saccharomyces cerevisiae. We measure single-cell protein abundance through the use of green fluorescent protein tags in very large populations of genetically variable cells, and use pooled sequencing to compare allele frequencies across the genome in thousands of individuals with high versus low protein abundance. We applied this method to 160 genes and detected many more loci per gene than previous studies. We also observed closer correspondence between loci that influence protein abundance and loci that influence mRNA abundance of a given gene. Most loci that we detected were clustered in 'hotspots' that influence multiple proteins, and some hotspots were found to influence more than half of the proteins that we examined. 
The variants that underlie these hotspots have profound effects on the gene regulatory network and provide insights into genetic variation in cell physiology between yeast strains.



Figure 6: Sequence analyses and X-pQTL detection example. In all panels, physical genomic coordinates are shown on the x-axes. The position of the gene (LEU1) is indicated by the purple horizontal line.
Top panel: Frequency of the BY allele in the high (red) and low (blue) GFP population. SNPs are indicated by dots, and loess-smoothed averages as solid lines. Note the fixation of the BY allele in all segregants at the gene position and at the mating type locus on chromosome III, and the fixation of the RM allele at the SGA marker integrated at the CAN1 locus on the left arm of chromosome V.
Middle panel: Subtraction of allele frequencies in the low from those in the high GFP population. SNPs are indicated by grey dots, with the loess-smoothed average indicated in black. Note that on average, there is no difference between the high and the low populations. Positive difference values correspond to a higher frequency of the BY allele in the high GFP population, which we interpret as higher expression being caused by the BY allele at that locus. The red horizontal lines indicate the 99.99% quantile from the empirical “null” sort experiments. They are shown for illustration only and were not used for peak calling. The blue vertical boxes indicate positions of genome-wide X-pQTL, with the width representing the 2-LOD drop interval.
Bottom panel: LOD scores obtained from MULTIPOOL 16. The red horizontal line is the genome-wide significance threshold (LOD = 4.5). Stars indicate X-pQTL called by our algorithm; these positions correspond to the blue bars in the middle panel. For this gene, 14 X-pQTL are called.
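The quantities plotted in the figure can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the read counts and LOD values are made-up toy numbers, the function names are hypothetical, a simple moving average stands in for loess smoothing, and the LOD scores are taken as given rather than computed with MULTIPOOL. It shows only how a per-SNP BY allele frequency, the high-minus-low difference, and a 2-LOD drop interval around a peak might be derived.

```python
from typing import List, Sequence, Tuple


def allele_frequency(by_reads: Sequence[int], rm_reads: Sequence[int]) -> List[float]:
    """Fraction of reads carrying the BY allele at each SNP."""
    return [b / (b + r) for b, r in zip(by_reads, rm_reads)]


def frequency_difference(high: Sequence[float], low: Sequence[float]) -> List[float]:
    """High-pool minus low-pool BY frequency; positive means BY is enriched in the high pool."""
    return [h - l for h, l in zip(high, low)]


def moving_average(values: Sequence[float], half_window: int) -> List[float]:
    """Centered moving average, truncated at the ends (a stand-in for loess, not loess itself)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def lod_drop_interval(lod: Sequence[float], peak_index: int, drop: float = 2.0) -> Tuple[int, int]:
    """Indices of the contiguous region around a peak whose LOD stays within `drop` of the peak."""
    cutoff = lod[peak_index] - drop
    left = peak_index
    while left > 0 and lod[left - 1] >= cutoff:
        left -= 1
    right = peak_index
    while right < len(lod) - 1 and lod[right + 1] >= cutoff:
        right += 1
    return left, right


# Toy read counts at five SNPs (hypothetical numbers, for illustration only).
high = allele_frequency([20, 22, 30, 28, 18], [20, 18, 10, 12, 22])
low = allele_frequency([20, 18, 10, 12, 20], [20, 22, 30, 28, 20])
diff = frequency_difference(high, low)
smoothed = moving_average(diff, half_window=1)

# Toy LOD curve with a peak at index 2 that exceeds the LOD = 4.5 threshold.
toy_lod = [1.0, 3.0, 6.0, 4.5, 1.5]
interval = lod_drop_interval(toy_lod, peak_index=2)  # (2, 3)
```

In this toy example the third SNP shows the largest positive difference, which under the caption's interpretation would point to higher expression from the BY allele at that position.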

Mentions: We developed a method for detecting genetic influences on protein levels in large populations of genetically distinct individual yeast cells (Extended Data Figure 1). The method leverages extreme QTL mapping (X-QTL), a bulk segregant QTL mapping strategy with high statistical power 14. We quantified protein abundance by measuring levels of green fluorescent protein (GFP) inserted in-frame downstream of a given gene of interest. The GFP tag allows protein abundance to be rapidly and accurately measured in millions of live, single cells by fluorescence-activated cell sorting (FACS). To apply the method to many genes, we took advantage of the yeast GFP collection 15, in which over 4,000 strains each contain a different gene tagged with GFP in a common genetic background (BY). For each gene under study, we crossed the GFP strain to a genetically divergent vineyard strain (RM) and generated a large pool of haploid GFP-positive offspring (segregants) of the same mating type. Across the genome, each segregant inherits either the BY or the RM allele at each locus, some of which influence the given gene’s protein level. We took a starting population of over 500,000 segregants and used FACS to collect 10,000 cells each from the high and low tails of GFP levels (Extended Data Figure 2A). Such selection of phenotypically extreme individuals from a large population provides high power to detect loci with small effects 14. We extracted DNA in bulk from these extreme populations, sequenced it to ~34-fold coverage, and used an analysis method that combines information across linked SNPs to accurately estimate allele frequencies from this depth of coverage 16. We detected loci that influence protein abundance as genomic regions where the high and low GFP pools differ in the frequency of the parental alleles (Extended Data Figure 3). We denote these loci “extreme protein QTL” or X-pQTL.
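The logic of selecting phenotypic extremes can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' analysis: the function name, the effect size, the noise model, and the scaled-down population (20,000 segregants with 400 cells per tail, matching the 2% tail fraction implied by 10,000 out of 500,000) are all hypothetical, loci are treated as unlinked, and pool allele frequencies are read directly from simulated genotypes rather than estimated from sequencing reads.

```python
import random
from typing import List, Tuple


def simulate_pool_frequencies(
    n_segregants: int = 20000,
    tail_size: int = 400,
    n_loci: int = 3,
    causal_locus: int = 0,
    effect: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Return BY allele frequencies per locus in the high and low phenotype tails."""
    rng = random.Random(seed)
    segregants = []
    for _ in range(n_segregants):
        # Each haploid segregant inherits BY (1) or RM (0) at each unlinked locus.
        genotype = [1 if rng.random() < 0.5 else 0 for _ in range(n_loci)]
        # Assumed model: the BY allele at the causal locus raises the GFP signal;
        # everything else is Gaussian noise with unit standard deviation.
        phenotype = effect * genotype[causal_locus] + rng.gauss(0.0, 1.0)
        segregants.append((phenotype, genotype))

    # Sort by phenotype and take the two tails, mimicking the FACS gates.
    segregants.sort(key=lambda s: s[0])
    low_pool = segregants[:tail_size]
    high_pool = segregants[-tail_size:]

    def by_frequency(pool) -> List[float]:
        return [sum(g[i] for _, g in pool) / len(pool) for i in range(n_loci)]

    return by_frequency(high_pool), by_frequency(low_pool)


high_freq, low_freq = simulate_pool_frequencies()
differences = [h - l for h, l in zip(high_freq, low_freq)]
```

Under these assumptions, the causal locus (index 0) should show a clearly positive high-minus-low difference, while the two neutral loci should stay close to zero, which is the signal the pooled comparison relies on.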

