Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.

Broman KW, Keller MP, Broman AT, Kendziorski C, Yandell BS, Sen Ś, Attie AD - G3 (Bethesda) (2015)

Bottom Line: Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup.


Affiliation: Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706 kbroman@biostat.wisc.edu.



Figure 2: Scheme for evaluating the similarity between genotypes and expression arrays. We first identify a set of probes with strong local expression quantitative trait loci (eQTL). For each such eQTL, we use the samples with both genotype and expression data (A) to form a classifier for predicting eQTL genotype from the expression value (B). We then compare the observed eQTL genotypes for one sample to the inferred eQTL genotypes, from the classifiers, for another sample (C). The proportion of matches, between the observed and inferred genotypes, forms a similarity matrix (D), for which darker squares indicate greater similarity. Orange squares indicate missing values (for example, samples with genotype data but no expression data).

Mentions: As an illustration, consider the schematic in Figure 2: for each tissue, we identified a subset of array probes with strong local eQTL, we derived classifiers for predicting eQTL genotype from the corresponding expression phenotypes, and then constructed a matrix of inferred eQTL genotypes. As a measure of similarity between a DNA sample and an mRNA sample, we calculated the proportion of matches between the observed eQTL genotypes for the DNA sample and the inferred eQTL genotypes for the mRNA sample.
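The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation and not the R/lineup API: the classifier here is a simple nearest-class-mean rule chosen for brevity, the function names (`fit_centroids`, `predict`, `similarity`) are hypothetical, and the toy data are noise-free with genotypes coded 1, 2, 3. The mRNA labels of samples `s1` and `s5` are deliberately swapped to mimic a mix-up.

```python
from statistics import mean

def fit_centroids(expr_values, genotypes):
    """Return the mean expression for each genotype class at one eQTL."""
    by_class = {}
    for x, g in zip(expr_values, genotypes):
        by_class.setdefault(g, []).append(x)
    return {g: mean(xs) for g, xs in by_class.items()}

def predict(centroids, x):
    """Infer the genotype whose class mean is closest to expression x."""
    return min(centroids, key=lambda g: abs(x - centroids[g]))

def similarity(obs_geno, expr):
    """Proportion of matches between observed genotypes (DNA samples)
    and genotypes inferred from expression (mRNA samples)."""
    # Train on samples that have both genotype and expression data.
    shared = sorted(set(obs_geno) & set(expr))
    n_eqtl = len(obs_geno[shared[0]])
    classifiers = [
        fit_centroids([expr[s][j] for s in shared],
                      [obs_geno[s][j] for s in shared])
        for j in range(n_eqtl)
    ]
    # Matrix of inferred eQTL genotypes, one row per mRNA sample.
    inferred = {r: [predict(classifiers[j], expr[r][j]) for j in range(n_eqtl)]
                for r in expr}
    # Similarity matrix: DNA sample (row) by mRNA sample (column).
    return {d: {r: sum(o == i for o, i in zip(obs_geno[d], inferred[r])) / n_eqtl
                for r in inferred}
            for d in obs_geno}

# Toy data (assumed for illustration): 9 samples, 3 eQTL.
geno = {
    "s1": [1, 1, 1], "s2": [1, 2, 3], "s3": [1, 3, 2],
    "s4": [2, 1, 3], "s5": [2, 2, 2], "s6": [2, 3, 1],
    "s7": [3, 1, 2], "s8": [3, 2, 1], "s9": [3, 3, 3],
}
# Idealized expression: equal to the genotype code, with no noise.
expr = {s: [float(g) for g in gs] for s, gs in geno.items()}
# Simulate a mix-up: the mRNA labels of s1 and s5 are swapped.
expr["s1"], expr["s5"] = expr["s5"], expr["s1"]

sim = similarity(geno, expr)
# For each DNA sample, the mRNA sample with the highest similarity.
best = {d: max(row, key=row.get) for d, row in sim.items()}
# DNA samples whose best-matching mRNA sample carries a different label.
flagged = {d: r for d, r in best.items() if d != r}
print(flagged)  # {'s1': 's5', 's5': 's1'}
```

Note that the classifiers are trained on data that still contain the swapped samples, yet the mix-up is recovered because most samples are labeled correctly. With real, noisy expression data the proportions would fall between 0 and 1 rather than at the extremes, and DNA samples with no expression data would simply have no column in the matrix.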

