Towards accurate imputation of quantitative genetic interactions.

Ulitsky I, Krogan NJ, Shamir R - Genome Biol. (2009)

Bottom Line: Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions.


Affiliation: Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. ulitsky@wi.mit.edu

ABSTRACT
Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. However, these assays often fail to measure the genetic interactions among up to 40% of the studied gene pairs. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions.


Figure 5: Predicted S-scores for different groups of gene pairs. The groups are categorized as synthetic lethal (SL) or synthetic sick (SS) according to the BioGrid database. The gene pairs in these groups were all missing in the ChromBio E-MAP. 'Other' indicates all other pairs in BioGrid. The cumulative distribution function is shown for each group of gene pairs.

Mentions: As an additional test of the accuracy of our method in predicting negative GIs, we looked for pairs of genes from the ChromBio set with reported GIs that were not measured in the ChromBio E-MAP. We found 376 (279) synthetic lethal (sick) pairs with these properties in the BioGrid database [28]. The distribution of S-scores predicted for these pairs using linear regression and GSG+MATRIX features is shown in Figure 5. Note that here all the GI information originated from the E-MAP, and no information from BioGrid was used to construct the GSG+MATRIX features. Gene pairs marked as synthetic lethal in BioGrid had lower predicted S-scores (average = -1.81) than those marked as synthetic sick (average = -0.82, t-test P-value = 4.7 × 10⁻⁹) and than all other gene pairs in the ChromBio E-MAP (average = -0.14, t-test P-value < 10⁻²⁰⁰). We also tested a discrete classifier, the Naïve Bayes classifier, and found that 174 (47.4%) of the gene pairs marked as synthetic lethal in BioGrid were predicted to be negative by our method. This fraction is likely to underestimate the sensitivity of our method, as GIs in BioGrid were obtained in a variety of strains and conditions that were not necessarily the same as those used for the ChromBio E-MAP. Note that it is not possible to use BioGrid to estimate the specificity of our method, as it aggregates only successful negative GI detections from many high- and low-throughput studies, and it is not known which gene pairs were actually tested unsuccessfully in each study. Unfortunately, we could not use BioGrid to validate our positive interaction prediction accuracy: BioGrid contained only 76 pairs with unambiguous positive interactions that were not measured in the E-MAP, and this number was too small for evaluating our prediction accuracy (results not shown).
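
The comparison described above (per-group averages, a cumulative distribution per group, and the fraction of known negative pairs recovered) can be illustrated with a minimal Python sketch. This is not the authors' code. The scores are made-up toy values, the group names simply mirror the figure legend, and a fixed score cutoff (`THRESHOLD`, a hypothetical value) stands in for the Naïve Bayes classifier that the paper actually used to call a pair negative.

```python
from bisect import bisect_right
from statistics import mean


def empirical_cdf(scores):
    """Return F, where F(x) is the fraction of scores less than or equal to x."""
    ordered = sorted(scores)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def sensitivity(scores, threshold):
    """Fraction of known negative pairs whose predicted score is below threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Toy predicted S-scores per group (illustrative values, NOT data from the paper).
predicted = {
    "SL": [-3.0, -2.5, -1.0, -0.5],
    "SS": [-2.0, -1.0, -0.5, 0.5],
    "Other": [-1.0, 0.0, 0.5, 1.0],
}

# Hypothetical cutoff for calling a pair "negative"; the paper used a
# Naive Bayes classifier for this step rather than a fixed threshold.
THRESHOLD = -0.75

for group, scores in predicted.items():
    cdf = empirical_cdf(scores)
    print(group, mean(scores), cdf(THRESHOLD), sensitivity(scores, THRESHOLD))
```

Only sensitivity is computed because, as noted above, BioGrid records successful detections only, so the set of true non-interacting pairs needed to estimate specificity is not available.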

