Characterization and prediction of haploinsufficiency using systems-level gene properties in yeast.

Norris M, Lovell S, Delneri D - G3 (Bethesda) (2013)

Bottom Line: Variation in gene copy number can significantly affect organism fitness. In this work, we identified associations between Saccharomyces cerevisiae gene properties and genome-scale haploinsufficiency phenotypes from previous work. Additionally, haploinsufficiency showed negative relationships with cell cycle regulation and promoter sequence conservation.


Affiliation: Faculty of Life Sciences, University of Manchester, Manchester, Lancashire, M13 9PT, United Kingdom.

ABSTRACT
Variation in gene copy number can significantly affect organism fitness. When one allele is missing in a diploid, the phenotype can be compromised because of haploinsufficiency. In this work, we identified associations between Saccharomyces cerevisiae gene properties and genome-scale haploinsufficiency phenotypes from previous work. We compared the haploinsufficiency profiles against 23 gene properties and found that genes with a higher level of connectivity (degree) in a protein-protein interaction network, a higher genetic interaction degree, greater gene sequence conservation, and higher protein expression were significantly more likely to be haploinsufficient. Additionally, haploinsufficiency showed negative relationships with cell cycle regulation and promoter sequence conservation.

Figure 4: Distributions of the area under the curve (AUC) at false-positive rate (FPR) ≤ 0.1 across all combinations of gene properties, using median imputation. Model performance tends to increase as more gene properties are added. Our candidate six gene properties (6GP) model is highlighted with an arrow. The three-letter codes identify gene properties and are described in the legend. Each distribution covers 100 receiver-operating characteristic (ROC) curves generated during cross-validation (see Materials and Methods). Whiskers extend to the lowest point within 1.5 × the interquartile range (IQR) of the lower quartile and the highest point within 1.5 × IQR of the upper quartile. Dots represent outliers beyond these ranges. The black horizontal line represents the random expectation from the ROC plot.
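To make the FPR ≤ 0.1 AUC metric in the caption concrete, the following is a minimal sketch of a partial AUC computed with the trapezoidal rule. It is not the authors' code: the function names `roc_points` and `partial_auc` are hypothetical, and the sketch assumes the area is left unnormalized, in which case a random classifier (the ROC diagonal) scores 0.1² / 2 = 0.005 and a perfect classifier scores 0.1. Whether the published figure uses this scaling is not stated in the caption.

```python
def roc_points(labels, scores):
    """Return ROC points as (FPR, TPR) pairs, starting at (0, 0).

    labels: 1 for haploinsufficient (positive), 0 otherwise.
    scores: higher means "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Rank by descending score; tied scores are merged into one ROC step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(labels, scores, max_fpr=0.1):
    """Unnormalized trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (made up for illustration): a perfect ranking and an uninformative one.
perfect = partial_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = partial_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Restricting the area to low FPR values rewards models that rank true positives ahead of almost all negatives, which matters when only the top-scoring predictions would be followed up.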

Mentions: The missing value handling methods produced similar ROC curves, showing that prediction quality is largely independent of the imputation method used. We chose median imputation to produce our candidate models because it yields a high FPR ≤ 0.1 AUC and has low cross-validation AUC variation. The FPR ≤ 0.1 AUC distributions for our median imputation models incorporating all possible combinations of gene properties are presented in Figure 4. These distributions are shown for all imputation methods tested in Figure S3.
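The two steps named above, median imputation of missing values and evaluation over all combinations of gene properties, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `impute_median` and `property_subsets` are hypothetical helper names, the three-letter codes in the example are placeholders rather than the codes used in Figure 4, and missing values are assumed to be represented as `None`.

```python
from itertools import combinations
from statistics import median


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [value for value in column if value is not None]
    fill = median(observed)
    return [fill if value is None else value for value in column]


def property_subsets(names):
    """Yield every non-empty combination of property names, smallest first."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


# Placeholder property codes and values, made up for illustration only.
codes = ["AAA", "BBB", "CCC"]
imputed = impute_median([1.0, None, 3.0, 10.0])
subsets = list(property_subsets(codes))
```

For n properties this enumerates 2^n - 1 candidate models, so the number of models to cross-validate grows quickly as properties are added.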

