Deciphering the code for retroviral integration target site selection.

Santoni FA, Hartley O, Luban J - PLoS Comput. Biol. (2010)

Bottom Line: ChIPSeq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retrovirus integration sites are selected, and offers a tool for predicting cell-type specific proto-oncogene activation by retroviruses.


Affiliation: Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland.

ABSTRACT
Upon cell invasion, retroviruses generate a DNA copy of their RNA genome and integrate retroviral cDNA within host chromosomal DNA. Integration occurs throughout the host cell genome, but target site selection is not random. Each subgroup of retrovirus is distinguished from the others by attraction to particular features on chromosomes. Despite extensive efforts to identify host factors that interact with retrovirion components or chromosome features predictive of integration, little is known about how integration sites are selected. We attempted to identify markers predictive of retroviral integration by exploiting Precision-Recall methods for extracting information from highly skewed datasets to derive robust and discriminating measures of association. ChIPSeq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. By combining peaks from ChIPSeq datasets, a supermarker was identified that localized within 2 kB of 75% of MLV proviruses and detected differences in integration preferences among different cell types. The supermarker predicted the likelihood of integration within specific chromosomal regions in a cell-type specific manner, yielding probabilities for integration into proto-oncogene LMO2 identical to experimentally determined values. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retrovirus integration sites are selected, and offers a tool for predicting cell-type specific proto-oncogene activation by retroviruses.

Figure 7. Effect of the window size on F score. (A) Plot of Area Under Curve (AUC) or F score (both on left Y scale) or the absolute value of the p-value exponent (right Y scale) for MLV with respect to H3K4me3 as a function of window size in basepairs. (B) Pearson correlation for AUC or F score (both on X axis) versus the absolute value of the p-value exponent (Y axis).

Mentions: As demonstrated for the F score (Figure 6), the area under the curve (AUC) ROC method used previously to evaluate markers associated with retroviral integration sites [78], [80], [81] is a robust measure that is insensitive to dataset size. Like the F score, AUC(ROC) also works well to assess markers that are weakly or moderately associated with integration sites (Text S1). But, as demonstrated for the highly associated marker H3K4me3, AUC(ROC) does not respond to the increase in false positives that is expected with increasing window size (Figure 7A). Moreover, this insensitivity to false positives leads AUC(ROC) to overestimate the association of markers that are more common in the genome. Consequently, AUC ranks markers differently from statistical significance, as shown in Figure 8 and discussed in more detail in Text S1. In contrast, the p-value and the F0.5 score incorporate an adjustment for the increase in false positives as window size increases, and both measures achieve a maximal value at a window size of 2 kB (Figure 7A). A standard regression plot shows that the F0.5 score tracks with the p-value almost perfectly (R² = 0.97), whereas the AUC(ROC) diverges considerably (R² = 0.37) (Figure 7B). The F0.5 score and the p-value adjust similarly for the increasing number of false positives.
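
To make the window-size effect concrete, the following is a minimal sketch of how an F0.5 score could be computed for one marker at a given window size. It uses the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The way true and false positives are counted here is an assumption made for illustration (an integration site near a marker peak counts as a true positive, a control site near a peak as a false positive); the excerpt does not spell out the authors' exact procedure. The function names `near_marker`, `f_beta`, and `score_marker` are hypothetical.

```python
from bisect import bisect_left


def near_marker(site, peaks, window):
    """Return True if any peak lies within `window` bp of `site`.

    `peaks` must be a sorted list of positions on the same chromosome.
    """
    i = bisect_left(peaks, site - window)
    return i < len(peaks) and peaks[i] <= site + window


def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_marker(integration_sites, control_sites, peaks, window, beta=0.5):
    """Score a marker against integration sites at one window size.

    Assumption (not taken from the paper): control sites stand in for
    the genomic background, so a control site near a peak is a false positive.
    """
    peaks = sorted(peaks)
    tp = sum(near_marker(s, peaks, window) for s in integration_sites)
    fp = sum(near_marker(s, peaks, window) for s in control_sites)
    fn = len(integration_sites) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_beta(precision, recall, beta)
```

Calling `score_marker` over a range of window sizes shows the behavior described above: as the window widens, recall rises but more control sites fall near a peak, so precision falls and the F0.5 score eventually declines.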

