Deciphering the code for retroviral integration target site selection.

Santoni FA, Hartley O, Luban J - PLoS Comput. Biol. (2010)

Bottom Line: ChIPSeq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retrovirus integration sites are selected, and offers a tool for predicting cell-type specific proto-oncogene activation by retroviruses.


Affiliation: Department of Microbiology and Molecular Medicine, University of Geneva, Geneva, Switzerland.

ABSTRACT
Upon cell invasion, retroviruses generate a DNA copy of their RNA genome and integrate retroviral cDNA within host chromosomal DNA. Integration occurs throughout the host cell genome, but target site selection is not random. Each subgroup of retrovirus is distinguished from the others by attraction to particular features on chromosomes. Despite extensive efforts to identify host factors that interact with retrovirion components or chromosome features predictive of integration, little is known about how integration sites are selected. We attempted to identify markers predictive of retroviral integration by exploiting Precision-Recall methods for extracting information from highly skewed datasets to derive robust and discriminating measures of association. ChIPSeq datasets for more than 60 factors were compared with 14 retroviral integration datasets. When compared with MLV, PERV or XMRV integration sites, strong association was observed with STAT1, acetylation of H3 and H4 at several positions, and methylation of H2AZ, H3K4, and K9. By combining peaks from ChIPSeq datasets, a supermarker was identified that localized within 2 kB of 75% of MLV proviruses and detected differences in integration preferences among different cell types. The supermarker predicted the likelihood of integration within specific chromosomal regions in a cell-type specific manner, yielding probabilities for integration into proto-oncogene LMO2 identical to experimentally determined values. The supermarker thus identifies chromosomal features highly favored for retroviral integration, provides clues to the mechanism by which retrovirus integration sites are selected, and offers a tool for predicting cell-type specific proto-oncogene activation by retroviruses.
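For reference, the Precision-Recall approach mentioned in the abstract is usually summarized by the F score, the harmonic mean of precision and recall. The abstract does not give the authors' exact definitions, so the mapping of true and false positives onto proviruses and marker peaks below is an assumption made for illustration only.

```latex
% Standard definitions; TP, FP, FN are counted within a chosen window size w.
% Assumed mapping (not stated in the abstract):
%   TP = proviruses with a marker peak within w
%   FN = proviruses with no marker peak within w
%   FP = marker peaks with no provirus within w
\[
  P = \frac{TP}{TP + FP}, \qquad
  R = \frac{TP}{TP + FN}, \qquad
  F = \frac{2\,P\,R}{P + R}
\]
```

Because F depends only on TP, FP and FN, it ignores the very large number of genomic positions that carry neither a marker nor a provirus, which is why precision-recall measures are often preferred for highly skewed datasets.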

Figure 5 (pcbi-1001008-g005): Association (F score) between HIV-1 proviruses and two markers as a function of window size in kB. The first marker is H3K4me3 sites in HeLa (green solid line). The second is a virtual marker placed in the promoter region of transcribed genes in HeLa cells (blue dashed line), assuming a uniform distribution of integration sites in transcribed regions. The median length of transcribed genes in HeLa is ∼20 kB.

Mentions: To examine this further, the F score for HIV-1 versus H3K4me3 in HeLa cells was plotted as a function of window size (Figure 5). For comparison, a similar plot was generated for a hypothetical marker at the TSS of transcribed genes in HeLa cells, taking into account the length of these genes, and considering a uniform distribution of proviruses on each gene. For both H3K4me3 and the hypothetical TSS marker, the F score plateaued at a window size of 20 kB, the median gene length. Thus if the window size is large enough to encompass the TSS and half of the gene length, the F score becomes significant. This could explain the window-size dependence of HIV-1 association with H3K4me3.
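The plateau described above can be reproduced with a toy model. The sketch below is not the authors' code: the function name `f_score`, the definitions of precision and recall (fraction of markers near a provirus, fraction of proviruses near a marker), the gene layout, and the provirus offsets are all assumptions chosen for illustration. Each hypothetical gene is 20 kB long with a marker at its TSS, and proviruses are spread evenly along the gene.

```python
import bisect


def f_score(sites, markers, window):
    """F score between provirus positions and marker positions on one chromosome.

    Assumed definitions (not taken from the paper):
      recall    = fraction of sites with a marker within `window` bp
      precision = fraction of markers with a site within `window` bp
    """
    sites = sorted(sites)
    markers = sorted(markers)

    def frac_near(queries, targets):
        # Fraction of queries that have at least one target within `window`.
        if not queries or not targets:
            return 0.0
        hits = 0
        for q in queries:
            i = bisect.bisect_left(targets, q - window)
            if i < len(targets) and targets[i] <= q + window:
                hits += 1
        return hits / len(queries)

    recall = frac_near(sites, markers)
    precision = frac_near(markers, sites)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical layout: 10 genes of 20 kB, spaced 100 kB apart, TSS at gene start.
GENE_LENGTH = 20_000
markers = [g * 100_000 for g in range(10)]

# Proviruses spread evenly along each gene (a stand-in for a uniform distribution).
offsets = [1_000, 5_000, 9_000, 13_000, 17_000]
sites = [tss + off for tss in markers for off in offsets]

# F score as a function of window size.
windows = [2_000, 10_000, 20_000, 40_000]
curve = {w: f_score(sites, markers, w) for w in windows}
for w in windows:
    print(f"window = {w // 1000:>2} kB  F = {curve[w]:.3f}")
```

In this toy layout the F score rises as the window widens and stops increasing once the window reaches the gene length, since every provirus on a gene is then within reach of that gene's TSS marker. With real gene annotations the lengths vary, so the curve would flatten gradually around the typical gene length rather than at a single value.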

