A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system.

Noyes MB, Meng X, Wakabayashi A, Sinha S, Brodsky MH, Wolfe SA - Nucleic Acids Res. (2008)

Affiliation: Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA.

ABSTRACT
Specificity data for groups of transcription factors (TFs) in a common regulatory network can be used to computationally identify the location of cis-regulatory modules in a genome. The primary limitation for this type of analysis is the paucity of specificity data that is available for the majority of TFs. We describe an omega-based bacterial one-hybrid system that provides a rapid method for characterizing DNA-binding specificities on a genome-wide scale. Using this system, 35 members of the Drosophila melanogaster segmentation network have been characterized, including representative members of all of the major classes of DNA-binding domains. A suite of web-based tools was created that uses this binding site dataset and phylogenetic comparisons to identify cis-regulatory modules throughout the fly genome. These tools allow specificities for any combination of factors to be used to perform rapid local or genome-wide searches for cis-regulatory modules. The utility of these factor specificities and tools is demonstrated on the well-characterized segmentation network. By incorporating specificity data on an additional 66 factors that we have characterized, our tools utilize approximately 14% of the predicted factors within the fly genome and provide an important new community resource for the identification of cis-regulatory modules.

Figure 6: Genome Surveyor display interface. A 20-kb region surrounding the ‘eve’ locus is displayed. Annotations for the D. melanogaster genome are shown at the top of the browser window. The predicted transcripts and genes in the D. melanogaster genome are indicated within the genomic region. Immediately below is a line indicating the regions where a high-confidence alignment with the D. pseudoobscura genome has been assembled onto the melanogaster scaffold. Annotations for identified CRMs [downloaded from REDfly (62)] can also be displayed within this region. The user-configurable tracks for individual factors or groups of factors are displayed below the annotations. Multiple factor combination tracks can be displayed simultaneously. These tracks represent the average of the z-scores for each factor plotted over this genomic region for the combination of TFs selected by the user, where the factors included are indicated above each track (i.e. Kr, Bcd, Hkb, Tll and Hb, which were the anterior factor search set used to generate the list of hits in Table 1). The numbers in the upper left-hand corner indicate the maximum value (z-score) for each plot, the estimated genome-wide mean and the mean + 2 SD, respectively. The positions of the genome-wide mean and the mean + 2 SD are also indicated on the plot by horizontal lines of the same color that transect the plot. In this view the two combination tracks (red) for the anterior factor search set are shown across the D. melanogaster genome (mel) and as the average over the D. melanogaster and D. pseudoobscura genomes (melpse). Both of these factor combinations contain a strong peak within the ‘eve’ stripe 1 CRM. Two other combination tracks for other groups of factors (a different gap set and a pair-rule set) are also shown. These groups display significant peaks that overlap with other CRMs. Below the five combination tracks are a number of tracks for individual factors. These tracks provide a rapid assessment of the individual factors that are potentially contributing to each combination track. For example, significant peaks for Bcd, Hkb, Kr and Tll all overlap with the stripe 1 CRM (blue box).
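
The caption describes a combination track as the average of the per-factor z-scores at each position, read against horizontal lines at the genome-wide mean and the mean + 2 SD. The Python sketch below illustrates only that arithmetic on toy data. The function names, the window layout, the toy z-scores and the background mean and SD are all assumptions made for illustration; this is not Genome Surveyor's actual code.

```python
from statistics import mean


def combination_track(factor_tracks):
    """Average the per-window z-scores across all selected factors.

    factor_tracks maps a factor name to a list of z-scores, one per
    genomic window (hypothetical layout, assumed for this sketch).
    """
    tracks = list(factor_tracks.values())
    if not tracks:
        raise ValueError("need at least one factor track")
    n_windows = len(tracks[0])
    if any(len(t) != n_windows for t in tracks):
        raise ValueError("all factor tracks must cover the same windows")
    # zip(*tracks) yields, for each window, the z-scores of every factor.
    return [mean(window) for window in zip(*tracks)]


def peaks_above_threshold(track, background_mean, background_sd, n_sd=2.0):
    """Return indices of windows scoring above mean + n_sd * SD."""
    cutoff = background_mean + n_sd * background_sd
    return [i for i, score in enumerate(track) if score > cutoff]


# Toy z-scores for three factors over four windows (made-up values).
tracks = {
    "Bcd": [0.0, 1.0, 4.0, 0.0],
    "Kr": [1.0, 0.0, 2.0, 0.0],
    "Hb": [-1.0, 2.0, 3.0, 0.0],
}
combo = combination_track(tracks)

# Assumed genome-wide background statistics for the combination track.
peaks = peaks_above_threshold(combo, background_mean=0.5, background_sd=1.0)
print(combo, peaks)
```

In this toy case only the third window, where all three factors score well, clears the mean + 2 SD line, which mirrors how a combination track can show a strong peak where the individual tracks show only modest ones.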

Mentions: We developed a flexible user interface that operates through the GBrowse software package (39) to allow a user to utilize our scoring function and library of PWMs to search for CRMs in the D. melanogaster genome (Figure 6). This interface allows gene-specific browsing or genome-wide searching for CRMs. For gene-specific browsing, tracks that indicate the scores for individual factors, along with their significance values, can be displayed across a genomic region of interest (up to 500 kb). Combination tracks can also be generated to identify peaks of binding site overrepresentation for any collection of factors. For example, in the genomic region surrounding ‘eve’ the tracks for individual maternal and gap factors (e.g. Bcd, Hkb, Hb, Kr and Tll) display small peaks indicating overrepresentation of sites at various positions, but when certain groups of these factors are combined, strong peaks of binding site overrepresentation are evident that correspond to known ‘eve’ pair-rule stripe CRMs (Figure 6). The accuracy of these CRM predictions can be increased by cross-species comparisons to identify peaks that are present in the D. melanogaster genome and in a syntenic region of the D. pseudoobscura genome (18,26). Using our scoring function, the identification of CRMs in a population of intergenic sequences is improved if scores from two genomes are combined (Supplementary Table 3). These comparisons are implemented in Genome Surveyor by calculating z-scores for each TF within the D. pseudoobscura genome and mapping the homologous regions onto the D. melanogaster genome. The GBrowse window can be used to display individual and combination tracks for TFs in the D. pseudoobscura genome as well as ‘two-species tracks’ that average the z-scores of each factor or group of factors between the two genomes (Figure 6). This cross-species analysis over syntenic windows evaluates the total number of sites in each window, not the conservation of individual sites, as individual sites in a CRM may not be conserved but the entire element should be under stabilizing selection (56). These features allow a user to define significant clusters of binding sites for a group of factors in each genome independently, as well as within both genomes.
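
The ‘two-species tracks’ described above average the z-scores of a factor, or group of factors, between a D. melanogaster window and its syntenic D. pseudoobscura window. The Python sketch below shows that averaging on toy data. The dictionary-based synteny mapping, the function name and the choice to leave unaligned windows unscored are assumptions for illustration, since the text does not say how windows without an alignment are handled.

```python
def two_species_track(mel_scores, pse_scores_by_mel_window):
    """Average mel and pse z-scores window by window.

    mel_scores: list of z-scores, one per D. melanogaster window.
    pse_scores_by_mel_window: dict mapping a mel window index to the
        z-score of its syntenic D. pseudoobscura window (hypothetical
        representation; windows with no alignment are simply absent).
    Returns a list with the averaged score, or None where no syntenic
    window exists (an assumption made for this sketch).
    """
    combined = []
    for index, mel in enumerate(mel_scores):
        pse = pse_scores_by_mel_window.get(index)
        combined.append(None if pse is None else (mel + pse) / 2.0)
    return combined


# Toy values: window 2 has no aligned pseudoobscura sequence.
mel = [0.5, 3.0, 1.0, 2.5]
pse_mapped = {0: 0.5, 1: 2.0, 3: -0.5}
melpse = two_species_track(mel, pse_mapped)
print(melpse)
```

Because the scores are compared per window rather than per site, a window keeps a high two-species score when both genomes are enriched for sites, even if no individual site is conserved; a peak present in only one genome (the last window here) is pulled down.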

