Limits...
UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions.

Hume MA, Barrera LA, Gisselbrecht SS, Bulyk ML - Nucleic Acids Res. (2014)

Bottom Line: The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers').The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos.This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference.

View Article: PubMed Central - PubMed

Affiliation: Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA Bioinformatics Graduate Program, Northeastern University, Boston, MA 02115, USA.

Show MeSH
Seed-and-Wobble and BEEML-PBM motif displays. Examples of displays for data generated using the (A) Seed-and-Wobble and (B) BEEML-PBM algorithms for the Erg protein, from Wei et al., 2010 (31). (A) The Seed-and-Wobble data displays a sequence logo, links for downloading the PWM data and the top-scoring k-mer along with its PBM enrichment score. (B) The BEEML-PBM data display format is essentially the same, but because k-mers and enrichment scores are not utilized in this algorithm, an IUPAC consensus sequence derived from the PWM is instead displayed above the motif. The reverse complement sequence orientation can be displayed for either data set individually by clicking the appropriate button; this changes the logo, the PWM file link and the displayed sequence. Assignment of ‘forward’ versus ‘reverse complement’ orientation is arbitrary for each PWM—here, the BEEML-PBM data have been switched to ‘reverse complement’ mode in order to display a more obvious comparison between the logos, since its ‘forward’ orientation happens to correspond more closely to the Seed-and-Wobble data's ‘reverse complement’ orientation.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4383892&req=5

Figure 2: Seed-and-Wobble and BEEML-PBM motif displays. Examples of displays for data generated using the (A) Seed-and-Wobble and (B) BEEML-PBM algorithms for the Erg protein, from Wei et al., 2010 (31). (A) The Seed-and-Wobble data displays a sequence logo, links for downloading the PWM data and the top-scoring k-mer along with its PBM enrichment score. (B) The BEEML-PBM data display format is essentially the same, but because k-mers and enrichment scores are not utilized in this algorithm, an IUPAC consensus sequence derived from the PWM is instead displayed above the motif. The reverse complement sequence orientation can be displayed for either data set individually by clicking the appropriate button; this changes the logo, the PWM file link and the displayed sequence. Assignment of ‘forward’ versus ‘reverse complement’ orientation is arbitrary for each PWM—here, the BEEML-PBM data have been switched to ‘reverse complement’ mode in order to display a more obvious comparison between the logos, since its ‘forward’ orientation happens to correspond more closely to the Seed-and-Wobble data's ‘reverse complement’ orientation.

Mentions: All of the raw PBM data posted in UniPROBE until recently have been handled in the same manner: the Seed-and-Wobble algorithm, introduced jointly with universal PBM technology (1,22), is used to generate a position weight matrix (PWM) (23,24), which in turn is used to generate sequence logos (25) that are displayed on the protein's Details page (e.g. see Figure 2A). Since the development of universal PBM technology, other algorithms have been developed to derive PWMs from the PBM data. BEEML-PBM employs a maximum likelihood approach, using a weighted nonlinear least-squares regression to infer free energy parameters for TF–DNA interactions (4). BEEML-PBM was one of the top two algorithms in the DREAM5 challenge (18) and provided PWMs with better performance than Seed-and-Wobble for the majority of TFs. We have generated PWMs using BEEML-PBM for the PBM data from all publications whose data have been incorporated into UniPROBE, including those mentioned in this paper (1,5–16,26–32). The free energy parameters derived from BEEML-PBM were converted into PWM frequencies by applying a Boltzmann distribution probability mass function to each matrix column. Figure 2 shows an example of Seed-and-Wobble and BEEML-PBM logos in UniPROBE. All of the new logos are currently viewable on the appropriate protein pages and the PWMs are available for download either individually on these pages or in bulk on the Downloads page.


UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions.

Hume MA, Barrera LA, Gisselbrecht SS, Bulyk ML - Nucleic Acids Res. (2014)

Seed-and-Wobble and BEEML-PBM motif displays. Examples of displays for data generated using the (A) Seed-and-Wobble and (B) BEEML-PBM algorithms for the Erg protein, from Wei et al., 2010 (31). (A) The Seed-and-Wobble data displays a sequence logo, links for downloading the PWM data and the top-scoring k-mer along with its PBM enrichment score. (B) The BEEML-PBM data display format is essentially the same, but because k-mers and enrichment scores are not utilized in this algorithm, an IUPAC consensus sequence derived from the PWM is instead displayed above the motif. The reverse complement sequence orientation can be displayed for either data set individually by clicking the appropriate button; this changes the logo, the PWM file link and the displayed sequence. Assignment of ‘forward’ versus ‘reverse complement’ orientation is arbitrary for each PWM—here, the BEEML-PBM data have been switched to ‘reverse complement’ mode in order to display a more obvious comparison between the logos, since its ‘forward’ orientation happens to correspond more closely to the Seed-and-Wobble data's ‘reverse complement’ orientation.
© Copyright Policy - creative-commons
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4383892&req=5

Figure 2: Seed-and-Wobble and BEEML-PBM motif displays. Examples of displays for data generated using the (A) Seed-and-Wobble and (B) BEEML-PBM algorithms for the Erg protein, from Wei et al., 2010 (31). (A) The Seed-and-Wobble data displays a sequence logo, links for downloading the PWM data and the top-scoring k-mer along with its PBM enrichment score. (B) The BEEML-PBM data display format is essentially the same, but because k-mers and enrichment scores are not utilized in this algorithm, an IUPAC consensus sequence derived from the PWM is instead displayed above the motif. The reverse complement sequence orientation can be displayed for either data set individually by clicking the appropriate button; this changes the logo, the PWM file link and the displayed sequence. Assignment of ‘forward’ versus ‘reverse complement’ orientation is arbitrary for each PWM—here, the BEEML-PBM data have been switched to ‘reverse complement’ mode in order to display a more obvious comparison between the logos, since its ‘forward’ orientation happens to correspond more closely to the Seed-and-Wobble data's ‘reverse complement’ orientation.
Mentions: All of the raw PBM data posted in UniPROBE until recently have been handled in the same manner: the Seed-and-Wobble algorithm, introduced jointly with universal PBM technology (1,22), is used to generate a position weight matrix (PWM) (23,24), which in turn is used to generate sequence logos (25) that are displayed on the protein's Details page (e.g. see Figure 2A). Since the development of universal PBM technology, other algorithms have been developed to derive PWMs from the PBM data. BEEML-PBM employs a maximum likelihood approach, using a weighted nonlinear least-squares regression to infer free energy parameters for TF–DNA interactions (4). BEEML-PBM was one of the top two algorithms in the DREAM5 challenge (18) and provided PWMs with better performance than Seed-and-Wobble for the majority of TFs. We have generated PWMs using BEEML-PBM for the PBM data from all publications whose data have been incorporated into UniPROBE, including those mentioned in this paper (1,5–16,26–32). The free energy parameters derived from BEEML-PBM were converted into PWM frequencies by applying a Boltzmann distribution probability mass function to each matrix column. Figure 2 shows an example of Seed-and-Wobble and BEEML-PBM logos in UniPROBE. All of the new logos are currently viewable on the appropriate protein pages and the PWMs are available for download either individually on these pages or in bulk on the Downloads page.

Bottom Line: The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers').The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos.This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference.

View Article: PubMed Central - PubMed

Affiliation: Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA Bioinformatics Graduate Program, Northeastern University, Boston, MA 02115, USA.

Show MeSH