UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions.
Bottom Line: The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k ('k-mers'). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins, and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference.
Affiliation: Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Bioinformatics Graduate Program, Northeastern University, Boston, MA 02115, USA.
Mentions: All of the raw PBM data posted in UniPROBE until recently have been handled in the same manner: the Seed-and-Wobble algorithm, introduced jointly with universal PBM technology (1,22), is used to generate a position weight matrix (PWM) (23,24), which in turn is used to generate sequence logos (25) that are displayed on the protein's Details page (e.g. see Figure 2A). Since the development of universal PBM technology, other algorithms have been developed to derive PWMs from the PBM data. BEEML-PBM employs a maximum likelihood approach, using a weighted nonlinear least-squares regression to infer free energy parameters for TF–DNA interactions (4). BEEML-PBM was one of the top two algorithms in the DREAM5 challenge (18) and provided PWMs with better performance than Seed-and-Wobble for the majority of TFs. We have generated PWMs using BEEML-PBM for the PBM data from all publications whose data have been incorporated into UniPROBE, including those mentioned in this paper (1,5–16,26–32). The free energy parameters derived from BEEML-PBM were converted into PWM frequencies by applying a Boltzmann distribution probability mass function to each matrix column. Figure 2 shows an example of Seed-and-Wobble and BEEML-PBM logos in UniPROBE. All of the new logos are currently viewable on the appropriate protein pages and the PWMs are available for download either individually on these pages or in bulk on the Downloads page.
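The conversion from free energy parameters to PWM frequencies described above can be illustrated with a minimal sketch. It is not the UniPROBE or BEEML-PBM code. It assumes that energies are expressed in units of RT, that lower energy means stronger binding, and that each matrix column is normalized independently. The function name `energies_to_pwm`, the `rt` parameter and the example energies are hypothetical.

```python
import math


def energies_to_pwm(energy_matrix, rt=1.0):
    """Convert per-position binding energies into PWM frequencies.

    energy_matrix: list of columns, each a dict mapping a base to its
    energy (assumed convention: lower energy = more favorable binding).
    rt: energy unit; 1.0 assumes energies are already in units of RT.
    """
    pwm = []
    for column in energy_matrix:
        # Shift by the column minimum for numerical stability; the shift
        # cancels out in the normalization below.
        e_min = min(column.values())
        weights = {base: math.exp(-(e - e_min) / rt) for base, e in column.items()}
        partition = sum(weights.values())
        # Boltzmann probability of each base within this column.
        pwm.append({base: w / partition for base, w in weights.items()})
    return pwm


# Illustrative energies only (not taken from any UniPROBE entry).
example_energies = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},  # strong preference for A
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0},  # no preference
]
pwm = energies_to_pwm(example_energies)
```

Under these assumptions, each column of `pwm` sums to 1, and a column with equal energies yields a uniform frequency of 0.25 per base.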