Limits...
Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

Peters B, Sette A - BMC Bioinformatics (2005)

Bottom Line: First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition.This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences.Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.

View Article: PubMed Central - HTML - PubMed

Affiliation: La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, CA 92109, USA. bjoern_peters@gmx.net

ABSTRACT

Background: Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences.

Results: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence.

Conclusion: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.

Show MeSH
Input training data. The <TrainingData> element consists of a series of <DataPoints> (2). Each contains a sequence and a measurement value. The characters allowed in <Sequence> are specified in <Alphabet> (1), and the number of characters has to correspond to <SequenceLength> (1). In this example, <Alphabet> and <SequenceLength> specify 8-mer peptides in single letter amino acid code. Each measurement can optionally be associated with a threshold (3) that can either be <Greater> or <Lesser>, signaling that the measurement corresponds to an upper or lower boundary of measurable values.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC1173087&req=5

Figure 1: Input training data. The <TrainingData> element consists of a series of <DataPoints> (2). Each contains a sequence and a measurement value. The characters allowed in <Sequence> are specified in <Alphabet> (1), and the number of characters has to correspond to <SequenceLength> (1). In this example, <Alphabet> and <SequenceLength> specify 8-mer peptides in single letter amino acid code. Each measurement can optionally be associated with a threshold (3) that can either be <Greater> or <Lesser>, signaling that the measurement corresponds to an upper or lower boundary of measurable values.

Mentions: The program expects an XML document as its standard input. A more sophisticated (graphical) user interface is not likely to be of great interest for the projected user community, most of which will probably call the executable file from a script or integrate the code into an existing program. The XML input document contains either training data to generate a new tool (Figure 1) or sequences for which a prediction should be made with a previously generated tool. The output of the program is again in XML format. For all types of input and output, XML schemas defining their exact structure are supplied. The schemas contain annotation for each input element, documenting their intended use. A simplified alternative input format exists for the most common application, namely the generation of a scoring matrix from a set of standard amino acid sequences. Several example input files are supplied with the program, which should make it easy for users to create similar input files with their own data.


Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

Peters B, Sette A - BMC Bioinformatics (2005)

Input training data. The <TrainingData> element consists of a series of <DataPoints> (2). Each contains a sequence and a measurement value. The characters allowed in <Sequence> are specified in <Alphabet> (1), and the number of characters has to correspond to <SequenceLength> (1). In this example, <Alphabet> and <SequenceLength> specify 8-mer peptides in single letter amino acid code. Each measurement can optionally be associated with a threshold (3) that can either be <Greater> or <Lesser>, signaling that the measurement corresponds to an upper or lower boundary of measurable values.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC1173087&req=5

Figure 1: Input training data. The <TrainingData> element consists of a series of <DataPoints> (2). Each contains a sequence and a measurement value. The characters allowed in <Sequence> are specified in <Alphabet> (1), and the number of characters has to correspond to <SequenceLength> (1). In this example, <Alphabet> and <SequenceLength> specify 8-mer peptides in single letter amino acid code. Each measurement can optionally be associated with a threshold (3) that can either be <Greater> or <Lesser>, signaling that the measurement corresponds to an upper or lower boundary of measurable values.
Mentions: The program expects an XML document as its standard input. A more sophisticated (graphical) user interface is not likely to be of great interest for the projected user community, most of which will probably call the executable file from a script or integrate the code into an existing program. The XML input document contains either training data to generate a new tool (Figure 1) or sequences for which a prediction should be made with a previously generated tool. The output of the program is again in XML format. For all types of input and output, XML schemas defining their exact structure are supplied. The schemas contain annotation for each input element, documenting their intended use. A simplified alternative input format exists for the most common application, namely the generation of a scoring matrix from a set of standard amino acid sequences. Several example input files are supplied with the program, which should make it easy for users to create similar input files with their own data.

Bottom Line: First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition.This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences.Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.

View Article: PubMed Central - HTML - PubMed

Affiliation: La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, CA 92109, USA. bjoern_peters@gmx.net

ABSTRACT

Background: Many processes in molecular biology involve the recognition of short sequences of nucleic-or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences.

Results: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence.

Conclusion: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.

Show MeSH