Universal database search tool for proteomics

View Article: PubMed Central - PubMed

ABSTRACT

Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyze tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral datasets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these datasets, MS-GF+ significantly increases the number of identified peptides compared to commonly used methods for peptide identifications. We emphasize that while MS-GF+ is not specifically designed for any particular experimental set-up, it improves upon the performance of tools specifically designed for these applications (e.g., specialized tools for phosphoproteomics).

No MeSH data available.


Figure 5: Illustration of the MS-GF+ Directed Acyclic Graph (DAG) scoring. The peptide ABAA is converted into its Boolean string P = 010010101 and the spectrum S is converted into a labeled DAG G as described in the text. The number in the vertex represents its label. The color of the edge represents its label (0 for grey and 1 for black). The vertex i is colored depending on the peptide character i (white for 0 and black for 1). We also color vertex 0 as black. The procedure to compute Score(P, G) is illustrated. All edges are partitioned into 8 classes depending on $s_{i,j}$, $p_i$, and $p_j$. For example, there are 5 edges with $s_{i,j} = p_i = p_j = 0$.

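Mentions: When applying this model for scoring a peptide P and a DAG G, we consider a test comparing two hypotheses: one assuming G is generated by P and the other assuming G is generated by an “empty” string consisting of all zeros (denoted by O). The log-likelihood score of (P, G) (denoted Score(P, G)) is defined as follows (see Figure 5 for an example):

$$
\begin{aligned}
\mathrm{Score}(P,G)
&= \log\frac{\mathrm{Prob}(G \mid P)}{\mathrm{Prob}(G \mid O)}
 = \log\frac{\prod_{i\in V}\mathrm{Prob}(s_i \mid p_i)\cdot\prod_{(i,j)\in E}\mathrm{Prob}(s_{i,j} \mid p_i,p_j)}{\prod_{i\in V}\mathrm{Prob}(s_i \mid 0)\cdot\prod_{(i,j)\in E}\mathrm{Prob}(s_{i,j} \mid 0,0)} \\
&= \sum_{i\in V}\log\frac{\mathrm{Prob}(s_i \mid p_i)}{\mathrm{Prob}(s_i \mid 0)}
 + \sum_{(i,j)\in E}\log\frac{\mathrm{Prob}(s_{i,j} \mid p_i,p_j)}{\mathrm{Prob}(s_{i,j} \mid 0,0)} \\
&\approx \underbrace{\sum_{\{i \,:\, i\in V,\ p_i=1\}}\underbrace{\log\frac{\mathrm{Prob}(s_i \mid 1)}{\mathrm{Prob}(s_i \mid 0)}}_{\mathrm{VertexScore}(i)}}_{\text{vertex scoring}}
 + \underbrace{\sum_{\{(i,j) \,:\, (i,j)\in E,\ p_i=1,\ p_j=1\}}\underbrace{\log\frac{\mathrm{Prob}(s_{i,j} \mid 1,1)}{\mathrm{Prob}(s_{i,j} \mid 0,0)}}_{\mathrm{EdgeScore}(i,j)}}_{\text{edge scoring}}
\end{aligned}
\tag{2}
$$

Note that the last equation assumes that only the edges (i, j) with $p_i = p_j = 1$ contribute to the edge scoring because $\beta_1 \approx \beta_2 \approx \beta_3$.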
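The two forms of Score(P, G) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MS-GF+ implementation: the function names, the dictionary layout, the toy graph, and every probability value are hypothetical. The sketch also assumes that vertex 0 carries no scored label and is treated as having $p_0 = 1$, and that the three edge classes other than $p_i = p_j = 1$ share the same label distribution, which is one way to read the $\beta_1 \approx \beta_2 \approx \beta_3$ condition.

```python
import math

def full_score(p, vertex_labels, edges, vertex_prob, edge_prob):
    """Full log-likelihood ratio: sums over every vertex and every edge."""
    total = 0.0
    for i, s_i in vertex_labels.items():
        total += math.log(vertex_prob[(s_i, p[i])] / vertex_prob[(s_i, 0)])
    for (i, j), s_ij in edges.items():
        total += math.log(edge_prob[(s_ij, p[i], p[j])] / edge_prob[(s_ij, 0, 0)])
    return total

def approx_score(p, vertex_labels, edges, vertex_prob, edge_prob):
    """Approximate form: VertexScore over p_i = 1, EdgeScore over p_i = p_j = 1."""
    total = 0.0
    for i, s_i in vertex_labels.items():
        if p[i] == 1:
            total += math.log(vertex_prob[(s_i, 1)] / vertex_prob[(s_i, 0)])
    for (i, j), s_ij in edges.items():
        if p[i] == 1 and p[j] == 1:
            total += math.log(edge_prob[(s_ij, 1, 1)] / edge_prob[(s_ij, 0, 0)])
    return total

# Toy inputs (all hypothetical). p[0] = 1 stands for vertex 0.
p = [1, 0, 1, 1]
vertex_labels = {1: 0, 2: 2, 3: 1}                  # vertex index -> label s_i
edges = {(0, 1): 0, (0, 2): 1, (1, 3): 0, (2, 3): 1}  # edge -> label s_ij

# vertex_prob[(s, p)] plays the role of Prob(s_i = s | p_i = p).
vertex_prob = {(0, 0): 0.7, (1, 0): 0.2, (2, 0): 0.1,
               (0, 1): 0.2, (1, 1): 0.3, (2, 1): 0.5}

# edge_prob[(s, pi, pj)] plays the role of Prob(s_ij = s | p_i = pi, p_j = pj).
edge_prob = {}
for pi, pj in [(0, 0), (0, 1), (1, 0)]:
    edge_prob[(1, pi, pj)] = 0.1   # same distribution for these three classes
    edge_prob[(0, pi, pj)] = 0.9
edge_prob[(1, 1, 1)] = 0.6
edge_prob[(0, 1, 1)] = 0.4

exact = full_score(p, vertex_labels, edges, vertex_prob, edge_prob)
approx = approx_score(p, vertex_labels, edges, vertex_prob, edge_prob)
print(exact, approx)
```

Because the three non-(1, 1) edge classes are given identical distributions here, their log-ratio terms are exactly zero and the two functions agree; with parameters that are only approximately equal, the two values would differ slightly, which is the approximation made in the last line of the equation.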

