A note on the false discovery rate of novel peptides in proteogenomics.

Zhang K, Fu Y, Zeng WF, He K, Chi H, Liu C, Li YC, Gao Y, Xu P, He SM - Bioinformatics (2015)

Bottom Line: However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes. To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides.


Affiliation: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, University of Chinese Academy of Sciences, Beijing 100049.


Figure 2. Simulation results on the E. coli and M. tuberculosis datasets. To simulate partial annotation, we randomly removed some annotated genes from the database. Genes were sampled for values of θ from 0 to 1 in steps of 0.1, plus 0.95 and 0.99. (A) On the E. coli dataset, the experimental subgroup FDR of novel peptides (red crosses) fits the theoretical value (blue line) well. The deduced values of θ were approximately identical to the sampled ones, as shown by the magenta triangles on the diagonal line. (B) On the M. tuberculosis dataset, genes were sampled 10 times for each value of θ. The experimental values (red boxes) fit the theoretical values (blue line) well when θ is less than 0.9. For larger θ, the experimental value diverges from its theoretical counterpart because truly novel peptides may exist: the experimental value is 0.69 when the sampled θ = 1, and the correspondingly deduced θ is 0.996. However, all deduced values of θ still match the sampled ones (green boxes), since the annotation completeness ratio is very close to 1.
© Copyright Policy - creative-commons


Mentions: To simulate θ, we randomly removed some genes from the annotated database to vary the ratio of their summed length to that of all genes. Using the same global FDR threshold of 1%, we obtained the numbers of targets and decoys separately for annotated and novel peptides, allowing us to calculate the experimental subgroup FDRs of annotated and novel peptides. Given θ and μ, the theoretical subgroup FDRs were also calculated through Equations (2) and (3). Our simulation showed that the experimental subgroup FDR of annotated peptides was close to the theoretical value of 1.5‰, with a minimum of 2.1‰ and a maximum of 3.6‰. Moreover, the experimental subgroup FDR of novel peptides fits its theoretical counterpart well, as shown in Figure 2A. Conversely, we can deduce the value of θ from the experimental subgroup FDR of novel peptides based on Equation (3). As shown in Figure 2, the pairs of sampled and deduced values of θ lie along the diagonal, indicating that the deduced θ could be used as an estimate of the real annotation completeness ratio.
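
The simulation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`theta_grid`, `sample_annotation`, `subgroup_fdr`) are hypothetical, the greedy random sampling is only one plausible way to reach a target length ratio, and the subgroup FDR is assumed to be the common target-decoy estimate (decoys divided by targets), which the text does not spell out. Equations (2) and (3) are not reproduced here, so the theoretical values and the deduction of θ are left out.

```python
import random


def theta_grid():
    """Sampled completeness ratios: 0 to 1 in steps of 0.1, plus 0.95 and 0.99."""
    return sorted([round(0.1 * i, 1) for i in range(11)] + [0.95, 0.99])


def sample_annotation(gene_lengths, theta, rng):
    """Randomly keep genes until their summed length reaches about theta
    of the total length (assumed greedy scheme, not taken from the paper).

    gene_lengths: dict mapping gene id -> length.
    Returns (list of kept gene ids, achieved length ratio).
    """
    total = sum(gene_lengths.values())
    ids = list(gene_lengths)
    rng.shuffle(ids)
    kept, kept_len = [], 0
    for gene in ids:
        if kept_len >= theta * total:
            break
        kept.append(gene)
        kept_len += gene_lengths[gene]
    return kept, kept_len / total


def subgroup_fdr(n_targets, n_decoys):
    """Assumed target-decoy estimate for one subgroup: decoys / targets,
    capped at 1. Returns None when the subgroup has no target hits."""
    if n_targets == 0:
        return None
    return min(1.0, n_decoys / n_targets)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up gene lengths, purely for illustration.
    genes = {f"gene{i}": rng.randint(300, 3000) for i in range(200)}
    for theta in theta_grid():
        kept, achieved = sample_annotation(genes, theta, rng)
        print(f"theta={theta:.2f} kept={len(kept)} achieved={achieved:.3f}")
```

In such a setup, peptides mapping to the kept genes would play the role of annotated peptides and the rest would be treated as novel; `subgroup_fdr` would then be applied separately to the target and decoy counts of each group after filtering at the global 1% FDR threshold.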

