PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species.

Fouts DE, Brinkac L, Beck E, Inman J, Sutton G - Nucleic Acids Res. (2012)

Bottom Line: Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. The clusters that did not agree were inspected for evidence of correctness resulting in 85 high-confidence manually curated clusters that were used to compare all four methods.


Affiliation: J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA. dfouts@jcvi.org

ABSTRACT
Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters that did not agree were inspected for evidence of correctness resulting in 85 high-confidence manually curated clusters that were used to compare all four methods.

Figure 4 (gks757-F4): Agreement and disagreement in how proteins are clustered by the four methods, for the entire set of clusters (A) and for the 85 manually curated clusters (B). The number of proteins (A) or clusters (B) in agreement is graphed for each possible subset of the four methods. Each subset pattern is indicated with shaded boxes for agreement and open boxes for disagreement. For example, when there are two shaded boxes and two open boxes, the two shaded methods agree with each other and each of the two open methods disagrees with all three other methods. Diagonal lines in a box indicate that the two methods so marked disagree with the two shaded methods but agree with each other.

Mentions: Except when all four methods agree, it is hard to directly compare clusters. This is because members of a single cluster from one clustering method could correspond to multiple clusters from another method, which may in turn correspond to different clusters from the original method. Therefore, instead of comparing clusters to evaluate the results of each clustering method, the cluster membership for each protein was evaluated. For each protein, two methods agreed if the protein was included in clusters with identical membership and disagreed otherwise. Of 6710 total non-redundant clusters containing 15 180 proteins, all four methods agreed for 86% of proteins (13 041) in 69% of the clusters (4631; Figure 4A). Where three methods agreed and one disagreed, PanOCT, InParanoid and Sybil agreed for 4% of proteins; PanOCT, OrthoMCL and Sybil agreed for 3%; InParanoid, OrthoMCL and Sybil agreed for 1%; and PanOCT, InParanoid and OrthoMCL agreed for <1% of proteins (Figure 4A).
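The per-protein comparison described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy cluster data and the choice to treat a protein missing from a method's output as a singleton cluster are all assumptions made for illustration. For each protein, the sketch groups the methods by the exact cluster membership they assign to that protein, then counts how many proteins fall under each grouping pattern (the quantity graphed per pattern in Figure 4A).

```python
from collections import Counter


def membership_index(clusters):
    """Map each protein to the frozenset of all members of its cluster."""
    index = {}
    for cluster in clusters:
        members = frozenset(cluster)
        for protein in members:
            index[protein] = members
    return index


def agreement_pattern(protein, indexes):
    """Group method names by the cluster membership they give this protein.

    Two methods land in the same group only if their clusters containing
    the protein have identical membership.
    """
    groups = {}
    for method, index in indexes.items():
        # Assumption: a protein absent from a method's output is a singleton.
        members = index.get(protein, frozenset([protein]))
        groups.setdefault(members, []).append(method)
    return frozenset(frozenset(names) for names in groups.values())


def tally_patterns(results):
    """Count proteins per agreement pattern across all methods."""
    indexes = {m: membership_index(c) for m, c in results.items()}
    proteins = set().union(*(idx.keys() for idx in indexes.values()))
    return Counter(agreement_pattern(p, indexes) for p in proteins)


# Hypothetical toy data: one method merges two paralogous clusters.
results = {
    "PanOCT": [["a1", "b1"], ["a2", "b2"], ["c1"]],
    "InParanoid": [["a1", "b1"], ["a2", "b2"], ["c1"]],
    "Sybil": [["a1", "b1"], ["a2", "b2"], ["c1"]],
    "OrthoMCL": [["a1", "b1", "a2", "b2"], ["c1"]],
}
counts = tally_patterns(results)

# Pattern in which every method assigns the same membership.
all_agree = frozenset([frozenset(results)])
# Pattern in which three methods agree and OrthoMCL differs.
three_one = frozenset([
    frozenset(["PanOCT", "InParanoid", "Sybil"]),
    frozenset(["OrthoMCL"]),
])
print(counts[all_agree], counts[three_one])
```

In this toy input only `c1` is clustered identically by all four methods, while the four proteins in the merged cluster fall under the three-against-one pattern. Representing a pattern as a set of method groups also covers the case where two methods disagree with the others but agree with each other, which the figure marks with diagonal lines.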

