Improving pan-genome annotation using whole genome multiple alignment.
Bottom Line: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
Affiliation: Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA. email@example.com
Mentions: As a case study, we evaluated the Mugsy-Annotator report for a dataset of 20 Neisseria meningitidis (Nmen) genomes. Inconsistent translation initiation sites (TIS) are the most common anomaly detected in Nmen: 30% of aligned gene sets contain more than one annotated TIS. Because TIS prediction is imprecise, we expect the number of TIS inconsistencies to grow as more genomes are added, especially since our method marks a group as inconsistent even if the annotation error is limited to a single genome. To show how overall consistency is affected by any single genome, Mugsy-Annotator reports the number of times each genome is inconsistent with the rest of the set. An examination of the Nmen genomes shows that certain subsets of genomes are more internally consistent than others. In 27% of groups with TIS inconsistencies, an alternative annotation in a single genome would resolve the inconsistencies for the whole group (Figure 6). Although some Nmen genomes contributed more annotation inconsistencies than others, every genome contributed to inconsistencies in at least one group.
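The three statistics described above (the share of aligned gene sets with more than one TIS, per-genome inconsistency counts, and the share of inconsistent groups fixable by changing one genome) can be illustrated with a minimal sketch. This is not Mugsy-Annotator's implementation. It assumes a hypothetical representation in which each aligned gene set maps a genome ID to the alignment column of its annotated TIS. It also simplifies the "single genome resolves the group" test to "exactly one genome disagrees with all the others", and ignores whether that genome actually has a usable start codon at the consensus column.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical representation (not Mugsy-Annotator's data model): one aligned
# gene set maps genome ID -> alignment column of its annotated TIS.
GeneSet = Dict[str, int]


def is_tis_inconsistent(group: GeneSet) -> bool:
    """A group is inconsistent if its members annotate more than one TIS column."""
    return len(set(group.values())) > 1


def single_genome_fix(group: GeneSet) -> Optional[str]:
    """Return the lone genome whose TIS disagrees with all others, else None.

    Simplifying assumption: the outlier could be re-annotated to the
    consensus column (no check that a start codon exists there).
    """
    counts = Counter(group.values())
    if len(counts) != 2:
        return None
    (_, major_n), (minor_col, minor_n) = counts.most_common()
    if minor_n != 1 or major_n < 2:
        return None
    return next(g for g, col in group.items() if col == minor_col)


def per_genome_outlier_counts(groups: List[GeneSet]) -> Counter:
    """Count, per genome, the inconsistent groups where its TIS differs from
    the group's most common TIS (ties broken by first occurrence)."""
    tally: Counter = Counter()
    for group in groups:
        if not is_tis_inconsistent(group):
            continue
        consensus, _ = Counter(group.values()).most_common(1)[0]
        for genome, col in group.items():
            if col != consensus:
                tally[genome] += 1
    return tally


def summarize(groups: List[GeneSet]) -> Dict[str, float]:
    """Fraction of groups that are inconsistent, and fraction of those that a
    single-genome edit would fix (under the simplification above)."""
    inconsistent = [g for g in groups if is_tis_inconsistent(g)]
    fixable = [g for g in inconsistent if single_genome_fix(g) is not None]
    return {
        "fraction_inconsistent": len(inconsistent) / len(groups) if groups else 0.0,
        "fraction_fixable": len(fixable) / len(inconsistent) if inconsistent else 0.0,
    }


# Toy data for illustration only (not from the Nmen dataset).
groups: List[GeneSet] = [
    {"A": 100, "B": 100, "C": 100},  # consistent
    {"A": 250, "B": 250, "C": 262},  # C is the lone outlier
    {"A": 400, "B": 412, "C": 430},  # three different TIS
]

if __name__ == "__main__":
    print(summarize(groups))
    print(per_genome_outlier_counts(groups))
```

On the toy data, two of the three groups are inconsistent, and only the second can be fixed by re-annotating one genome (C). A real tool would also need to handle near-ties and verify start codons at the consensus position before proposing an edit.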