Ab initio identification of transcription start sites in the Rhesus macaque genome by histone modification and RNA-Seq.

Liu Y, Han D, Han Y, Yan Z, Xie B, Li J, Qiao N, Hu H, Khaitovich P, Gao Y, Han JD - Nucleic Acids Res. (2010)

Bottom Line: These provide an important, rich resource for close examination of species-specific transcript structures and transcriptional regulation in the Rhesus macaque genome. Our approach exemplifies a relatively inexpensive way to generate a reasonably reliable TSS map for a large genome. It may serve as a guiding example for similar genome annotation efforts targeted at other model organisms.


Affiliation: Chinese Academy of Sciences Key Laboratory of Molecular Developmental Biology, Center for Molecular Systems Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Lincui East Road, Beijing, 100101, China.

ABSTRACT
Rhesus macaque is a widely used primate model organism. Its genome annotations are, however, still largely comparative computational predictions derived mainly from human genes, which precludes studies of macaque-specific genes, gene isoforms or their regulation. Here we took advantage of the ability of histone H3 lysine 4 trimethylation (H3K4me3) to mark transcription start sites (TSSs), together with the recently developed ChIP-Seq and RNA-Seq technologies, to survey transcript structures. We generated 14,013,757 sequence tags by H3K4me3 ChIP-Seq and obtained 17,322,358 paired-end reads for mRNA and 10,698,419 short reads for sRNA from the macaque brain. By integrating these data with genomic sequence features, and by extending and improving a state-of-the-art TSS prediction algorithm, we predicted ab initio and verified 17,933 of the previously electronically annotated TSSs at 500-bp resolution. We also predicted approximately 10,000 novel TSSs. These provide an important, rich resource for close examination of species-specific transcript structures and transcriptional regulation in the Rhesus macaque genome. Our approach exemplifies a relatively inexpensive way to generate a reasonably reliable TSS map for a large genome. It may serve as a guiding example for similar genome annotation efforts targeted at other model organisms.


© Copyright Policy - creative-commons

Figure 3: TSS prediction accuracy enhanced by RNA validation and refinement. (A and B) ROCs for CpG TSS and non-CpG TSS predictions before and after RNA validation and refinement. ‘sRNA validated’ and ‘mRNA validated’ refer to the TSS predictions validated by the sRNA-Seq and mRNA-Seq signals, respectively, while ‘sRNA refined’ and ‘mRNA refined’ refer to the final sets of TSS predictions, which were further refined to the mRNA upward edges in the predicted direction of transcription. Each point in a ROC denotes the percentage of the set of TSS predictions that contain a homology-based electronic TSS annotation within the indicated distance. Only the TSS predictions with >0 GentleBoost scores are included. (C–F) The average profile of normalized H3K4me3 ChIP-Seq (C and D) and mRNA-Seq (E and F) tag counts for different sets of TSS predictions in the [−2 kb, +2 kb] region.
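The quantity plotted in panels A and B, as the caption defines it, is the share of TSS predictions that have an annotated TSS within a given distance. The sketch below illustrates that calculation only; it is not the authors' code. The function names, the `(chrom, pos)` tuple layout and the example coordinates are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, x):
    """Distance from x to the closest value in a sorted list (None if the list is empty)."""
    if not sorted_positions:
        return None
    i = bisect_left(sorted_positions, x)
    candidates = []
    if i < len(sorted_positions):
        candidates.append(sorted_positions[i] - x)      # closest annotation at or right of x
    if i > 0:
        candidates.append(x - sorted_positions[i - 1])  # closest annotation left of x
    return min(candidates)


def fraction_within(predictions, annotations, distances):
    """Fraction of predicted TSSs with an annotated TSS within each distance.

    predictions: list of (chrom, pos) tuples (hypothetical layout).
    annotations: dict mapping chrom -> list of annotated TSS positions.
    distances:   iterable of distance cutoffs in bp.
    """
    sorted_ann = {c: sorted(p) for c, p in annotations.items()}
    nearest = [nearest_distance(sorted_ann.get(c, []), pos) for c, pos in predictions]
    n = len(predictions)
    if n == 0:
        return {d: 0.0 for d in distances}
    return {
        d: sum(1 for x in nearest if x is not None and x <= d) / n
        for d in distances
    }


# Toy coordinates, invented for illustration only.
predictions = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
annotations = {"chr1": [1100, 9000]}
curve = fraction_within(predictions, annotations, [50, 500, 5000])
```

Plotting `curve` against the distance cutoffs for each prediction set (unfiltered, validated, refined) would give curves comparable in spirit to those in the figure.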

Mentions: The ROC plot clearly indicates that TSS predictions supported by nearby RNA-Seq signals overlap better with the known TSSs (Figure 3A and B); hence, filtering with mRNA-Seq and sRNA-Seq data can further increase the accuracy of TSS mapping. Interestingly, we find that TSS predictions that overlap with sRNA signals tend to have higher log-odds scores, suggesting that the sRNA signal is a better TSS predictor than the mRNA signal (Figure 3A and B). This might be because mRNA signals can be mapped to many different exons, whereas sRNA signals more often map to a single exon of a particular gene, and because many sRNAs are associated with TSSs (see below).
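The filtering step described above can be pictured as keeping only those predictions with enough RNA-Seq signal nearby. The sketch below is one hypothetical way to do that: the window size, the read-count threshold, the function names and the toy coordinates are all assumptions, since this excerpt does not state the criteria the authors used.

```python
from bisect import bisect_left, bisect_right


def count_in_window(sorted_reads, center, window):
    """Number of read positions in [center - window, center + window]."""
    lo = bisect_left(sorted_reads, center - window)
    hi = bisect_right(sorted_reads, center + window)
    return hi - lo


def filter_supported(predictions, read_positions, window=500, min_reads=1):
    """Keep predictions with at least min_reads RNA-Seq reads within the window.

    predictions:    list of (chrom, pos) tuples (hypothetical layout).
    read_positions: dict mapping chrom -> list of read start positions.
    window, min_reads: illustrative defaults, not values taken from the paper.
    """
    sorted_reads = {c: sorted(p) for c, p in read_positions.items()}
    return [
        (c, pos)
        for c, pos in predictions
        if count_in_window(sorted_reads.get(c, []), pos, window) >= min_reads
    ]


# Toy coordinates, invented for illustration only.
predictions = [("chr1", 1000), ("chr1", 5000)]
reads = {"chr1": [900, 1200, 7000]}
supported = filter_supported(predictions, reads)
```

Running the same filter once with sRNA-Seq reads and once with mRNA-Seq reads would give two validated subsets whose overlap with known TSSs can then be compared.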

