snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome.

Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH - Nucleic Acids Res. (2006)

Bottom Line: By using these programs, we have systematically scanned four human-mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. Eighteen novel snoRNAs were further experimentally confirmed, with four snoRNAs exhibiting a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing of two families of snoRNA genes in the human genome to date.


Affiliation: Key Laboratory of Gene Engineering of the Ministry of Education, Zhongshan University, Guangzhou 510275, PR China.

ABSTRACT
Small nucleolar RNAs (snoRNAs) represent an abundant group of non-coding RNAs in eukaryotes. They can be divided into guide and orphan snoRNAs according to the presence or absence of antisense sequences to rRNAs or snRNAs. Current snoRNA-searching programs, which are essentially based on sequence complementarity to rRNAs or snRNAs, exist only for the screening of guide snoRNAs. In this study, we have developed an advanced computational package, snoSeeker, which includes the CDseeker and ACAseeker programs, for the highly efficient and specific screening of both guide and orphan snoRNA genes in mammalian genomes. By using these programs, we have systematically scanned four human-mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. Eighteen novel snoRNAs were further experimentally confirmed, with four snoRNAs exhibiting a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing of two families of snoRNA genes in the human genome to date.


Figure 1: CDseeker and ACAseeker core algorithm workflow. (A) The C/D snoRNA model. The C/D box snoRNAs carry the conserved boxes C (RUGAUGA, R = purine) and D (CUGA) near their 5′ and 3′ ends, respectively. The two boxes are frequently brought together by a short (4–5 bp) terminal helix to form a structure similar to a kink-turn. Often, imperfect copies of the C and D boxes, named C′ and D′, are located internally, in the order C/D′/C′/D. The 2′-O-ribose methylation of target RNAs is guided by one or two 10–21 nt antisense elements located upstream of the D and/or D′ boxes, so that the modified base is paired with the snoRNA nucleotide located precisely 5 nt upstream of the D or D′ box (17). (B) Schematic representation of the CDseeker algorithms. (C) The H/ACA snoRNA model. The H/ACA box snoRNAs consist of two hairpins and two short single-stranded regions, which contain the H box (ANANNA) and the ACA box. The latter is always located 3 nt upstream of the 3′ end of the snoRNA. The hairpins contain bulges, or recognition loops, that form complex pseudoknots with the target RNA, where the target uridine is the first unpaired base. The substrate uridine always resides 13–16 nt upstream of the H box (left recognition pocket) or of the ACA box (right recognition pocket) (17). (D) Schematic representation of the ACAseeker algorithms.
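The H/ACA consensus elements named in panel (C) can be expressed as simple pattern checks. The following is only a minimal sketch under stated assumptions, not the ACAseeker implementation: the function names `h_box_positions` and `has_aca_box` are hypothetical, matching is exact rather than probabilistic, and no hairpin folding is attempted.

```python
import re

# H box consensus ANANNA (N = any nucleotide). The lookahead lets
# overlapping occurrences be reported. Exact matching is an assumption
# of this sketch; it is not how ACAseeker scores motifs.
H_BOX = re.compile(r"(?=A[ACGU]A[ACGU]{2}A)")


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA letters to RNA."""
    return seq.upper().replace("T", "U")


def h_box_positions(seq: str) -> list:
    """Return 0-based start positions of every ANANNA match."""
    return [m.start() for m in H_BOX.finditer(normalize(seq))]


def has_aca_box(seq: str) -> bool:
    """True if ACA is followed by exactly 3 nt at the 3' end."""
    rna = normalize(seq)
    return len(rna) >= 6 and rna[-6:-3] == "ACA"


if __name__ == "__main__":
    toy = "GGGCAGAGCACCCCACAUGU"  # invented toy sequence, not real data
    print(h_box_positions(toy), has_aca_box(toy))
```

A real screen would additionally require the two hairpins and the spacing between the recognition pockets and the boxes, which this sketch leaves out.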

The CDseeker program combines a probabilistic model (20) with conserved primary and secondary structure motifs to search for orphan and guide C/D snoRNAs in WGA sequences. The algorithm components common to guide and orphan snoRNA genes are box C, box D and the terminal stem base pairing. Two additional components apply to guide snoRNA genes: a region of sequence complementary to rRNA or snRNA, and box D′ if that complementary region is not directly adjacent to box D. For orphan snoRNA genes, the two additional components are (i) a predicted conserved functional region next to box D or box D′ and (ii) box D′ if the conserved functional sequence is not next to box D. The distances between components (e.g. a maximum of 100 nt between box C and box D) are also taken into account. The program searches for box C, box D, the terminal stem pairing and the antisense element step by step in WGA consensus sequences, scores each element with probabilistic models, and evaluates each score against the standard cutoff score of the training set. A candidate progresses to the next evaluation only if its element score is higher than the cutoff. Examination of the antisense element is an optional criterion in CDseeker; the program uses this evaluation to classify candidates as guide or orphan snoRNAs. Finally, to rank the candidates, the program sums the motif scores into a final score, to which a standard cutoff score from the training set is also applied when selecting candidates. The structural model of C/D snoRNA genes used by CDseeker is based on the canonical C/D structure shown in Figure 1A, and the core algorithm workflow is shown schematically in Figure 1B.
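The staged search described above (box C, then box D within a distance limit, then terminal stem pairing) can be illustrated with a heavily simplified sketch. Everything here is an assumption made for illustration: exact consensus matching stands in for the probabilistic models, the 100 nt limit is measured from the end of box C to the start of box D, the stem is taken as the 4 nt flanking each box with G-U wobble pairs allowed, and the names `find_cd_candidates`, `Candidate` and `min_stem_pairs` are hypothetical. It is not the CDseeker code.

```python
import re
from typing import List, NamedTuple

BOX_C = re.compile(r"(?=[AG]UGAUGA)")  # RUGAUGA, R = purine
BOX_D = re.compile(r"(?=CUGA)")
BOX_C_LEN, BOX_D_LEN = 7, 4
MAX_C_TO_D = 100   # maximum C-to-D distance from the text (gap definition assumed)
STEM_LEN = 4       # flank length checked for the terminal stem (assumed)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class Candidate(NamedTuple):
    c_start: int     # 0-based start of box C
    d_start: int     # 0-based start of box D
    stem_pairs: int  # paired positions in the terminal stem


def count_stem_pairs(seq: str, c_start: int, d_start: int) -> int:
    """Count antiparallel pairs between the 5' flank of box C and the 3' flank of box D."""
    five = seq[max(0, c_start - STEM_LEN):c_start]
    three = seq[d_start + BOX_D_LEN:d_start + BOX_D_LEN + STEM_LEN]
    # The nucleotide just before box C pairs with the one just after box D.
    return sum((a, b) in PAIRS for a, b in zip(reversed(five), three))


def find_cd_candidates(seq: str, min_stem_pairs: int = 3) -> List[Candidate]:
    """Apply each filter in turn; a candidate must pass one stage to reach the next."""
    rna = seq.upper().replace("T", "U")
    d_starts = [m.start() for m in BOX_D.finditer(rna)]
    hits = []
    for c in (m.start() for m in BOX_C.finditer(rna)):
        for d in d_starts:
            gap = d - (c + BOX_C_LEN)
            if not 0 <= gap <= MAX_C_TO_D:
                continue  # box D missing, upstream of box C, or too far away
            pairs = count_stem_pairs(rna, c, d)
            if pairs >= min_stem_pairs:
                hits.append(Candidate(c, d, pairs))
    return hits


if __name__ == "__main__":
    toy = "GCGC" + "AUGAUGA" + "ACACACACAC" + "CUGA" + "GCGC"  # invented sequence
    print(find_cd_candidates(toy))
```

In this sketch the hard thresholds play the role of the per-element cutoffs; the antisense or conserved-region check and the summed final score are omitted because their scoring details are not given in this excerpt.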

