Identification and characterization of pseudogenes in the rice gene complement.

Thibaud-Nissen F, Ouyang S, Buell CR - BMC Genomics (2009)

Bottom Line: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events and 25% from retrotransposition events. Finally, the F-box protein, BTB/POZ protein, terpene synthase, chalcone synthase and cytochrome P450 families were found to harbor large numbers of pseudogenes.

Affiliation: The J. Craig Venter Institute, 9712 Medical Center Dr, Rockville, MD 20850, USA. thibaudf@ncbi.nlm.nih.gov

ABSTRACT

Background: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within Osa1 Release 5 were investigated as potential pseudogenes because they exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, a short coding region, a long untranslated region, or, for genes residing within a segmentally duplicated region, the lack of a paralog or a significantly shorter corresponding paralog.
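The screening step described above amounts to flagging every gene model that shows at least one pseudogene-like feature. The following is a minimal Python sketch of such a filter, not the authors' pipeline. The `GeneModel` record, the function names and all three numeric thresholds are hypothetical placeholders, because the abstract does not state the cutoffs used; the sketch also assumes that a large length difference between a gene and its segmental-duplication partner (in either direction) counts as the paralog feature.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL thresholds: the cutoffs used in the study are not given here,
# so these values are placeholders for illustration only.
MIN_CDS_LENGTH = 300      # nt; a coding region shorter than this is "short"
MAX_UTR_LENGTH = 1000     # nt; an untranslated region longer than this is "long"
MIN_PARALOG_RATIO = 0.5   # shorter/longer CDS length below this is a "significant" difference


@dataclass
class GeneModel:
    """Hypothetical summary record for one annotated gene model."""
    gene_id: str
    cds_length: int
    utr_length: int
    has_transcript_support: bool
    in_segmental_duplication: bool = False
    paralog_cds_length: Optional[int] = None  # None means no paralog was found


def pseudogene_features(g: GeneModel) -> list[str]:
    """Return the pseudogene-like features exhibited by a gene model."""
    features = []
    if not g.has_transcript_support:
        features.append("no_transcript_support")
    if g.cds_length < MIN_CDS_LENGTH:
        features.append("short_cds")
    if g.utr_length > MAX_UTR_LENGTH:
        features.append("long_utr")
    # The paralog criteria only apply inside segmentally duplicated regions.
    if g.in_segmental_duplication:
        if g.paralog_cds_length is None:
            features.append("no_paralog")
        else:
            shorter = min(g.cds_length, g.paralog_cds_length)
            longer = max(g.cds_length, g.paralog_cds_length)
            # ASSUMPTION: flag the pair when one copy is much shorter than the other.
            if shorter < MIN_PARALOG_RATIO * longer:
                features.append("paralog_length_mismatch")
    return features


def is_candidate(g: GeneModel) -> bool:
    """A gene model is a pseudogene candidate if it shows at least one feature."""
    return len(pseudogene_features(g)) > 0
```

Candidates flagged this way are only a starting set; as the Results describe, a pseudogene call additionally requires similarity to a fully supported gene model and evidence of disablement.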

Results: A total of 1,439 pseudogenes were identified among the genes with pseudogene features; they were characterized by similarity to fully supported gene models and by the presence of frameshifts or premature translational stop codons. A significant difference in length between duplicated genes within segmentally duplicated regions was the best indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events and 25% from retrotransposition events. Twelve percent of the pseudogenes were expressed. Finally, the F-box protein, BTB/POZ protein, terpene synthase, chalcone synthase and cytochrome P450 families were found to harbor large numbers of pseudogenes.
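One of the disablements named above, a premature translational stop codon, can be illustrated by scanning a coding sequence codon by codon. This minimal Python sketch is an illustration, not the authors' method: the function names are hypothetical, it assumes the standard genetic code and an input that starts at the first base of the reading frame, and it does not detect frameshifts, which require aligning the candidate to its parent protein.

```python
# Stop codons of the standard genetic code (an assumption; organellar codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons that occur before
    the last complete codon (which is where a normal stop is expected)."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA, any case
    n_codons = len(seq) // 3             # only complete codons are examined
    positions = []
    for i in range(n_codons - 1):        # skip the last complete codon
        codon = seq[3 * i: 3 * i + 3]
        if codon in STOP_CODONS:
            positions.append(i)
    return positions


def has_premature_stop(cds: str) -> bool:
    """True if the sequence contains at least one premature in-frame stop."""
    return bool(premature_stop_positions(cds))
```

For example, `premature_stop_positions("ATGTAGAAATAA")` reports codon 1 (`TAG`), while the terminal `TAA` is ignored as the expected stop.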

Conclusion: These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions, which typically lack definable open reading frames. The families containing the most pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.


Figure 4: Number of pseudogenes per paralogous family. Pseudogenes were assigned to paralogous families on the basis of their parent genes. Families discussed in the text are labeled with their number and, if characterized, the name of the associated Pfam domain. BTB-MATH: Bric-a-Brac/Tramtrack/Broad Complex and Meprin and TRAF homology domain. DUF: domain of unknown function. The straight line is the linear regression of the number of pseudogenes per family on the number of functional genes per family. In the inset, the y-axis has a wider range to accommodate family 3724.

Mentions: To refine this general categorization, the pseudogenization frequency was examined within paralogous families that were constructed by clustering Pfam and novel domains across the entire rice proteome [31]. A total of 558 parents of 815 pseudogenes, belonging to 444 paralogous families, were examined. The number of pseudogenes per family was plotted against family size (Figure 4). The scatter of the data suggests that the number of pseudogenes per paralogous family is poorly correlated with family size (r² = 0.01).
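The regression line in Figure 4 and the reported r² come from an ordinary least-squares fit of pseudogenes per family on functional genes per family. The minimal Python sketch below shows that computation; the function name is hypothetical, and the example counts are made up for illustration and are not the study's data.

```python
def linear_fit_r2(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of y = a + b*x.

    Returns (intercept a, slope b, r_squared). For simple linear regression,
    r^2 equals the squared Pearson correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    b = sxy / sxx
    a = mean_y - b * mean_x
    # r^2 is undefined when y is constant; report NaN in that case.
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return a, b, r2


# Illustrative, MADE-UP per-family counts (not data from the study):
family_sizes = [2, 5, 10, 20, 40]      # functional genes per family
pseudogene_counts = [1, 3, 1, 2, 1]    # pseudogenes per family
intercept, slope, r2 = linear_fit_r2(family_sizes, pseudogene_counts)
print(f"pseudogenes = {intercept:.2f} + {slope:.3f} * family_size, r^2 = {r2:.3f}")
```

Because r² is the fraction of variance explained by the fit, a value of 0.01 means family size accounts for only about 1% of the variation in pseudogene counts across families.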
