POEM, A 3-dimensional exon taxonomy and patterns in untranslated exons.

Knapp K, Chonka A, Chen YP - BMC Genomics (2008)

Bottom Line: POEM is applied to two congruent untranslated exon datasets resulting in the following statistical properties. The use of POEM will improve the accuracy of genefinder comparisons and analysis by means of a common taxonomy. It will also facilitate unambiguous communication due to its fine granularity.


Affiliation: Faculty of Science and Technology, Deakin University, Victoria, Australia. kdk@deakin.edu.au

ABSTRACT

Background: The existence of exons and introns has been known for thirty years. Despite this knowledge, there is a lack of formal research into the categorization of exons. Exon taxonomies used by researchers tend to be selected ad hoc or based on an information-poor de facto standard. Exons have been shown to have specific properties and functions based on, among other things, their location and order. These factors should play a role in exon naming, to make clear which exon type(s) are in question.

Results: POEM (Protein Oriented Exon Monikers) is a new taxonomy focused on protein-proximal exons. It integrates three dimensions of information (Global Position, Regional Position and Region), thus its exon categories are based on known statistical exon features. Applying POEM to two congruent untranslated exon datasets yields the following statistical properties. Using the POEM taxonomy, previous wide-ranging estimates of initial 5' untranslated region exons are resolved. According to our datasets, 29-36% of genes have wholly untranslated first exons. Sequences containing untranslated exons are shown to have consistently up to 6 times more 5' untranslated exons than 3' untranslated exons. Finally, three exon patterns are identified which account for 70% of untranslated exon genes.

Conclusion: We describe a thorough three-dimensional exon taxonomy called POEM, which is biologically and statistically relevant. No previous taxonomy provides such fine-grained information while still including all valid information dimensions. The use of POEM will improve the accuracy of genefinder comparisons and analysis by means of a common taxonomy. It will also facilitate unambiguous communication due to its fine granularity.



Figure 1: The 29 exon categories in the POEM taxonomy. The vertical lines to the left indicate to which component(s) an exon belongs. The regions and CDS boundaries appear across the top of the diagram. The dashed vertical lines underneath "UT" and "TU" indicate the CDS boundaries. Each box or combination of adjacent white and shaded boxes represents one of the 29 exon categories. The translated region is darkened to aid visual demarcation from untranslated regions. An exon's moniker (or category name) is the combination of letters found within an exon. Dimension values are separated by periods (despite any CDS boundary). Lower case category names represent exons which can occur multiple times in the same protein coding gene; whereas all upper case monikers indicate exon categories that occur 0 or 1 times in a given protein coding gene. Space between two exon categories is to be understood as intronic. Place-holding exons are not displayed. Place-holding exons are those required by the taxonomical constraints to precede or follow a particular exon category. For example, internal exons (those whose monikers commence with an "I" or "i") can only exist if both preceded and followed by another exon.
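The naming rules stated in the caption (period-separated dimension values, case indicating how often a category may occur, and a leading "I" or "i" for internal exons) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the example strings are placeholders rather than actual POEM categories from Table 1, and mixed-case monikers are treated as undetermined because the caption does not say how they should be read.

```python
from typing import List, Optional


def split_moniker(moniker: str) -> List[str]:
    """Split a moniker into its dimension values on periods."""
    return [part for part in moniker.split(".") if part]


def occurrence_rule(moniker: str) -> Optional[str]:
    """Apply the caption's case rule to a moniker.

    All lower case -> the category can occur multiple times per gene.
    All upper case -> the category occurs 0 or 1 times per gene.
    Mixed case     -> None (assumption: not covered by the caption).
    """
    letters = [ch for ch in moniker if ch.isalpha()]
    if not letters:
        return None
    if all(ch.islower() for ch in letters):
        return "multiple"
    if all(ch.isupper() for ch in letters):
        return "at most once"
    return None


def is_internal(moniker: str) -> bool:
    """Internal exons have monikers that commence with 'I' or 'i'."""
    return moniker[:1] in ("I", "i")


if __name__ == "__main__":
    # Placeholder strings for illustration only, not real POEM monikers.
    for example in ("i.x.y", "I.X.Y"):
        print(example, split_moniker(example),
              occurrence_rule(example), is_internal(example))
```

Returning None for mixed case keeps the sketch from asserting more than the caption states; Table 1 of the article remains the authoritative list of valid monikers.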

Mentions: Given the lack of thorough protein-oriented exon taxonomies, the need for a new categorization method was obvious. The 29 categories of POEM are displayed in Figure 1; Table 1 lists each exon category by name and gives a brief description. Despite the length of Table 1, the monikers are learned and used relatively quickly. The POEM taxonomy is divided into multi-exon genes and intronless genes (also known as single exon genes [13]). Multi-exon genes are further sub-divided into region-oriented and CDS-oriented exons. POEM therefore consists of three main components: intronless genes, CDS-oriented exons and region-oriented exons.

