Computational characterization of 3' splice variants in the GFAP isoform family.

Boyd SE, Nair B, Ng SW, Keith JM, Orian JM - PLoS ONE (2012)

Bottom Line: The gene consists of 9 exons, and to date all isoforms associated with 3' end splicing have been identified from modifications within intron 7, resulting in the generation of exon 7a (GFAPδ/ε) and 7b (GFAPκ). This is the first successful application of this method to a single gene--it has previously only been used in whole-genome analyses. The results illustrate a computational approach for characterising splicing isoform families, using both DNA and protein sequences.


Affiliation: School of Mathematical Sciences, Monash University, Clayton, Victoria, Australia.

ABSTRACT
Glial fibrillary acidic protein (GFAP) is an intermediate filament (IF) protein specific to central nervous system (CNS) astrocytes. It has been the subject of intense interest due to its association with neurodegenerative diseases, and because of growing evidence that IF proteins not only modulate cellular structure, but also cellular function. Moreover, GFAP has a family of splicing isoforms apparently more complex than that of other CNS IF proteins, consistent with it possessing a range of functional and structural roles. The gene consists of 9 exons, and to date all isoforms associated with 3' end splicing have been identified from modifications within intron 7, resulting in the generation of exon 7a (GFAPδ/ε) and 7b (GFAPκ). To better understand the nature and functional significance of variation in this region, we used a Bayesian multiple change-point approach to identify conserved regions. This is the first successful application of this method to a single gene--it has previously only been used in whole-genome analyses. We identified several highly or moderately conserved regions throughout the intron 7/7a/7b regions, including untranslated regions and regulatory features, consistent with the biology of GFAP. Several putative unconfirmed features were also identified, including a possible new isoform. We then integrated multiple computational analyses on both the DNA and protein sequences from the mouse, rat and human, showing that the major isoform, GFAPα, has highly conserved structure and features across the three species, whereas the minor isoforms GFAPδ/ε and GFAPκ have low conservation of structure and features at the distal 3' end, both relative to each other and relative to GFAPα. The overall picture suggests distinct and tightly regulated functions for the 3' end isoforms, consistent with complex astrocyte biology. The results illustrate a computational approach for characterising splicing isoform families, using both DNA and protein sequences.


Figure 2. The four segment classes identified in the GFAP gene using changept. The top four profiles show, for each sequence position in the human GFAP DNA sequence (chr17: 42982993–42992914 in UCSC genomic coordinates), the probability that the base at that position belongs to conservation groups 1 to 4 respectively, as identified by the program changept applied to a 3-way alignment of rat, mouse and human sequences. At any position, the sum of the four profiles is 1. The two rows below the Group 4 profile display the exons (wide bars), the UTRs (narrow bars) and the introns (thin lines) of GFAP genes recorded in the UCSC and RefSeq collections respectively. Below these are the UCSC conservation tracks relative to mouse and rat, in which darker regions correspond to higher conservation, and parallel lines indicate deletions. At the bottom of the figure are the exon numbers. Note that the gene is transcribed from right to left. Exon boundaries are indicated with red vertical lines. Group 1 identifies regions of insertions specific to the human version of the gene; group 2 corresponds mainly to the mapped exons of the GFAP gene, appearing to cover regions of high conservation between the three species; group 3 comprises segments in which deletions occur in either the rat or the mouse genes, but not both; group 4 represents the least conserved parts of the gene.
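The caption describes four per-position probability profiles that sum to 1 at every position. As a rough illustration of how such profiles might be read, the Python sketch below assigns each position to its most probable group and leaves it unassigned when no group is sufficiently probable. This is not changept itself or its output format: the function name `assign_groups`, the cutoff `min_prob = 0.9` and the toy values are all assumptions made for the example.

```python
from typing import List, Optional


def assign_groups(
    profiles: List[List[float]],
    min_prob: float = 0.9,   # hypothetical cutoff for an "unambiguous" call
    tol: float = 1e-6,
) -> List[Optional[int]]:
    """Assign each position to a conservation group (1 to 4) or None.

    profiles[i] holds the four group probabilities for position i.
    A position is assigned only if its top probability reaches min_prob.
    """
    assignments: List[Optional[int]] = []
    for pos, probs in enumerate(profiles):
        # The four profiles are stated to sum to 1 at every position.
        if abs(sum(probs) - 1.0) > tol:
            raise ValueError(f"profiles at position {pos} do not sum to 1")
        best = max(range(len(probs)), key=lambda g: probs[g])
        assignments.append(best + 1 if probs[best] >= min_prob else None)
    return assignments


# Toy values only; these are not taken from the GFAP analysis.
toy_profiles = [
    [0.02, 0.95, 0.02, 0.01],  # clearly group 2
    [0.10, 0.45, 0.40, 0.05],  # ambiguous between groups 2 and 3
    [0.97, 0.01, 0.01, 0.01],  # clearly group 1
]
groups = assign_groups(toy_profiles)
print(groups)
```

Positions that come back as `None` are those where the profiles do not single out one group under the chosen cutoff.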

Mentions: We generated a 3-way alignment of human, rat and mouse DNA sequences, using a 200 kb fragment of DNA centered on GFAP, and used the software changept [47], [48] to search for classes of conservation that captured features of structure and functional regulation of GFAP. Only three species were considered because alternative splicing variants have been extensively studied in only these species. We identified four classes of segment that were characterised by distinct patterns of conservation (Figure 2). We investigated the characteristics of the alignment columns that could be unambiguously assigned to each class (Figure S1, Table S1). One of the segment classes (Group 1) corresponded to regions where there are insertions specific to the human version of the gene, and a second class (Group 4) comprised segments in which deletions occur in either the rat or the mouse genes, but only rarely in both. Both of these segment classes are predominantly intronic, but are also observed in the 3′ UTR of the gene. They may represent regions that are functionally unimportant and can be deleted without detriment to the organism. Alternatively, they may have lineage-specific functional roles. The fact that there are extensive human-specific insertions, but no corresponding mouse- or rat-specific insertions, perhaps favours the latter interpretation. A third segment class (Group 2) is of most interest, corresponding remarkably closely with the mapped exons of the GFAP gene, and appearing to cover regions of high conservation between the three species. The fourth class (Group 3) makes up the balance, representing the less well-conserved parts of the gene. A curious feature of Group 3 is that it contains a high proportion of alignment columns in which rat and mouse match but human differs. Throughout the rest of this paper, we refer to regions of the gene with a high Group 2 profile value as conserved regions, and they form the focus of the remainder of our analysis.
We also focus here on the features related to alternative splicing at the 3′ end of GFAP, even though there are several conserved regions throughout the 5′ end of the sequence.
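The text defines conserved regions as stretches of the gene with a high Group 2 profile value, without giving a numeric cutoff. The Python sketch below shows one way such stretches could be extracted from a per-position profile. It is a minimal sketch under stated assumptions: the function name `conserved_regions`, the threshold of 0.5 and the minimum run length are hypothetical choices for illustration and are not the authors' procedure.

```python
from typing import List, Tuple


def conserved_regions(
    group2_profile: List[float],
    threshold: float = 0.5,  # hypothetical; the source gives no cutoff
    min_length: int = 1,     # hypothetical minimum run length
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive positions
    whose Group 2 probability is at least `threshold`."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(group2_profile):
        if p >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(group2_profile) - start >= min_length:
        regions.append((start, len(group2_profile)))
    return regions


# Toy profile only; these are not values from the GFAP analysis.
toy_profile = [0.1, 0.8, 0.9, 0.7, 0.2, 0.1, 0.6, 0.95]
regions = conserved_regions(toy_profile, threshold=0.5, min_length=2)
print(regions)
```

The indices are offsets into the profile. Mapping them to genomic coordinates would depend on the start of the aligned human sequence and, because the gene is transcribed from right to left, on the strand convention in use.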

