CATMA, a comprehensive genome-scale resource for silencing and transcript profiling of Arabidopsis genes.

Sclep G, Allemeersch J, Liechti R, De Meyer B, Beynon J, Bhalerao R, Moreau Y, Nietfeld W, Renou JP, Reymond P, Kuiper MT, Hilson P - BMC Bioinformatics (2007)

Bottom Line: To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. These latter 1,533 features constitute the CATMAv4 addition. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors.


Affiliation: Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium. gert.sclep@tecnoparco.org

ABSTRACT

Background: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference.

Results: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition.

Conclusion: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database http://www.catma.org.
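The tag counts reported in the abstract can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures stated above; the variable names are ours and purely illustrative.

```python
# Counts as stated in the abstract; variable names are illustrative only.
novel_gsts_v3 = 5756        # novel GSTs designed and amplified for CATMAv3
relaxed_gsts_v4 = 543       # additional GSTs from less stringent design criteria
gene_family_tags_v4 = 990   # Gene Family Tags (GFTs)

# The CATMAv4 addition is the sum of the relaxed GSTs and the GFTs.
catma_v4_addition = relaxed_gsts_v4 + gene_family_tags_v4

# The total of additional sequence tags combines the v3 and v4 additions.
total_new_tags = novel_gsts_v3 + catma_v4_addition

print(catma_v4_addition)  # 1533
print(total_new_tags)     # 7289
```

Both results agree with the 1,533 features of the CATMAv4 addition and the 7,289 additional sequence tags given in the Conclusion.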


Figure 1: Overview of the GST classification and design process yielding the CATMAv3 repertoire. The design and classification process started with the creation of a MySQL database containing three types of information: the exon coordinates of the TIGR5-annotated protein-coding nuclear genes; the exon coordinates of EuGène 040917, an in-house generated and curated annotation; and the BLAST hit coordinates of the CATMAv2 GSTs aligned against the Arabidopsis genome. For each annotation source, regions of overlapping genes were marked, and gene models that ended with the ORF stop codon were extended with an 'artificial' 3' UTR of 150 bp. Information on the prior CATMAv2 GST amplification success or failure was also added to the database. In a second step, both GSTs and genes were classified into five different categories. The classification routine is depicted in Additional File 2 and the categories themselves are described in detail in Table 2. Only successfully amplified GSTs were taken into consideration for the gene classification. When a gene was classified as GE5, it was considered as having a 'unique' tag. When a GST was classified as GST5, it was considered as 'tagging uniquely'. The GST classification was added to the CATMA database, flagging the non-tagging GSTs without actually removing them from the repository. The gene classification was used as a basis for the third and final step, the design of new GSTs for all genes not classified as GE5. To this end, we used the SPADS 1.1.5 software on virtual gene models from which all overlapping exon regions and all exon regions not common to all of the gene's alternative splice forms were removed. When no GST can be designed in the most divergent exon regions, SPADS progressively incorporates less divergent exon regions into its search space (producing GSTs with progressively lower specificity: high, medium or low) and at one point also allows the design of intron-spanning GSTs. At each design level, SPADS scans the gene model from the 3' end to the 5' end. Newly designed GSTs were added to the CATMA database.
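The construction of a virtual gene model described in the caption can be illustrated with plain interval arithmetic. The Python sketch below is not the SPADS implementation; it assumes half-open [start, end) coordinates, and the function names and example coordinates are hypothetical. It keeps only the exon regions shared by all splice forms of a gene and then removes regions that overlap other genes. The 150 bp artificial 3' UTR extension is not covered here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end) coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the parts of `a` that are not covered by `b`."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    for start, end in a:
        cur = start
        for b_start, b_end in b:
            if b_end <= cur:
                continue
            if b_start >= end:
                break
            if b_start > cur:
                out.append((cur, b_start))
            cur = max(cur, b_end)
            if cur >= end:
                break
        if cur < end:
            out.append((cur, end))
    return out


def virtual_gene_model(splice_forms: List[List[Interval]],
                       overlap_regions: List[Interval]) -> List[Interval]:
    """Exon regions common to all splice forms, minus gene-overlap regions."""
    common = merge(splice_forms[0])
    for exons in splice_forms[1:]:
        common = intersect(common, exons)
    return subtract(common, overlap_regions)


# Hypothetical gene with two splice forms and one region shared with a neighbour.
forms = [[(100, 200), (300, 400)], [(100, 200), (350, 400)]]
model = virtual_gene_model(forms, overlap_regions=[(380, 500)])
print(model)  # [(100, 200), (350, 380)]
```

In this toy case the second exon is trimmed twice: first to the part shared by both splice forms, then to the part that does not overlap the neighbouring gene.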

Mentions: To keep up with evolving and increasingly accurate genome annotations, continued efforts are needed to synchronize probe repertoires with changes in the list of annotated genes. The first CATMA GST design rounds [3,5] were based on earlier Arabidopsis genome annotation releases, namely EuGène 2003, TIGR3 and TIGR4, that were outdated at the time the present work was initiated. Therefore, we first determined which genes described in the more recent EuGène 040917 and TIGR5 (January 2004) annotation releases were still unambiguously tagged by pre-existing GSTs, in order to identify the list of 'orphan' genes that should be considered for upgrading and expanding the GST repertoire (Figure 1). Finally, we mapped all newly designed GSTs onto the TAIR6 (October 2005) gene models. This work is part of our ongoing efforts to ensure the comprehensive nature of the CATMA resources (see also 'Note added in proof').
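The identification of 'orphan' genes can be pictured as a set difference between the annotated genes and the genes that already carry an unambiguous tag. The Python sketch below is a simplified illustration, not the published five-category classification: it assumes that a GST tags a gene unambiguously when it was successfully amplified and maps to exactly one gene. The function name and all identifiers are hypothetical.

```python
from typing import Dict, List, Set


def find_orphan_genes(annotated_genes: Set[str],
                      gst_to_genes: Dict[str, Set[str]],
                      amplified_gsts: Set[str]) -> List[str]:
    """Return genes without an amplified GST that maps to that gene alone.

    Simplifying assumption: 'unambiguous' means the GST hits exactly one gene.
    """
    uniquely_tagged: Set[str] = set()
    for gst, genes in gst_to_genes.items():
        # Only successfully amplified GSTs count towards gene classification.
        if gst in amplified_gsts and len(genes) == 1:
            uniquely_tagged.update(genes)
    return sorted(annotated_genes - uniquely_tagged)


# Hypothetical toy data.
annotated = {"gene_1", "gene_2", "gene_3"}
mapping = {
    "GST_A": {"gene_1"},            # amplified, unique: gene_1 is tagged
    "GST_B": {"gene_2", "gene_3"},  # amplified, but hits two genes
    "GST_C": {"gene_3"},            # unique, but amplification failed
}
amplified = {"GST_A", "GST_B"}

orphans = find_orphan_genes(annotated, mapping, amplified)
print(orphans)  # ['gene_2', 'gene_3']
```

Genes returned by such a filter would be the candidates passed on to a new round of GST design.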


