Plastid-LCGbase: a collection of evolutionarily conserved plastid-associated gene pairs.
Bottom Line: Second, it shows patterns and modes of all paired plastid genes and their physical distances across user-defined lineages, facilitated by a step-wise stratification of taxonomic groups. Fourth, the gene pairing scheme is expandable: neighboring genes can also be included in species-/lineage-specific comparisons. We hope that Plastid-LCGbase facilitates studies of gene variation (insertion-deletion, translocation and rearrangement) and of transcription in plastid genomes.
Affiliation: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, P. R. China; Stem Cell Laboratory, UCL Cancer Institute, University College London, London WC1E 6BT, UK. email@example.com
Mentions: We started by constructing phylogenetic trees for the species involved using CVTree (39), based on proteomic data, to give users an overview of the evolutionary relationships among all the species (Figure 1A). The database offers a genome map function that shows an overview of gene distribution on the browse page (Figure 1G). It also provides graphic views of structural changes among plastid genomes at a global level, covering both DNA and translated coding sequences (CDSs). To better display structural features, we divided the genes of plastid genomes into three groups: protein-coding genes, non-coding RNA genes and all genes (the previous two groups combined). One function of the database is to define paired genes within each of the three groups and to discover conserved patterns of gene pairs in different evolutionary lineages. On the search page, we provide eight colors to distinguish the distances between neighboring genes (0–300 bp, 300–500 bp, 500–800 bp, 800–1000 bp, 1000–1200 bp, 1200–1500 bp, 1500–2000 bp and > 2000 bp) and multiple check boxes for selecting the species of interest from a sorted display of taxa, from kingdom all the way down to genus (Figure 1B). When a user enters a gene identifier, the resulting page produces a figure containing a list of conserved gene pairs (both homology-based and strand-specific) (Figure 1C). Because the distances between paired genes are color-coded, the dynamics of the transcription start sites (TSS) of homologous gene pairs in different species can be compared visually. If the query gene is unknown, the database provides two alternatives, since all featured data are summarized in the species table (Figure 1D). The first way is to browse the gene list of a particular genome to find gene names in various nomenclature systems (e.g. Gene Identifier, Protein ID, Gene ID and Product) and position information (e.g. Strand, Start and End) (Figure 1E).
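The eight distance classes used for color coding on the search page can be illustrated with a small lookup. This is only a sketch, not the database's own code: the function name `distance_bin`, the label strings and the treatment of boundary values (a distance equal to an upper edge, e.g. exactly 300 bp, is placed in the lower bin) are assumptions, since the text does not say which bin a boundary value belongs to or how negative (overlapping) distances are displayed.

```python
from bisect import bisect_left

# Upper edges (bp) of the first seven bins listed in the text;
# the eighth bin collects everything above 2000 bp.
UPPER_EDGES = [300, 500, 800, 1000, 1200, 1500, 2000]
BIN_LABELS = [
    "0-300 bp", "300-500 bp", "500-800 bp", "800-1000 bp",
    "1000-1200 bp", "1200-1500 bp", "1500-2000 bp", "> 2000 bp",
]


def distance_bin(distance_bp: int) -> str:
    """Return the label of the distance bin for a non-negative distance in bp.

    Assumption: upper edges are inclusive, so 300 falls in "0-300 bp"
    and 301 falls in "300-500 bp".
    """
    if distance_bp < 0:
        # Assumption: overlapping genes are handled elsewhere, not binned here.
        raise ValueError("distance must be non-negative")
    # bisect_left counts the edges strictly smaller than the distance,
    # which is exactly the index of the bin under the inclusive-edge rule.
    return BIN_LABELS[bisect_left(UPPER_EDGES, distance_bp)]
```

Each label could then be mapped to one of eight display colors; the actual palette is not given in the text.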
The second way is to view the gene pair list, including the relationship and individual features of each pair (Figure 1F). We also calculated all conserved gene pairs in the 470 plastid genomes for browsing and downloading. Furthermore, we define operon-like structures by concatenating highly conserved gene pairs (those conserved in at least 100 plastid genomes) in a given species. In addition, we classify gene pairs into nine categories by combining three orientations, namely co-directionally-paired genes (CDPGs), convergently-paired genes (CPGs) and divergently-paired genes (DPGs), with three patterns: ‘Separation’, ‘Overlap’ and ‘Inclusion’. The former is an orientation parameter that defines gene clusters based on the relative transcription direction of neighboring genes; the latter is a distance parameter that characterizes the physical distance between neighboring genes (Figure 2). We also plot densities of TSS distance on a logarithmic scale for CDPGs, CPGs, DPGs (Figure 1H) and all paired genes, and show bar plots of all nine paired-gene types on the ‘Parameter’ page (Figure 1I). Processed gene pair data for all plastid genomes are freely available for download. Every figure in the database can be enlarged to display a high-resolution version. To connect this database with external public databases, we linked many keywords to their NCBI definitions and annotation pages; for example, ‘Species’, ‘Protein GI’, ‘Locus’, ‘Protein Accession’ and ‘Gene ID’ are all linked.
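The nine-category scheme (three orientations times three patterns) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Gene` record, the function names and the exact coordinate rules are hypothetical. It assumes 1-based inclusive coordinates on a linear representation of the genome (wrap-around on the circular plastid genome is ignored), takes "convergent" to mean the upstream gene on the plus strand and the downstream gene on the minus strand, and takes "divergent" to mean the reverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def orientation(a: Gene, b: Gene) -> str:
    """Classify a pair as CDPG, CPG or DPG from the two strands."""
    # Order the pair by genomic position so "first" is the upstream gene.
    first, second = sorted((a, b), key=lambda g: (g.start, g.end))
    if first.strand == second.strand:
        return "CDPG"  # co-directional: both on the same strand
    if first.strand == "+":
        return "CPG"   # convergent: + followed by -, transcribed toward each other
    return "DPG"       # divergent: - followed by +, transcribed away from each other


def pattern(a: Gene, b: Gene) -> str:
    """Classify a pair as Separation, Overlap or Inclusion from coordinates."""
    if a.end < b.start or b.end < a.start:
        return "Separation"  # the two intervals share no base
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    if a_in_b or b_in_a:
        return "Inclusion"   # one gene lies entirely within the other
    return "Overlap"         # partial overlap only


def classify_pair(a: Gene, b: Gene) -> tuple[str, str]:
    """Return (orientation, pattern), one of the nine combined categories."""
    return orientation(a, b), pattern(a, b)
```

For example, `classify_pair(Gene("g1", 100, 500, "+"), Gene("g2", 700, 900, "-"))` yields a convergent, separated pair under these assumptions; the gene names and coordinates here are made up for illustration.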