Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression.

Caldwell R, Lin YX, Zhang R - Int J Genomics (2015)

Bottom Line: Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.


Affiliation: School of Biological Sciences, University of Wollongong, Northfields Avenue, Keiraville, Wollongong, NSW 2522, Australia.

ABSTRACT
There is continuing interest in the analysis of gene architecture and gene expression to determine what relationship may exist between them. Advances in high-quality sequencing technologies and large-scale resource datasets have improved the understanding of such relationships and made it easier to cross-reference expression data against large genome datasets. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, the literature contains conflicting results concerning the impact of different gene regions, and the underlying reason is not well understood. This research applies quantile regression to coding and noncoding sequence length and gene expression data in the plant Arabidopsis thaliana and the fruit fly Drosophila melanogaster, to determine whether a relationship exists and whether the two species differ or resemble each other in this respect. The quantile regression analysis found that the correlations between coding sequence length and gene expression varied, whereas similarities emerged between the animal and plant species for noncoding sequence length (5′ and 3′ UTRs). In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.


Figure 1: Distribution of 5′ untranslated region (UTR) length, excluding introns, for 18,445 genes in Arabidopsis thaliana. The distribution is positively skewed (skewness = 2.511).

Mentions: Strong skewness was identified in all the length datasets for each gene region. For example, the distribution of the 5′ UTR length without introns in Arabidopsis thaliana was positively skewed (skewness = 2.511) (Figure 1). Consequently, the nonparametric Kruskal-Wallis test, run in SPSS version 19 (SPSS IBM, New York, USA), was applied to the data to determine whether the quartile groups differ in relation to gene expression and the length of the coding and noncoding regions. This test does not assume that the data are normally distributed:

$$K = (N - 1)\,\frac{\sum_{i=1}^{g} n_i \left(\bar{r}_{i\cdot} - \bar{r}\right)^2}{\sum_{i=1}^{g} \sum_{j=1}^{n_i} \left(r_{ij} - \bar{r}\right)^2} \qquad (1)$$

where $g$ is the number of groups, $n_i$ is the number of observations in group $i$, and $r_{ij}$ is the rank (among all observations) of observation $j$ from group $i$. $N$ is the total number of observations across all groups, $\bar{r}_{i\cdot}$ is the mean rank of the observations in group $i$, and $\bar{r}$ is the average of all $r_{ij}$.
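As an illustration only, the statistic in equation (1) can be computed directly from its definition. The sketch below is not the authors' SPSS procedure; it is a minimal pure-Python version that assumes tied values receive the average of their ranks. The function names (`average_ranks`, `kruskal_wallis`) and the sample values are hypothetical and do not come from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over all values equal to the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Compute K from equation (1) for a list of groups of observations."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)                      # N
    ranks = average_ranks(pooled)              # r_ij, ranked over all observations
    grand_mean = sum(ranks) / n_total          # r-bar

    between = 0.0                              # numerator sum
    offset = 0
    for group in groups:
        n_i = len(group)
        group_ranks = ranks[offset:offset + n_i]
        mean_i = sum(group_ranks) / n_i        # r-bar_i.
        between += n_i * (mean_i - grand_mean) ** 2
        offset += n_i

    total = sum((r - grand_mean) ** 2 for r in ranks)  # denominator sum
    return (n_total - 1) * between / total


# Hypothetical expression values for three length groups (not data from the study).
sample_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k_statistic = kruskal_wallis(sample_groups)
```

For groups of reasonable size, K is usually compared against a chi-squared distribution with g − 1 degrees of freedom to obtain a p-value; that step is omitted here.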

