Whole transcriptome analyses of six thoroughbred horses before and after exercise using RNA-Seq.

Park KD, Park J, Ko J, Kim BC, Kim HS, Ahn K, Do KT, Choi H, Kim HM, Song S, Lee S, Jho S, Kong HS, Yang YM, Jhun BH, Kim C, Kim TH, Hwang S, Bhak J, Lee HK, Cho BW - BMC Genomics (2012)

Bottom Line: More than 60% (20,428) of the unigene clusters did not match any current equine gene model. Most SNVs (171,558 SNVs; 90.31%) were novel when compared with over 1.1 million equine SNPs from two SNP databases. In addition, we found interesting RNA expression patterns where different alternative splicing forms of the same gene showed reversed expressions before and after exercising.


Affiliation: Department of Biotechnology, Hankyong National University, Anseong, Republic of Korea.

ABSTRACT

Background: Thoroughbred horses are the most expensive domestic animals, and both their running ability and their muscle-related diseases are important topics in animal genetics. Although a horse reference genome is available, there has been no large-scale functional annotation of the genome using expressed genes derived from transcriptomes.

Results: We present a large-scale analysis of whole transcriptome data. We sequenced the whole mRNA from the blood and muscle tissues of six thoroughbred horses before and after exercise. By comparison with current genome annotations, we identified 32,361 unigene clusters spanning 51.83 Mb, of which 11,933 (36.87%) matched annotated genes. More than 60% (20,428) of the unigene clusters did not match any current equine gene model. We also identified 189,973 single nucleotide variations (SNVs) from the sequences aligned against the horse reference genome. Most SNVs (171,558 SNVs; 90.31%) were novel when compared with over 1.1 million equine SNPs from two SNP databases. Using differential expression analysis, we further identified a number of exercise-regulated genes: 62 up-regulated and 80 down-regulated genes in the blood, and 878 up-regulated and 285 down-regulated genes in the muscle. Six of 28 previously known exercise-related genes were over-expressed in the muscle after exercise. Among the differentially expressed genes, there were 91 transcription factor-encoding genes, which included 56 functionally unknown transcription factor candidates that are probably associated with an early regulatory exercise mechanism. In addition, we found interesting RNA expression patterns in which different alternative splicing forms of the same gene showed opposite expression changes before and after exercise.
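The percentages quoted in the Results can be reproduced directly from the reported counts. The following is a minimal sketch for checking that arithmetic; the helper name `pct` and the variable names are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Consistency check of the figures reported in the abstract.
# All counts are taken from the text; the names are illustrative.

def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

total_ucs = 32361        # unigene clusters identified
annotated_ucs = 11933    # clusters matching current gene models
novel_ucs = total_ucs - annotated_ucs

total_snvs = 189973      # SNVs called against the reference genome
novel_snvs = 171558      # SNVs absent from the two SNP databases

# Differentially expressed genes: (up-regulated, down-regulated)
deg_counts = {"blood": (62, 80), "muscle": (878, 285)}
deg_totals = {tissue: up + down for tissue, (up, down) in deg_counts.items()}

print("annotated UCs:", pct(annotated_ucs, total_ucs), "%")
print("novel UCs:", novel_ucs, "=", pct(novel_ucs, total_ucs), "%")
print("novel SNVs:", pct(novel_snvs, total_snvs), "%")
print("DEGs per tissue:", deg_totals)
```

The derived values agree with the text: 20,428 novel clusters (63.13%), 36.87% annotated clusters, and 90.31% novel SNVs.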

Conclusion: This study provides the first sequencing-based horse transcriptome data, extensive analysis results, genes differentially expressed before and after exercise, and candidate genes related to exercise.


Figure 1: Enhanced genome annotation, single nucleotide variation analyses, and differentially expressed genes before and after exercise in horse. (A) Red and green circles indicate expressed genes in the blood and muscle tissues, respectively, and the blue circle shows the current Ensembl annotation (Release 62). The grey rectangle indicates the coverage of the current horse genome. (B) Green circle: SNPs provided by the Broad Institute, red circle: SNPs provided by Ensembl (release 62), blue circle: SNPs identified from this study. (C) SNV profiles of six horses for the titin (TTN) gene. The top of the blue arrow is the 5' end and the bottom is 3' end of TTN gene. The X-axis shows the names of the horses. Dark green horizontal bars are non-synonymous SNVs. Light green horizontal bars are synonymous SNVs. (D) Blue bars: >2-fold upregulated genes, red bars: >2-fold downregulated genes, white bars: not differentially expressed. The four pie charts display the composition of the DEGs supported by four horses (white), five horses (light grey), and six horses (grey).
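Panel D combines a fold-change cutoff (>2-fold) with the number of horses supporting each call (four, five, or six). A minimal sketch of such a rule is shown here. It is an assumption about how the two criteria might be combined, not the authors' actual procedure: the function name `classify_gene`, the pseudocount, and the requirement of at least four supporting horses are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: call a gene up- or down-regulated when the
# after/before expression ratio exceeds 2-fold in at least `min_horses`
# of the animals. The pseudocount and default thresholds are assumptions.

def classify_gene(before, after, fold=2.0, min_horses=4, pseudocount=1.0):
    """Return (label, n_supporting_horses) for one gene.

    before, after: per-horse expression values in the same horse order.
    min_horses is assumed to exceed half the number of horses, so "up"
    and "down" cannot both reach the threshold.
    """
    up = down = 0
    for b, a in zip(before, after):
        # Pseudocount avoids division by zero for unexpressed genes.
        ratio = (a + pseudocount) / (b + pseudocount)
        if ratio > fold:
            up += 1
        elif ratio < 1.0 / fold:
            down += 1
    if up >= min_horses:
        return "up", up
    if down >= min_horses:
        return "down", down
    return "unchanged", max(up, down)


# Toy expression values for six horses (made up for illustration).
before = [10, 12, 9, 11, 10, 10]
after = [30, 40, 25, 35, 15, 12]
print(classify_gene(before, after))
```

The second element of the result corresponds to the support level shown in the pie charts, under the assumptions stated above.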

Mentions: To construct high-quality horse transcriptome data, we generated over 1.3 billion 90-bp paired-end reads using an Illumina HiSeq2000 (Additional file 1: Figure S1, Additional file 1: Table S1, and Table S2). Using TopHat [26] and Cufflinks [27], 84.60% of all the reads were successfully mapped against the current horse reference genome (Additional file 1: Table S3). A novel bioinformatics protocol for processing large amounts of transcriptome sequences was built (Additional file 1: Figure S2). Because RNA sequences were obtained from 24 different samples, we defined a new concept, the unigene cluster (UC), which contains overlapping unigene sequences originating from multiple samples. Using the current annotation (Ensembl 62), 32,361 UCs with a total length of 51.83 Mb were identified. Of these, 11,933 UCs (36.87%) matched current gene models (Figure 1A and Additional file 1: Supplementary Methods) [8]. The remaining 20,428 UCs (63.13%), which contained more than 60% of the transcripts, were novel (Additional file 1: Supplementary Methods and Additional file 1: Figure S3). The expression of eight randomly selected novel UCs was confirmed by reverse transcription PCR (Additional file 1: Figure S4 and Additional file 1: Table S4). In addition, the unmapped raw sequences were processed with SOAPdenovo [28], resulting in assemblies of 42,476 to 72,011 scaffolds for each sample. These assembled sequences increased the extent of the current horse genome (Additional file 1: Supplementary Methods, Additional file 1: Figure S2, and Additional file 1: Tables S5, S6, S7, S8). When we pooled the scaffolds together, we identified around 670,000 non-redundant unigenes. Between 27% and 46% of these unigenes from each sample were matched to human genes using tBLASTx (Additional file 1: Table S9).
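The unigene cluster idea, grouping overlapping unigene sequences from multiple samples, can be illustrated with a simple interval merge. This is only a sketch under stated assumptions: it treats "overlapping" as overlap of genomic coordinates on the same chromosome, and the function name `cluster_unigenes`, the tuple layout, and the sample labels are hypothetical. The authors' actual protocol is described in their Supplementary Methods and may differ.

```python
from collections import defaultdict

# Hypothetical sketch of unigene clustering by coordinate overlap.
# Each unigene is (chrom, start, end, sample) with half-open coordinates.

def cluster_unigenes(unigenes):
    """Merge overlapping unigenes per chromosome into clusters.

    Returns a list of dicts with keys chrom, start, end, samples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, sample in unigenes:
        by_chrom[chrom].append((start, end, sample))

    clusters = []
    for chrom in sorted(by_chrom):
        current = None
        # Sorting by start lets one left-to-right sweep find all overlaps.
        for start, end, sample in sorted(by_chrom[chrom]):
            if current is not None and start < current["end"]:
                # Overlaps the open cluster: extend it and record the sample.
                current["end"] = max(current["end"], end)
                current["samples"].add(sample)
            else:
                current = {"chrom": chrom, "start": start,
                           "end": end, "samples": {sample}}
                clusters.append(current)
    return clusters


# Toy input (made-up coordinates and sample labels).
toy = [
    ("chr1", 100, 200, "muscle_before_1"),
    ("chr1", 150, 300, "blood_after_2"),
    ("chr1", 400, 500, "muscle_after_1"),
    ("chr2", 10, 50, "blood_before_3"),
]
clusters = cluster_unigenes(toy)
total_length = sum(c["end"] - c["start"] for c in clusters)
print(len(clusters), "clusters spanning", total_length, "bp")
```

Summing cluster lengths in this way is analogous to how a total span such as the reported 51.83 Mb could be computed, again assuming coordinate-based clusters.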

