Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.

Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD - BMC Plant Biol. (2014)

Bottom Line: The average reduction of the number of genes was over ten-fold. Comparison with various types of experimental datasets (including QTL fine-mapping and Genome Wide Association Study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. This way it capitalizes on QTL data to uncover how individual genes influence trait variation.


ABSTRACT

Background: Elucidation of genotype-to-phenotype relationships is a major challenge in biology. In plants, it is the basis for molecular breeding. Quantitative Trait Locus (QTL) mapping makes it possible to link variation at the trait level to variation at the genomic level. However, QTL regions typically contain tens to hundreds of genes. To prioritize such candidate genes, we show that potentially causal genes for a trait can be identified based on overrepresentation of biological processes (gene functions) among the candidate genes in the QTL regions of that trait.

Results: The prioritization method was applied to rice QTL data, using gene functions predicted on the basis of sequence and expression information. The average reduction of the number of genes was over ten-fold. Comparison with various types of experimental datasets (including QTL fine-mapping and Genome Wide Association Study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. A detailed analysis of flowering time QTLs illustrates that genes with completely unknown function are likely to play a role in this important trait.

Conclusions: Our approach can guide further experimentation and validation of causal genes for quantitative traits. This way it capitalizes on QTL data to uncover how individual genes influence trait variation.
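The overrepresentation idea described in the Background can be sketched in a few lines. The abstract does not state which statistical test or multiple-testing correction the authors used, so the one-sided hypergeometric test below is an assumption, chosen only because it is a common way to test for enrichment. The function name `enrichment_p_value` and all numbers are hypothetical and serve purely as illustration.

```python
from math import comb


def enrichment_p_value(k: int, K: int, n: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the genome (background)
    K: background genes annotated with the biological process (BP)
    n: genes located in the QTL regions of the trait
    k: QTL-region genes annotated with the BP
    """
    total = comb(N, n)
    # Sum the probabilities of seeing k or more annotated genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Toy numbers (illustrative only, not taken from the paper):
# 50 of 1000 genes carry the BP, so about 5 of 100 QTL genes are expected by chance.
p = enrichment_p_value(k=12, K=50, n=100, N=1000)
print(f"p-value for the toy BP: {p:.2e}")
```

Under this assumption, a BP with a small p-value would count as overrepresented for the trait, and the QTL-region genes annotated with that BP would be the prioritized candidates.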


Figure 4: Analysis of QTL regions for the rice trait days to heading. (A) Overview of prioritization results per QTL region. Each pair of horizontal bars indicates a QTL region; the black bar represents the total number of genes in the region, and the green bar the number of prioritized (potentially causal) genes. Inset: the pie diagram indicates the total number of genes (7113) and the fraction of those genes selected by the prioritization approach (579). (B) Overview of selected biological processes (BPs): REVIGO [68] scatterplot view in which each circle represents a BP; the distance between circles indicates similarity between BPs.

Mentions: To illustrate the added value for plant biology, we considered the trait days to heading in depth. Days to heading, which is related to the trait flowering time, is an important parameter for rice breeding [65,66] and plays a key role in the adaptation of rice to different environments [67]. Figure 4A plots the number of prioritized genes, either per QTL region (main panel) or for all QTL regions together (inset). The terms obtained for this trait are depicted in Figure 4B, where the position of each biological process term is chosen to represent similarities between the terms [68]. The overrepresented biological process that occurs for the largest number of genes for this trait is ‘regulation of multicellular organismal development’. This term, although quite general, is clearly relevant for days to heading. Another relevant selected term was ‘cellular response to ethylene stimulus’; an ethylene receptor is known to delay the floral transition in rice [69]. A third clearly relevant term was ‘regulation of flower development’. We analyzed the genes associated with this term in more detail. Of the 7,113 genes in the rice QTL regions linked to the trait days to heading, 79 were assigned to the term ‘regulation of flower development’ by our function annotation (Additional file 3: Table S5) and hence prioritized as potentially causal genes for this trait by our method. Some of these 79 genes are described as ‘unknown’ by existing annotations (Additional file 3: Table S5). For example, gene LOC_Os04g54420 is annotated as containing a domain of unknown function (DUF618). Such genes could not have been prioritized based on existing annotations, which illustrates the importance of using our set of computational gene function predictions as input.
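As a quick check on the scale of the reduction for this trait, the counts quoted in the text and in the Figure 4 inset can be turned into a fold reduction and fractions. The variable names are ours; the three counts are the only values taken from the source.

```python
total_genes = 7113        # genes in all days-to-heading QTL regions
prioritized_genes = 579   # genes selected by the prioritization approach
flower_dev_genes = 79     # genes assigned to 'regulation of flower development'

fold_reduction = total_genes / prioritized_genes
fraction_prioritized = prioritized_genes / total_genes
fraction_flower_dev = flower_dev_genes / total_genes

print(f"fold reduction: {fold_reduction:.1f}x")                       # 12.3x
print(f"prioritized genes: {fraction_prioritized:.1%}")               # 8.1%
print(f"'regulation of flower development': {fraction_flower_dev:.1%}")  # 1.1%
```

For days to heading the reduction is thus roughly twelve-fold, in line with the average reduction of over ten-fold reported in the abstract.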
To take a closer look at the genes prioritized for the trait days to heading based on the BP ‘regulation of flower development’, we focused on genes that were the only gene associated with this BP in the QTL region in which they occur. Given the relevance of this BP for the trait days to heading, the occurrence of only one gene annotated with that BP term in a QTL region for this trait makes that gene a prime candidate for further study. In total, there are 11 such genes (Table 4). Analysis of the existing Rice Genome Annotation Project data [70] for these genes indicates that some are known to be involved in flower development. These include two MADS genes: OsMADS34, involved in inflorescence and spikelet formation [71], and OsMADS18, involved in specifying floral determinacy and organ identity [72]. Several other genes, however, are not characterized at all and should therefore be considered new potentially causal genes involved in the regulation of flowering time. These include a MYB transcription factor and two zinc finger domain-containing proteins. In line with the preference for transcription factors (TFs) among prioritized candidate genes, the set of 11 genes contains 5 TFs: the three mentioned above (two MADS, one MYB) as well as two GATA TFs.
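The selection step described above, keeping a gene only when it is the sole gene in its QTL region annotated with the BP of interest, could be expressed as the following minimal sketch. The data layout (a mapping from region to genes and from gene to BP terms), the function name `sole_annotated_genes`, and the region and gene identifiers are all hypothetical; the source does not describe how this filter was implemented.

```python
def sole_annotated_genes(region_genes, gene_bps, bp):
    """Return {region: gene} for regions where exactly one gene carries `bp`."""
    result = {}
    for region, genes in region_genes.items():
        # Genes in this region that are annotated with the BP of interest.
        hits = [g for g in genes if bp in gene_bps.get(g, set())]
        if len(hits) == 1:
            result[region] = hits[0]
    return result


# Toy input (hypothetical identifiers, not real rice loci).
BP = "regulation of flower development"
region_genes = {
    "qtl_1": ["geneA", "geneB", "geneC"],
    "qtl_2": ["geneD", "geneE"],
    "qtl_3": ["geneF"],
}
gene_bps = {
    "geneA": {BP},
    "geneB": {"cellular response to ethylene stimulus"},
    "geneD": {BP},
    "geneE": {BP},
}

candidates = sole_annotated_genes(region_genes, gene_bps, BP)
print(candidates)  # {'qtl_1': 'geneA'}
```

In the toy input, `qtl_2` is skipped because two of its genes carry the term, and `qtl_3` is skipped because none does.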

