Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models.

Robinson S, Glonek G, Koch I, Thomas M, Davies C - BMC Bioinformatics (2015)

Bottom Line: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.


Affiliation: School of Mathematical Sciences, University of Adelaide, Adelaide, Australia. sean.robinson@alumni.adelaide.edu.au.

ABSTRACT

Background: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function.

Results: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes.

Conclusions: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.


Fig. 7: Histogram of H(k) for the grapevine data (left) and ROC curve for classifying ‘developmental’ or ‘non-developmental’ (temperature responsive) genes based on whether they are above or below a given H(k) threshold (right).

Mentions: The left side of Fig. 7 shows the distribution of Hamming distances for all pairs of expression profiles in the grapevine data. The right side of Fig. 7 shows the receiver operating characteristic (ROC) curve for classifying genes as ‘developmental’ or ‘non-developmental’ (temperature responsive) based on whether the Hamming distance is below or above a given threshold. The area under the curve is 0.91, indicating a good level of discrimination for these data. When the threshold is taken as H(k)=10, the true positive rate is 85.3 % and the false positive rate is 21.9 %. This suggests that applying the same threshold is a potentially useful filter for the classification of developmentally controlled genes amongst a set of genes of unknown function.
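
The thresholding described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy distances and the toy labels are all hypothetical, and the sketch assumes that a gene is called ‘developmental’ when its Hamming distance is at or below the threshold (whether the boundary value itself is included is not stated in the text). The area under the ROC curve is computed here by the pairwise (Mann-Whitney) formulation, which is one standard way to obtain it.

```python
from typing import Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rates_at_threshold(
    distances: Sequence[float],
    is_developmental: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (TPR, FPR) when distance <= threshold is called 'developmental'.

    The direction and inclusiveness of the comparison are assumptions.
    """
    tp = fp = pos = neg = 0
    for d, label in zip(distances, is_developmental):
        predicted = d <= threshold
        if label:
            pos += 1
            tp += predicted
        else:
            neg += 1
            fp += predicted
    return tp / pos, fp / neg


def auc(distances: Sequence[float], is_developmental: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count 0.5.

    Equals the probability that a random 'developmental' gene has a
    smaller distance than a random 'non-developmental' gene.
    """
    pos = [d for d, lab in zip(distances, is_developmental) if lab]
    neg = [d for d, lab in zip(distances, is_developmental) if not lab]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values only (hypothetical, not taken from the grapevine data).
toy_distances = [2, 5, 8, 12, 9, 14, 17, 20]
toy_labels = [True, True, True, True, False, False, False, False]

print(hamming([0, 1, 1, 2], [0, 1, 2, 2]))                # 1
print(rates_at_threshold(toy_distances, toy_labels, 10))  # (0.75, 0.25)
print(auc(toy_distances, toy_labels))                     # 0.9375
```

Sweeping the threshold over all observed distances and plotting the resulting (FPR, TPR) pairs traces out a ROC curve of the kind shown on the right of Fig. 7.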

