Paradigm of tunable clustering using Binarization of Consensus Partition Matrices (Bi-CoPaM) for gene discovery.

Abu-Jamous B, Fa R, Roberts DJ, Nandi AK - PLoS ONE (2013)

Bottom Line: Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.


Affiliation: Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, United Kingdom.

ABSTRACT
Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
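The aggregate-then-binarize idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the individual partitions have already been relabelled so that row k refers to the same cluster in every result, it assumes the consensus matrix is a plain average, and the difference-threshold rule and the names `consensus_partition_matrix` and `binarize_by_difference` are hypothetical choices made for the example.

```python
import numpy as np

def consensus_partition_matrix(partitions):
    """Average aligned binary partition matrices (clusters x genes).

    Assumption: every matrix uses the same cluster ordering, so a simple
    mean gives a fuzzy membership value in [0, 1] for each gene/cluster pair.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in partitions])
    return stacked.mean(axis=0)

def binarize_by_difference(copam, delta):
    """Assumed tight-cluster rule (needs at least two clusters).

    A gene is kept only in its top cluster, and only when its top membership
    exceeds the runner-up membership by at least delta. Other genes are left
    unassigned.
    """
    ordered = np.sort(copam, axis=0)           # ascending per gene (column)
    top, runner_up = ordered[-1], ordered[-2]
    winners = copam.argmax(axis=0)
    keep = (top - runner_up) >= delta
    binary = np.zeros(copam.shape, dtype=bool)
    binary[winners[keep], np.flatnonzero(keep)] = True
    return binary

# Three hypothetical clustering results for 2 clusters x 4 genes.
p1 = [[1, 1, 0, 0], [0, 0, 1, 1]]
p2 = [[1, 0, 0, 0], [0, 1, 1, 1]]
p3 = [[1, 1, 0, 1], [0, 0, 1, 0]]

copam = consensus_partition_matrix([p1, p2, p3])
tight = binarize_by_difference(copam, delta=0.5)
print(copam)
print(tight.astype(int))
```

In this toy input, genes on which the three results disagree end up with intermediate membership values, and a stricter `delta` leaves them out of every cluster, which is the "not assigned to any cluster" eventuality the abstract mentions.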

Figure 5: SNR effect over the number of multiply assigned and unassigned genes. (a) The number of multi-assigned genes is plotted over the 60 SNR values in four cases of wide clusters generated by using the TB technique. (b) The number of unassigned genes is plotted over the 60 SNR values in four cases of tight clusters generated by using the DTB technique. Note that there are no multi-assigned genes in tight clusters, just as there are no unassigned genes in wide clusters.

Mentions: Figure 5(a) shows the number of multi-assigned genes when TB is applied with δ = 0.05, 0.1, 0.2, and 0.4 over the 60 synthetic datasets, ordered by their SNR values. For a given dataset, i.e. a given SNR value in this plot, the number of multi-assigned genes grows as δ increases, and this growth is usually faster for noisier datasets. Comparing across datasets, the number of multi-assigned genes tends to decrease for purer datasets, i.e. those with higher SNR.
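The dependence of the multi-assigned count on δ can be illustrated with a small sketch. The page does not define TB, so the rule below is an assumption: a gene is assigned to every cluster whose fuzzy membership lies within δ of that gene's highest membership. The matrix values and the names `binarize_top` and `count_multi_assigned` are hypothetical and are not taken from the paper's datasets.

```python
import numpy as np

def binarize_top(copam, delta):
    """Assumed wide-cluster rule: keep every cluster within delta of the top.

    copam is a fuzzy consensus matrix of shape (clusters, genes). Each gene
    always keeps at least its top cluster, so no gene is left unassigned.
    """
    top = copam.max(axis=0)
    return copam >= (top - delta)

def count_multi_assigned(binary):
    """Number of genes (columns) that belong to more than one cluster."""
    return int((binary.sum(axis=0) > 1).sum())

# Hypothetical fuzzy consensus matrix: 3 clusters x 4 genes, columns sum to 1.
copam = np.array([
    [0.90, 0.56, 0.52, 0.40],
    [0.10, 0.44, 0.29, 0.36],
    [0.00, 0.00, 0.19, 0.24],
])

# Sweep the same delta values that the text reports for Figure 5(a).
counts = {
    delta: count_multi_assigned(binarize_top(copam, delta))
    for delta in (0.05, 0.1, 0.2, 0.4)
}
print(counts)
```

Under this assumed rule the count can only stay the same or grow as δ increases, and genes with nearly equal memberships (the kind a noisy dataset tends to produce) are the first to become multi-assigned.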

