PyMix--the python mixture package--a tool for clustering of heterogeneous biological data.

Georgi B, Costa IG, Schliep A - BMC Bioinformatics (2010)

Bottom Line: Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications. PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.


Affiliation: Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin. bgeorgi@mail.med.upenn.edu

ABSTRACT

Background: Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional and inherently noisy, and it frequently contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.

Results: PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licensed under the GNU General Public License (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.

Conclusions: PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.



Figure 1: a) Model structure for a conventional mixture with five components and four random variables (RVs). Each cell of the matrix represents a distribution in the mixture, and every RV has a unique distribution in each component. b) CSI model structure. Multiple components may share the same distribution for an RV, as indicated by the matrix cells spanning multiple rows. In the example, C2, C3 and C4 share the same distribution for X2.

Mentions: parameters θkj for each component k and feature Xj. This can be visualized in a matrix as shown in Figure 1a). Here each cell in the matrix represents one of the θkj. The different values of the parameters for each feature and component express the regularities in the data which characterize and distinguish the components. The basic idea of the context-specific independence (CSI) [9] extension to the mixture framework is that very often the regularities found in the data do not require a separate set of parameters for all features in every component. Rather, there will be features where several components share a parameterization.
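
The difference between the two structures in Figure 1 can be illustrated with a small sketch. This is plain Python, not the PyMix API. All names here (`conventional_structure`, `csi_structure`, `count_parameter_sets`, `shared_group`) are hypothetical. Only the grouping of C2, C3 and C4 for X2 is taken from the caption. The treatment of C1 and C5 for X2 and the groupings for X1, X3 and X4 are invented for illustration. Components are indexed from 0, so C2, C3 and C4 become 1, 2 and 3.

```python
# Illustrative sketch only: a model structure maps each feature to a
# partition of the component indices. Components in the same group share
# one parameter set (one theta) for that feature.

N_COMPONENTS = 5
FEATURES = ["X1", "X2", "X3", "X4"]


def conventional_structure(n_components, features):
    """Conventional mixture: every component has its own theta_kj per feature."""
    return {f: [(k,) for k in range(n_components)] for f in features}


# CSI structure. Only the (1, 2, 3) group for X2 follows the figure caption;
# every other grouping below is an assumption made up for this example.
csi_structure = {
    "X1": [(0, 1), (2, 3, 4)],              # assumed
    "X2": [(0,), (1, 2, 3), (4,)],          # C2, C3, C4 share one distribution
    "X3": [(0,), (1,), (2,), (3,), (4,)],   # assumed: no sharing
    "X4": [(0, 1, 2, 3, 4)],                # assumed: one distribution for all
}


def is_valid_structure(structure, n_components):
    """Check that each feature's groups form a partition of the components."""
    expected = list(range(n_components))
    return all(
        sorted(k for group in groups for k in group) == expected
        for groups in structure.values()
    )


def count_parameter_sets(structure):
    """Number of distinct parameter sets (matrix cells) in the structure."""
    return sum(len(groups) for groups in structure.values())


def shared_group(structure, feature, component):
    """Return the group of components sharing parameters with `component`."""
    for group in structure[feature]:
        if component in group:
            return group
    raise KeyError(f"component {component} not found for feature {feature}")


if __name__ == "__main__":
    conventional = conventional_structure(N_COMPONENTS, FEATURES)
    print("conventional parameter sets:", count_parameter_sets(conventional))
    print("CSI parameter sets:", count_parameter_sets(csi_structure))
    print("group of C3 for X2:", shared_group(csi_structure, "X2", 2))
```

Under these assumed groupings the conventional structure has one parameter set per cell (5 × 4 = 20), while the CSI structure needs only 11, which is the kind of reduction the shared parameterizations are meant to provide.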

