PyMix--the python mixture package--a tool for clustering of heterogeneous biological data.

Georgi B, Costa IG, Schliep A - BMC Bioinformatics (2010)

Bottom Line: Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications. PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.

Affiliation: Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin. bgeorgi@mail.med.upenn.edu

ABSTRACT

Background: Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.

Results: PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licensed under the GNU General Public License (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.
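
To illustrate what clustering with a basic mixture model involves, the following is a minimal sketch of expectation-maximization (EM) for a one-dimensional mixture of two normal distributions. It uses only the Python standard library and does not use or reproduce the PyMix API; the function names (normal_pdf, fit_two_gaussians), the initialization scheme, and the synthetic data are assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D normal mixture with EM (illustrative only).

    Returns (weights, means, stds, responsibilities).
    """
    # Hypothetical initialization: one component at each end of the data range.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    spread = (max(data) - min(data)) / 4.0 or 1.0
    stds = [spread, spread]
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])

        # M-step: re-estimate parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor to avoid collapse

    return weights, means, stds, resp


# Synthetic data: two well-separated groups (assumed values for the demo).
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(100)]
data += [random.gauss(3.0, 0.5) for _ in range(100)]

weights, means, stds, resp = fit_two_gaussians(data)

# Cluster assignment: the component with the larger posterior probability.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
```

Each data point receives a posterior probability for every component, and the cluster assignment is the component with the highest posterior. The advanced models named above (context-specific independence mixtures, mixtures of dependence trees) replace the simple per-component normal density with richer component distributions, which this sketch does not attempt.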

Conclusions: PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.

Figure 6: Dependence trees and expression patterns of two clusters from a lymphoid development data set [21]. Expression profiles are shown as a heat map, where red indicates over-expression and green indicates under-expression. Rows correspond to genes and columns to developmental stages, ordered as in the corresponding dependence tree. In the left cluster, genes are over-expressed in T cell related stages (shown in blue), while in the right cluster genes are over-expressed in B cell related stages (shown in orange). The dependence tree of each cluster reflects the co-expression of developmental stages within that cluster.

Mentions: One particular application is the analysis of gene expression patterns across the distinct stages of a developmental tree, that is, the developmental profiles of genes. It is assumed that, in development, the sequence of changes from a stem cell to a particular mature cell, as described by a developmental tree, is the most important factor in modeling gene expression from developmental processes. For example, in [21] we analyzed a gene expression compendium of lymphoid development, which contained expression data from lymphoid stem cells, B cells, T cells and natural killer cells (depicted in green, orange, blue and yellow, respectively, in Figure 6).

