PyMix--the python mixture package--a tool for clustering of heterogeneous biological data.

Georgi B, Costa IG, Schliep A - BMC Bioinformatics (2010)

Bottom Line: Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications. PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.


Affiliation: Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin. bgeorgi@mail.med.upenn.edu

ABSTRACT

Background: Cluster analysis is an important technique for the exploratory analysis of biological data. Such data is often high-dimensional, inherently noisy and contains outliers. This makes clustering challenging. Mixtures are versatile and powerful statistical models which perform robustly for clustering in the presence of noise and have been successfully applied in a wide range of applications.

Results: PyMix, the Python mixture package, implements algorithms and data structures for clustering with basic and advanced mixture models. The advanced models include context-specific independence (CSI) mixtures, mixtures of dependence trees and semi-supervised learning. PyMix is licensed under the GNU General Public License (GPL). PyMix has been successfully used for the analysis of biological sequence, complex disease and gene expression data.

Conclusions: PyMix is a useful tool for cluster analysis of biological data. Due to the general nature of the framework, PyMix can be applied to a wide range of applications and data sets.
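
To make the idea of clustering with a mixture model concrete, the following is a minimal sketch of expectation-maximization (EM) for a one-dimensional Gaussian mixture, written in plain Python. It does not use PyMix or its API; the function names (`fit_mixture`, `assign_clusters`), the toy data and the initial means are all assumptions made for illustration only.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture(data, means, n_iter=50):
    """EM for a 1-D Gaussian mixture; `means` holds the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor keeps the density well defined
    return weights, means, variances, resp

def assign_clusters(resp):
    """Hard cluster labels: the component with the highest posterior."""
    return [max(range(len(r)), key=lambda c: r[c]) for r in resp]

# Hypothetical toy data: two well-separated groups of values.
data = [0.1, -0.2, 0.3, 0.0, 5.1, 4.8, 5.3, 5.0]
weights, means, variances, resp = fit_mixture(data, means=[0.0, 1.0])
labels = assign_clusters(resp)
print(labels, [round(m, 2) for m in means])
```

Each data point receives a posterior probability for every component, so cluster membership is soft; the hard labels are derived only at the end. This is the generic mixture clustering scheme, not a description of how PyMix implements it.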


Figure 5: WebLogos (http://weblogo.berkeley.edu) for the two subgroups of Leu3 binding sites. It can be seen that the positions with strong sequence variability (positions 1, 4, 5 and 6) have been recognized by the CSI structure (indicated by arrows). (Figure reproduced from [13].)

Mentions: An example of the two clusters of binding sites found for the transcription factor Leu3 is shown in Figure 5. The double arrows indicate the positions where the learned CSI structure assigned a separate distribution to each cluster. These positions coincide with the strongest differences in the sequence composition of the two clusters. In a comparison on a data set of 64 JASPAR [26] transcription factors, the CSI mixtures outperformed conventional mixtures and positional weight matrices with respect to human-mouse sequence conservation of predicted hits [13].
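
The idea that a CSI structure gives some positions a separate distribution per cluster and lets the remaining positions share one can be illustrated with a rough sketch. The code below is not the structure-learning procedure used in PyMix or in [13]; it swaps in a simple heuristic (total variation distance between per-cluster nucleotide frequencies, compared against an arbitrary threshold). The sequences are made-up toy data, not Leu3 binding sites, and all names (`column_freqs`, `csi_like_structure`, `threshold`) are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def column_freqs(seqs, pos, pseudocount=1.0):
    """Smoothed nucleotide frequencies at one alignment position."""
    counts = Counter(s[pos] for s in seqs)
    total = len(seqs) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p[a] - q[a]) for a in ALPHABET)

def csi_like_structure(cluster_a, cluster_b, threshold=0.3):
    """Mark each position 'separate' if the two clusters differ strongly there,
    otherwise 'shared'. The threshold is an assumption, not a learned quantity."""
    length = len(cluster_a[0])
    structure = []
    for pos in range(length):
        p = column_freqs(cluster_a, pos)
        q = column_freqs(cluster_b, pos)
        distinct = total_variation(p, q) > threshold
        structure.append("separate" if distinct else "shared")
    return structure

# Hypothetical aligned sites: the clusters differ only at the first position.
cluster_a = ["ACGT", "ACGT", "ACGA", "ACGT"]
cluster_b = ["TCGT", "TCGA", "TCGT", "TCGT"]
structure = csi_like_structure(cluster_a, cluster_b)
print(structure)
```

In this toy case only the first position is flagged as needing a separate distribution per cluster, which mirrors how the arrows in Figure 5 mark the positions where the two subgroups differ most. Sharing a distribution at the other positions reduces the number of parameters that have to be estimated.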

