The Molecule Cloud - compact visualization of large collections of molecules.

Ertl P, Rohde B - J Cheminform (2012)

Bottom Line: Various sophisticated methods are available to analyze and visualize large collections of molecules, including clustering, dimensionality reduction and scaffold frequency analysis. The layout algorithm is available as open source Java code. Several example Molecule Cloud visualizations of large data sets, including the PubChem, ChEMBL and ZINC databases, are provided.


Affiliation: Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056, Basel, Switzerland. peter.ertl@novartis.com.

ABSTRACT

Background: Analysis and visualization of large collections of molecules is one of the most frequent challenges that cheminformatics experts in the pharmaceutical industry face. Various sophisticated methods are available to perform this task, including clustering, dimensionality reduction and scaffold frequency analysis. In any case, however, viewing and analyzing large tables of molecular structures is necessary. We present a new visualization technique that provides basic information about the composition of molecular data sets at a single glance.

Summary: We present a method that visually represents the most common structural features of chemical databases in the form of a cloud diagram. The frequency of molecules containing a particular substructure is indicated by the size of the respective structural image. The method is useful for quickly perceiving the most prominent structural features present in a data set. The approach was inspired by popular word cloud diagrams, which are used to visualize textual information in a compact form; we therefore call it the "Molecule Cloud". The method also supports visualization of additional information by color coding, for example the biological activity of molecules containing a given scaffold or the protein target class typical for particular scaffolds. A detailed description of the algorithm is provided, allowing easy implementation of the method with any cheminformatics toolkit. The layout algorithm is available as open source Java code.
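
To make the size encoding concrete, the following is a minimal Python sketch of how scaffold frequencies could be turned into image sizes. It is not the authors' Java code. The function name scaffold_image_sizes, the pixel bounds, and the square-root scaling are all assumptions chosen for illustration; the abstract states only that image size indicates frequency, not which scaling is used. Scaffolds are passed in as precomputed labels (here SMILES strings), so no cheminformatics toolkit is required.

```python
from collections import Counter
import math


def scaffold_image_sizes(scaffolds, top_n=5, min_px=40, max_px=200):
    """Map the top_n most frequent scaffolds to an image side length in pixels.

    Hypothetical helper: the scaling rule below is an assumption,
    not the rule used in the published layout algorithm.
    """
    counts = Counter(scaffolds).most_common(top_n)
    if not counts:
        return {}
    max_count = counts[0][1]  # most_common() sorts by descending count
    sizes = {}
    for scaffold, count in counts:
        # Assumption: side length scales with the square root of the
        # frequency relative to the most common scaffold.
        fraction = math.sqrt(count / max_count)
        sizes[scaffold] = round(min_px + (max_px - min_px) * fraction)
    return sizes


# One scaffold label per molecule: benzene four times, pyridine once.
example = ["c1ccccc1"] * 4 + ["c1ccncc1"]
print(scaffold_image_sizes(example))
```

A square-root rule keeps rare scaffolds legible next to very common ones; a logarithmic or linear rule could be substituted in the same place.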

Conclusions: Visualizing large molecular data sets with the Molecule Cloud approach lets scientists easily obtain information about the composition of molecular databases and their most frequent structural features. The method may be used in areas where analysis of large molecular collections is needed, for example processing of high-throughput screening results, virtual screening or compound purchasing. Several example Molecule Cloud visualizations of large data sets, including the PubChem, ChEMBL and ZINC databases, are provided.




Figure 3: Molecule Cloud for molecules in the PubChem database. Magenta background indicates presence of biological activity.

Mentions: In this section, Molecule Cloud diagrams are presented for several popular publicly available data sets. PubChem [11] is the largest publicly available molecular structure database; in June 2012 it contained nearly 33 million unique structures. The Molecule Cloud of PubChem is shown in Figure 3. In this image, scaffolds of bioactive molecules are indicated by a magenta background, where the color intensity is proportional to the ratio of bioactive molecules to all molecules containing the scaffold. Bioactive molecules were identified by the PubChem advanced search as molecules having activity better than 10 μM in any PubChem assay.
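
The color coding described for Figure 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function name magenta_backgrounds is hypothetical, each record is assumed to carry a precomputed bioactivity flag (for instance, whether the molecule met the activity threshold described above), and the linear blend from white to full magenta is one simple way to make intensity proportional to the bioactive ratio.

```python
from collections import defaultdict


def magenta_backgrounds(records):
    """Return an RGB background color per scaffold.

    records: iterable of (scaffold, is_active) pairs, one per molecule.
    Hypothetical helper; the white-to-magenta blend is an assumption.
    """
    totals = defaultdict(int)
    actives = defaultdict(int)
    for scaffold, is_active in records:
        totals[scaffold] += 1
        if is_active:
            actives[scaffold] += 1

    colors = {}
    for scaffold, total in totals.items():
        # Ratio of bioactive molecules to all molecules with this scaffold.
        ratio = actives[scaffold] / total
        # Blend linearly from white (ratio 0) to magenta (ratio 1):
        # red and blue stay at 255, green drops as the ratio rises.
        green = round(255 * (1 - ratio))
        colors[scaffold] = (255, green, 255)
    return colors


records = [("c1ccccc1", True)] + [("c1ccccc1", False)] * 4 + [("c1ccncc1", True)]
print(magenta_backgrounds(records))
```

Scaffolds with no bioactive molecules keep a white background in this sketch, so only scaffolds with some activity stand out.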

