CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering.
Bottom Line: A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for facilitating these tasks. The functional seeds of CFam families were drawn from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents; these seeds were used to characterize families and to cluster compounds into families, superfamilies and classes. Efforts will be made to further expand the CFam database and to add more functional categories and families based on other types of molecular representations.
Affiliation: Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543; State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu 610041, China; Computational and Systems Biology, Singapore-MIT Alliance, National University of Singapore, Singapore.
Mentions: CFam can be searched in three modes (Figure 1). The first mode searches CFam by a compound name or ID (CFam, PubChem, ChEMBL, ZINC and TTD compound IDs are currently supported), a CFam family name or ID, a CFam superfamily name or ID, or a CFam class ID. The relevant information is retrieved by clicking the ‘Molecule’, ‘Family’, ‘Superfamily’ or ‘Class’ button, respectively. For instance, entering ‘aspirin’ and then clicking ‘Molecule’ leads to the page for CFam molecule CFAMM00072836, which shows that aspirin belongs to the CFam family CFFAD534 (cyclooxygenase inhibitor salicylate derivative aspirin family) (Figure 2). The second mode supports browsing the CFam families, superfamilies or classes of any functional category: first click ‘Family’, ‘Superfamily’ or ‘Class’ in the section header titled ‘Browse CFam Family/Superfamily/Class by Functional Category’, then click a specific functional category below the header. For instance, clicking ‘Family’ and then ‘Approved Drug Families’ leads to the list of CFam approved drug families (Figure 3). The third mode aligns an input compound, given in SMILES or molecular fingerprint format, against CFam seeds to identify CFam families with high, intermediate and remote similarity to the input compound. It returns a list of up to 30 CFam families that have at least one seed with 2DF-TC > 0.85 (high similarity family), 0.85 ≥ 2DF-TC > 0.7 (intermediate similarity family) or 0.7 ≥ 2DF-TC > 0.57 (remote similarity family) to the input compound. Figure 4 shows the result page for the alignment of aspirin against CFam seeds. To facilitate the development of chemical family databases and the structural and functional analysis of molecules, CFam seeds can be downloaded from the CFam main page (Figure 1).
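The similarity tiers used by the third search mode can be illustrated with a minimal sketch. It is not CFam's implementation. It assumes that 2DF-TC denotes a Tanimoto coefficient computed over 2D fingerprints, that a fingerprint is represented as the set of its on-bit indices, that a family is scored by its best-matching seed, and that the limit of 30 applies to the whole result list rather than to each tier. The function names (`tanimoto`, `similarity_tier`, `rank_families`) and the family IDs in the usage example are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Thresholds quoted in the text; each tier is open below and closed above.
HIGH, INTERMEDIATE, REMOTE = 0.85, 0.70, 0.57


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_tier(tc: float) -> Optional[str]:
    """Map a coefficient to 'high', 'intermediate', 'remote', or None."""
    if tc > HIGH:
        return "high"
    if tc > INTERMEDIATE:
        return "intermediate"
    if tc > REMOTE:
        return "remote"
    return None


def rank_families(
    query: Set[int],
    family_seeds: Dict[str, List[Set[int]]],
    limit: int = 30,
) -> List[Tuple[str, float, str]]:
    """Score each family by its best seed and keep the top `limit` hits."""
    hits = []
    for family_id, seeds in family_seeds.items():
        best = max((tanimoto(query, seed) for seed in seeds), default=0.0)
        tier = similarity_tier(best)
        if tier is not None:
            hits.append((family_id, best, tier))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


if __name__ == "__main__":
    # Toy fingerprints and made-up family IDs, for illustration only.
    query = set(range(1, 11))
    families = {
        "FAM_A": [set(range(1, 11))],
        "FAM_B": [set(range(1, 9))],
        "FAM_C": [set(range(1, 6))],
    }
    for family_id, tc, tier in rank_families(query, families):
        print(family_id, round(tc, 2), tier)
```

In practice the fingerprints would come from a cheminformatics toolkit applied to the input SMILES and to the downloaded CFam seeds; that step is left out here because the source does not specify the fingerprint type.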