Structure-revealing data fusion.

Acar E, Papalexakis EE, Gürdeniz G, Rasmussen MA, Lawaetz AJ, Nilsson M, Bro R - BMC Bioinformatics (2014)

Bottom Line: Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.

Affiliation: Department of Food Science, Faculty of Science, University of Copenhagen, Frederiksberg C, Denmark. evrim@life.ku.dk.

ABSTRACT

Background: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures that are otherwise difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging because data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) composed of both shared and unshared components. To address these challenges, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors.

Results: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data.

Conclusions: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data.
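
The abstract does not spell out the model itself. As a hedged sketch of what "revealing shared and unshared components through modeling constraints" can look like for a third-order tensor coupled with a matrix Y in the first mode, one can write a coupled factorization with explicit per-component weights and a sparsity penalty on those weights. The factor matrices A, B, C, V, the number of components R, the penalty parameter β, and the choice of a 1-norm penalty are assumptions made for illustration; the weight names λ and σ follow the Figure 9 caption.

```latex
\min_{\lambda,\,\sigma,\,A,\,B,\,C,\,V}\;
\Bigl\| \mathcal{X} - \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r \Bigr\|^2
+ \bigl\| Y - A \,\operatorname{diag}(\sigma)\, V^{\mathsf T} \bigr\|^2
+ \beta \,\|\lambda\|_1 + \beta \,\|\sigma\|_1
\quad \text{s.t.} \quad
\|a_r\| = \|b_r\| = \|c_r\| = \|v_r\| = 1, \quad r = 1, \dots, R .
```

Under this sketch, component r would be read as shared when both λ_r and σ_r are clearly nonzero, and as unshared when one of the two weights is driven to (near) zero.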

Figure 9: Modeling of more than two data sets using ACMTF. (a) A third-order tensor coupled with matrices Y and Z in the first mode. (b) Weights λ, σ and γ captured by ACMTF, as well as the match score for factor matrix A.

Mentions: Our experiments so far have focused on joint analysis of two data sets. Here, we also demonstrate that the proposed model shows promising performance in identifying shared/unshared factors when more than two data sets are jointly analyzed. We use the coupled data sets given in Figure 9(a) as an illustrative example.
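
To make the three-data-set setup concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a synthetic third-order tensor X and two matrices Y and Z that share the first-mode factor matrix A, evaluates a coupled squared-error objective with a 1-norm penalty on the weights, and reads off which data sets each component is active in. All dimensions, the weight pattern, the tolerance, the penalty form, and the function names (`unit_columns`, `cp_tensor`, `coupled_loss`, `active_sets`) are assumptions chosen for illustration. The sketch only evaluates the objective and performs no optimization.

```python
import numpy as np

def unit_columns(M):
    """Scale each column of M to unit Euclidean norm."""
    return M / np.linalg.norm(M, axis=0, keepdims=True)

def cp_tensor(weights, A, B, C):
    """Weighted sum of rank-one terms: sum_r weights[r] * a_r o b_r o c_r."""
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)

def coupled_loss(X, Y, Z, lam, sigma, gamma, A, B, C, V, W, beta=1e-3):
    """Squared fit error over all three data sets plus a 1-norm penalty on the weights."""
    fit = (np.sum((X - cp_tensor(lam, A, B, C)) ** 2)
           + np.sum((Y - (A * sigma) @ V.T) ** 2)
           + np.sum((Z - (A * gamma) @ W.T) ** 2))
    penalty = beta * (np.abs(lam).sum() + np.abs(sigma).sum() + np.abs(gamma).sum())
    return fit + penalty

def active_sets(lam, sigma, gamma, tol=1e-2):
    """For each component, list the data sets whose weight exceeds tol in magnitude."""
    names = ("X", "Y", "Z")
    weights = np.vstack([lam, sigma, gamma])  # shape (3, R)
    return [tuple(n for n, w in zip(names, weights[:, r]) if abs(w) > tol)
            for r in range(weights.shape[1])]

# Hypothetical sizes: X is I x J x K, Y is I x M, Z is I x N, with R components.
rng = np.random.default_rng(0)
I, J, K, M, N, R = 20, 15, 10, 12, 8, 3
A, B, C = (unit_columns(rng.standard_normal((d, R))) for d in (I, J, K))
V = unit_columns(rng.standard_normal((M, R)))
W = unit_columns(rng.standard_normal((N, R)))

# Hypothetical weight pattern: component 0 is in all three data sets,
# component 1 only in X and Y, component 2 only in Z.
lam = np.array([1.0, 1.0, 0.0])
sigma = np.array([1.0, 1.0, 0.0])
gamma = np.array([1.0, 0.0, 1.0])

# All three data sets are coupled through the shared first-mode factors A.
X = cp_tensor(lam, A, B, C)
Y = (A * sigma) @ V.T
Z = (A * gamma) @ W.T

patterns = active_sets(lam, sigma, gamma)
loss_at_truth = coupled_loss(X, Y, Z, lam, sigma, gamma, A, B, C, V, W, beta=0.0)
```

In a fitted model, the weight vectors would be estimates rather than the generating values, so the tolerance used to call a weight "zero" becomes a judgment call that depends on noise level and scaling.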