SimQ: real-time retrieval of similar consumer health questions.

Luo J, Zhang GQ, Wentz S, Cui L, Xu R - J. Med. Internet Res. (2015)

Bottom Line: The results show that SimQ reached the highest precision of 72.2%, recall of 78.0%, and F-score of 75.0% when using compositional feature representations. We demonstrated that SimQ complements the existing Q&A services of Netwellness, a not-for-profit community-based consumer health information service that consists of nearly 70,000 Q&As and serves over 3 million users each year. SimQ not only reduces response delay by instantly providing closely related questions and answers, but also helps consumers to improve the understanding of their health concerns.


Affiliation: Center for Biomedical Data and Language Processing, Department of Health Informatics and Administration, University of Wisconsin Milwaukee, Milwaukee, WI, United States. luojake@gmail.com.

ABSTRACT

Background: There has been a significant increase in the popularity of Web-based question-and-answer (Q&A) services that provide health care information for consumers. Large numbers of Q&As have been archived in these online communities, forming a valuable knowledge base for consumers who seek answers to their health care concerns. However, because consumers may lack professional knowledge, it remains very challenging for them to find Q&As that are closely relevant to their own health problems. As a result, consumers often ask questions similar to ones that other users have already had answered.

Objective: In this study, we aim to develop efficient informatics methods that can retrieve similar Web-based consumer health questions using syntactic and semantic analysis.

Methods: We propose "SimQ" to achieve this objective. SimQ is an informatics framework that compares the similarity of archived health questions and retrieves answers to satisfy consumers' information needs. Statistical syntactic parsing was used to analyze each question's syntactic structure. The standardized Unified Medical Language System (UMLS) was employed to annotate semantic types and extract medical concepts. Finally, the similarity between sentences was calculated using both semantic and syntactic features.

Results: We used 2000 randomly selected consumer questions to evaluate the system's performance. The results show that SimQ reached the highest precision of 72.2%, recall of 78.0%, and F-score of 75.0% when using compositional feature representations.

Conclusions: We demonstrated that SimQ complements the existing Q&A services of Netwellness, a not-for-profit community-based consumer health information service that consists of nearly 70,000 Q&As and serves over 3 million users each year. SimQ not only reduces response delay by instantly providing closely related questions and answers, but also helps consumers improve their understanding of their health concerns.


Figure 3: Dice coefficient (1) and cosine similarity (2) formulas.

Mentions: Dice coefficient (DC) and cosine similarity (CS) (see Figure 3) are the measures used in this paper to compute the similarity score between questions. The similarity score ranges from 0 to 1: a score of 0 means that two questions are not similar at all, and a score of 1 means that they are completely the same. Given two feature sets Q1 and Q2 generated from two different consumer questions, the DC and CS similarity scores are calculated with the formulas in Figure 3.
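
The formulas in Figure 3 are not reproduced in this text, so the following is only a minimal sketch that assumes the standard set-based definitions: DC = 2|Q1 ∩ Q2| / (|Q1| + |Q2|) and CS = |Q1 ∩ Q2| / sqrt(|Q1| × |Q2|). The function names, the handling of empty sets, and the example feature sets are hypothetical and are not taken from the paper.

```python
import math


def dice_coefficient(q1: set, q2: set) -> float:
    """Set-based Dice coefficient (assumed standard form): 2|Q1 n Q2| / (|Q1| + |Q2|)."""
    if not q1 and not q2:
        # Assumption: two empty feature sets are scored 0.0 to avoid dividing by zero.
        return 0.0
    return 2 * len(q1 & q2) / (len(q1) + len(q2))


def cosine_similarity(q1: set, q2: set) -> float:
    """Set-based cosine similarity (assumed standard form): |Q1 n Q2| / sqrt(|Q1| * |Q2|)."""
    if not q1 or not q2:
        # Assumption: an empty feature set is scored 0.0 to avoid dividing by zero.
        return 0.0
    return len(q1 & q2) / math.sqrt(len(q1) * len(q2))


# Hypothetical feature sets for two consumer questions (illustrative only).
q1 = {"headache", "fever", "cause"}
q2 = {"headache", "fever", "treatment", "child"}

dc = dice_coefficient(q1, q2)   # 2 * 2 / (3 + 4)
cs = cosine_similarity(q1, q2)  # 2 / sqrt(3 * 4)
print(f"DC = {dc:.3f}, CS = {cs:.3f}")
```

Under these assumed definitions both scores fall in the 0-1 range described above: disjoint feature sets score 0, and identical non-empty sets score 1.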

