Embracing the Sparse, Noisy, and Interrelated Aspects of Patient Demographics for use in Clinical Medical Record Linkage.

Ash SM, Ip-Lin K - AMIA Jt Summits Transl Sci Proc (2015)

Bottom Line: String similarity approaches ignore the rich semantic information present by reducing it to a simple syntactic comparison of characters. Using a ground truth dataset of a real patient population, we demonstrate that systems built in this fashion improve recall by 76% with little reduction in precision. This result empirically demonstrates the size of the gap between sophisticated systems and naïve approaches.


Affiliation: University of Memphis, Memphis, TN; ARGO Data, Richardson, TX.

ABSTRACT
Duplicate patient records in health information systems have received increased attention recently because of regulatory incentives to integrate the healthcare enterprise. Historically, most patient record matching systems have been limited to simple applications of the Fellegi-Sunter theory of record linkage with edit-distance-based string similarity measurements. String similarity approaches ignore the rich semantic information present in the data by reducing it to a simple syntactic comparison of characters. This work describes an updated approach to building clinical medical record linkage systems, which embraces the unavoidable problems present in real-world patient matching. Using a ground truth dataset of a real patient population, we demonstrate that systems built in this fashion improve recall by 76% with little reduction in precision. This result empirically demonstrates the size of the gap between sophisticated systems and naïve approaches. Additionally, it accentuates the difficulty of estimating the false negative error in this setting, as previous research has reported much higher levels of recall, due in part to measuring from biased samples.
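
As a rough illustration of the baseline the abstract describes, the sketch below scores a pair of records with Fellegi-Sunter agreement weights, where field agreement is decided by an edit-distance similarity. It is not the authors' implementation: the field names, the m/u probabilities, and the 0.85 agreement threshold are all hypothetical values chosen for the example.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Hypothetical per-field probabilities (assumed for illustration only):
# m = P(field agrees | same patient), u = P(field agrees | different patients)
FIELDS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "dob": (0.98, 0.01),
}

def match_weight(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> float:
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter weight)."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        agrees = similarity(rec_a[field], rec_b[field]) >= threshold
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total

a = {"first_name": "robert", "last_name": "smith", "dob": "1970-01-02"}
b = {"first_name": "bob", "last_name": "smith", "dob": "1970-01-02"}
print(similarity("robert", "bob"))  # low, although "Bob" is a nickname for "Robert"
print(match_weight(a, a), match_weight(a, b))
```

The nickname pair shows the limitation in question: a purely character-level comparison treats "robert" and "bob" as a disagreement, so the pair is penalized even though the names are semantically related.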



Figure 2: Precision-recall curve comparing FS-Plus to other approaches.

Mentions: Figure 2 shows the precision-recall curve of each approach across all levels of recall. This chart is particularly important for the patient matching problem, in which there are not enough human resources to adjudicate the entire output of the matching process. It is therefore important to ensure high precision at every level of recall, so that reviewers do not waste time on low-quality candidate matches. FS-Plus outperforms the other approaches at all levels of recall. The Average Precision metric captures the area under the precision-recall curve. FS-Plus improves the Average Precision by ~76%.
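
To make the Average Precision metric concrete, here is a minimal sketch of one common, non-interpolated way to compute it from ranked candidate pairs. The source does not state which variant the authors used, so this formula is an assumption, and the scores and labels are made-up toy values rather than data from the paper.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) after each candidate, ranked by score descending."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return []
    points, tp = [], 0
    for k, (_, is_match) in enumerate(ranked, start=1):
        tp += is_match
        points.append((tp / total_pos, tp / k))
    return points

def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of every true match."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    tp, ap = 0, 0.0
    for k, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            tp += 1
            ap += tp / k  # precision at this recall step
    return ap / total_pos

# Toy example: four candidate pairs, two of which are true matches (label 1).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(precision_recall_points(scores, labels))
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2
```

Because reviewers work down the ranked list from the top, a system that places true matches early earns a higher Average Precision, which is why the metric suits a setting with limited adjudication capacity.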

