Embracing the Sparse, Noisy, and Interrelated Aspects of Patient Demographics for use in Clinical Medical Record Linkage.

Ash SM, Ip-Lin K - AMIA Jt Summits Transl Sci Proc (2015)

Bottom Line: String similarity approaches ignore the rich semantic information present by reducing it to a simple syntactic comparison of characters. Using a ground truth dataset of a real patient population, we demonstrate that systems built in this fashion improve recall by 76% with little reduction in precision. This result empirically demonstrates the size of the gap between sophisticated systems and naïve approaches.


Affiliation: University of Memphis, Memphis, TN ; ARGO Data, Richardson, TX.

ABSTRACT
Duplicate patient records in health information systems have received increased attention in recent years due to regulatory incentives to integrate the healthcare enterprise. Historically, most patient record matching systems have been limited to simple applications of the Fellegi-Sunter theory of record linkage with edit-distance-based string similarity measurements. String similarity approaches ignore the rich semantic information present in patient demographics by reducing it to a purely syntactic comparison of characters. This work describes an updated approach to building clinical medical record linkage systems, one that embraces the unavoidable problems present in real-world patient matching. Using a ground truth dataset of a real patient population, we demonstrate that systems built in this fashion improve recall by 76% with little reduction in precision. This result empirically demonstrates the size of the gap between sophisticated systems and naïve approaches. It also highlights the difficulty of estimating false negative error in this setting: previous research has reported much higher levels of recall, due in part to measuring on biased samples.
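The sketch below makes the criticism of character-level comparison concrete. It implements a generic edit-distance agreement function of the kind the abstract calls a naïve approach. It is not the authors' system. The function names, the normalization by the longer string's length, and the `full`/`partial` cut-offs (0.9 and 0.7) are illustrative assumptions.

```python
def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]
        for j, tc in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete sc
                            curr[j - 1] + 1,            # insert tc
                            prev[j - 1] + (sc != tc)))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Edit distance normalized to [0, 1] by the longer string (an assumed convention)."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))


def agreement_level(s: str, t: str, full: float = 0.9, partial: float = 0.7) -> str:
    """Map a purely syntactic similarity to an agreement level (hypothetical cut-offs)."""
    sim = similarity(s.upper(), t.upper())
    if sim >= full:
        return "agree"
    if sim >= partial:
        return "partial"
    return "disagree"


for a, b in [("JOHN", "JON"), ("JOHN", "JOAN"), ("ROBERT", "BOB")]:
    print(a, b, agreement_level(a, b))
# JOHN JON partial
# JOHN JOAN partial
# ROBERT BOB disagree
```

A spelling variant (JON) and a different name (JOAN) receive the same score. A common nickname (BOB for ROBERT) is scored as a full disagreement. This is the loss of semantic information the abstract describes.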



Figure 1. Typical processing pipeline for patient matching

In the clinical setting, matching occurs at the point of registration for a patient encounter. The input is a set of patient demographics, and the output is a ranked list of possibly matching records. Figure 1 shows the typical processing pipeline for an industrial patient matching system; previous work describes each step of this pipeline in greater detail [8]. The overall quality of a patient matching system depends on the quality of every step in the pipeline. Here we focus on the "weigh agreement vector" step, which has received significant attention from the research community. Probabilistic patient matching systems are built on the Fellegi-Sunter theory of record linkage [1] (FS). FS describes an optimal decision rule for classifying the agreement vector of a record pair into match (M), maybe match (C), and non-match (U) classes. The agreement vector has one dimension per demographic attribute (e.g., given name, family name, sex), and the value of each dimension indicates the level of agreement for that attribute.
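The "weigh agreement vector" step can be sketched with the textbook Fellegi-Sunter log-likelihood weights, assuming the attributes are conditionally independent. Under that assumption, the log of the likelihood ratio is the sum of per-attribute weights. Several choices are hypothetical illustrations, not values or design decisions from the paper:

- the attribute names and m/u probabilities
- the exact-match comparison
- the zero weight for missing values
- the `upper`/`lower` thresholds

```python
import math

# Hypothetical per-attribute parameters (illustrative only):
#   m = P(attribute agrees | the pair is a true match)
#   u = P(attribute agrees | the pair is a true non-match)
PARAMS = {
    "given_name": (0.95, 0.01),
    "family_name": (0.97, 0.005),
    "sex": (0.98, 0.5),
}


def compare(x, y) -> str:
    """Agreement level for one attribute; missing values carry no evidence."""
    if x is None or y is None:
        return "missing"
    return "agree" if x == y else "disagree"


def agreement_vector(a: dict, b: dict) -> dict:
    """One dimension per demographic attribute."""
    return {attr: compare(a.get(attr), b.get(attr)) for attr in PARAMS}


def fs_weight(vector: dict) -> float:
    """Sum of log2 likelihood ratios (FS weights) over the agreement vector."""
    total = 0.0
    for attr, level in vector.items():
        m, u = PARAMS[attr]
        if level == "agree":
            total += math.log2(m / u)
        elif level == "disagree":
            total += math.log2((1 - m) / (1 - u))
        # "missing" adds 0: neither supports nor contradicts a match
    return total


def classify(weight: float, upper: float = 8.0, lower: float = 0.0) -> str:
    """Three-way FS decision: match (M), maybe match (C), non-match (U)."""
    if weight >= upper:
        return "M"
    if weight <= lower:
        return "U"
    return "C"


def rank_candidates(query: dict, records: list) -> list:
    """Return (weight, class, record) tuples, highest weight first."""
    scored = [(fs_weight(agreement_vector(query, r)), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(w, classify(w), r) for w, r in scored]


query = {"given_name": "JOHN", "family_name": "SMITH", "sex": "M"}
records = [
    {"given_name": "MARY", "family_name": "JONES", "sex": "F"},
    {"given_name": "JON", "family_name": "SMITH", "sex": "M"},
    {"given_name": "JOHN", "family_name": "SMITH", "sex": "M"},
]
for weight, label, rec in rank_candidates(query, records):
    print(f"{weight:6.2f} {label} {rec['given_name']} {rec['family_name']}")
```

With these made-up parameters, the exact duplicate is ranked first and classified M. The JON SMITH record lands in C because exact comparison treats JON as a full disagreement on given name. The unrelated record is classified U.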
