Anonymous Record Linkage Between EPR and CDW-H: Toward Development of a Federated Genotype-Phenotype System.

Pu D, Garantziotis S, Mostafa J - AMIA Jt Summits Transl Sci Proc (2013)

Bottom Line: CDW-H contains clinically relevant data for patients who have been admitted to the UNC healthcare system. To validate the feasibility of linking EPR with CDW-H, the number of matching records between the two databases had to be established. Preliminary results showed that the combination of last name, gender, date of birth, and zip code would generate over 2,700 matches between the two databases.

Affiliation: Laboratory of Applied Informatics Research, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA; The North Carolina Translational & Clinical Science (TraCS) Institute, Chapel Hill, NC, USA.

ABSTRACT
The Environmental Polymorphisms Registry (EPR) is a large-scale phenotype-by-genotype registry developed by the National Institute of Environmental Health Sciences to facilitate translational research. EPR preserves the link between personal identity and collected genomic data, which creates opportunities to link EPR to phenotype-rich databases such as the Carolina Data Warehouse for Health (CDW-H), located at the University of North Carolina hospital system. CDW-H contains clinically relevant data for patients who have been admitted to the UNC healthcare system. To validate the feasibility of linking EPR with CDW-H, the number of matching records between the two databases had to be established. To that end, combinations of subjects' demographic identifiers from both databases were converted to anonymized hash codes, which were then matched to determine the number of overlapping records. Preliminary results showed that the combination of last name, gender, date of birth, and zip code would generate over 2,700 matches between the two databases.
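
The conversion of demographic identifiers to anonymized hash codes can be illustrated with a minimal sketch. The abstract does not name the hash function or any normalization rules, so the choice of SHA-256, the lowercase/whitespace normalization, the `|` separator, and the function name `anonymized_hash` are all assumptions made for illustration, not the authors' method.

```python
import hashlib


def anonymized_hash(last_name, gender, date_of_birth, zip_code):
    """Return a hex digest for one combination of demographic identifiers.

    Assumptions (not stated in the source): SHA-256 as the hash function,
    lowercase/strip normalization, and '|' as the field separator.
    """
    fields = [last_name, gender, date_of_birth, zip_code]
    # Normalize so trivial differences in case or spacing do not block a match.
    normalized = "|".join(field.strip().lower() for field in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Both databases must apply the same normalization and the same hash
# function; otherwise identical subjects yield different codes.
code_a = anonymized_hash("Smith", "F", "1970-01-31", "27599")
code_b = anonymized_hash(" smith ", "f", "1970-01-31", "27599")
print(code_a == code_b)
```

Because names, birth dates, and zip codes come from a small value space, a plain unkeyed hash can in principle be reversed by enumerating candidates; a keyed hash (for example HMAC with a secret shared by both sites) is a common way to reduce that risk.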

Figure 2 (f2-amia_tbi_2013_143): Pseudo code for anonymized record matching.

Mentions: Since there are over two million health records in CDW-H and over 15,000 records in EPR, matching them directly would have been computationally expensive. To decrease the running time, the anonymized hash codes of both datasets were sorted in increasing order and then compared according to the pseudo code illustrated in figure 2. To ensure the accuracy of the matching process, at least one of the datasets should be duplicate-free. We removed duplicate hash codes from the EPR dataset since it had the smaller cardinality.
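
Figure 2 itself is not reproduced on this page, so the following is only a sketch of a standard sorted-merge comparison that is consistent with the description above, not the authors' pseudo code. The function name `count_matches` and the sample values are hypothetical. The EPR list is deduplicated, both lists are sorted, and two indices walk through them once.

```python
def count_matches(epr_hashes, cdw_hashes):
    """Count CDW-H hash codes that also occur in the EPR hash codes.

    Sketch under assumptions: hash codes are comparable strings, and the
    EPR side is made duplicate-free, as the text describes.
    """
    epr = sorted(set(epr_hashes))  # remove duplicates, then sort ascending
    cdw = sorted(cdw_hashes)       # sort ascending, duplicates may remain
    i = j = matches = 0
    while i < len(epr) and j < len(cdw):
        if epr[i] < cdw[j]:
            i += 1        # EPR code is too small, advance the EPR index
        elif epr[i] > cdw[j]:
            j += 1        # CDW-H code is too small, advance the CDW-H index
        else:
            matches += 1  # equal codes: count one matching CDW-H record
            j += 1        # keep i so repeated CDW-H codes are also counted
    return matches


# Hypothetical stand-in values for real hash codes.
print(count_matches(["b", "a", "b", "d"], ["a", "a", "c", "d", "e"]))
```

With n EPR codes and m CDW-H codes, comparing every pair takes on the order of n × m comparisons, whereas sorting followed by a single merge pass costs O(n log n + m log m). Keeping one side duplicate-free is what allows the loop to advance only the other index on a match without counting any pair twice.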

