Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data.

Zandbergen PA - Adv Med (2014)

Bottom Line: Geographic masking typically consists of applying a certain amount of random perturbation in a systematic manner to reduce the risk of reidentification. A number of geographic masking techniques have been developed, as well as methods to quantify the risk of reidentification associated with a particular masking method. Any researcher publishing such maps is advised to become familiar with the different masking techniques available and their associated reidentification risks.


Affiliation: Department of Geography, University of New Mexico, Albuquerque, NM 87131, USA.

ABSTRACT
Public health datasets increasingly use geographic identifiers such as an individual's address. Geocoding these addresses often provides new insights since it becomes possible to examine spatial patterns and associations. Address information is typically considered confidential and is therefore not released or shared with others. Publishing maps with the locations of individuals, however, may also breach confidentiality since addresses and associated identities can be discovered through reverse geocoding. One commonly used technique to protect confidentiality when releasing individual-level geocoded data is geographic masking. This typically consists of applying a certain amount of random perturbation in a systematic manner to reduce the risk of reidentification. A number of geographic masking techniques have been developed, as well as methods to quantify the risk of reidentification associated with a particular masking method. This paper presents a review of the current state of the art in geographic masking, summarizing the various methods and their strengths and weaknesses. Despite recent progress, no universally accepted or endorsed geographic masking technique has emerged. Researchers, on the other hand, are publishing maps using geographic masking of confidential locations. Any researcher publishing such maps is advised to become familiar with the different masking techniques available and their associated reidentification risks.


Figure 7: Illustration of the k-anonymity concept using record linkage. Medical records contain a number of different fields, some of which, including name and address, are removed to protect confidentiality. When combined with voting records, however, it becomes possible to uniquely identify individuals in the medical records by combining the fields for ZIP code, birthdate, and sex. The k-anonymity provided by the released data is unacceptably low. By removing the field for birthdate (or replacing it with birth year), the k-anonymity is substantially increased and may reach acceptable levels. The concept of k-anonymity provides a quantitative measure of confidentiality protection. More specifically, it is a number that can be calculated for each subset of the data. For the example of medical records and voting records, values for k-anonymity can be calculated prior to release for all combinations of ZIP code and sex or any other field of interest. Adapted from [66].

Mentions: The concept of k-anonymity is best illustrated with an example, adapted from [66] and shown in Figure 7. Consider a set of health-related records with personal identifiers such as name, birthdate, sex, ethnicity, street address, and ZIP code, in addition to health-related data such as diagnosis, treatment, and insurance. To protect confidentiality, individual identifiers, including name and address, need to be removed from the data prior to release. While this may appear to be sufficient to protect confidentiality, consider a second set of records consisting of publicly available voting records. In many jurisdictions these records include the individual's name, birthdate, sex, street address, and ZIP code, in addition to voting-related data such as party affiliation and the nature of participation in the last election. The voting records can be used to reidentify the individuals in the anonymized health records. In this particular example, the combination of ZIP code, birthdate, and sex will in most cases uniquely identify a single individual. The value for k would be 1, which is of course unacceptable. A possible solution is to replace the exact birthdate with the birth year, although in some cases this may not be sufficient. For an actual set of data files, empirical values for k can be determined to see the effects of specific anonymization techniques on the risk of reidentification.
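The empirical determination of k described in this example can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the function names (`k_anonymity`, `generalize_birthdate`), the field names, and the four toy records are all invented for this sketch. It assumes k is taken as the size of the smallest group of records sharing the same values for the chosen quasi-identifiers (here ZIP code, birthdate, and sex), and that birthdates are stored as `YYYY-MM-DD` strings.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for every quasi-identifier field (assumed definition of k)."""
    if not records:
        raise ValueError("k is undefined for an empty set of records")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


def generalize_birthdate(records):
    """Return copies of the records with the full birthdate (YYYY-MM-DD)
    replaced by the birth year only."""
    return [{**record, "birthdate": record["birthdate"][:4]} for record in records]


# Hypothetical, made-up records with name and address already removed.
records = [
    {"zip": "12345", "birthdate": "1970-03-14", "sex": "F", "diagnosis": "A"},
    {"zip": "12345", "birthdate": "1970-11-02", "sex": "F", "diagnosis": "B"},
    {"zip": "12345", "birthdate": "1982-06-30", "sex": "M", "diagnosis": "C"},
    {"zip": "12345", "birthdate": "1982-01-09", "sex": "M", "diagnosis": "A"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")

# With exact birthdates every record is unique on the quasi-identifiers.
k_before = k_anonymity(records, QUASI_IDENTIFIERS)

# After generalizing to birth year, records fall into groups of two.
k_after = k_anonymity(generalize_birthdate(records), QUASI_IDENTIFIERS)

print(f"k with exact birthdate: {k_before}")
print(f"k with birth year only: {k_after}")
```

On these toy records k rises from 1 to 2 once the birthdate is generalized to the birth year. Whether a given k is acceptable depends on the data and the release context, and, as noted above, this generalization alone may not be sufficient.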

