Game theory indicates that only minimal edits are required to protect DNA data against attacks on anonymity, a health information privacy research team reported in Science Advances, according to a news release from Vanderbilt University Medical Center.
Zhiyu Wan, PhD, Bradley Malin, PhD, and colleagues at Vanderbilt University Medical Center have discussed the application of game theory to genomic and health data re-identification risk in previous papers. Here, they demonstrate a game-theoretic method for protecting de-identified genomic data against attacks in which an adversary gathers information from different public sources to triangulate a target’s identity.
For purposes of illustration, the paper takes particular aim at a method of attack published in Science in 2013, in which researchers used online public data sources to re-identify DNA test results obtained by querying a genetic genealogy company’s database.
In the masking game, as the authors call it, a research subject makes the opening move, sharing de-identified DNA data after masking selected data points. The equations and algorithms set out in the paper allow the subject to compute a rational adversary’s best response to every possible masking strategy. The adversary moves next, deciding whether or not to attack based on observing which data points have been masked, again using equations derived from game theory.
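To make the structure of this leader-follower game concrete, the sketch below works through a toy version in Python. Everything in it is an illustrative assumption rather than material from the paper: the four SNPs, the payoff values and the simple re-identification model are invented, and the exhaustive search over sharing choices stands in for the paper’s actual equations and algorithms.

```python
from itertools import combinations

# Toy model of a masking game as a leader-follower (Stackelberg) game:
# the subject (leader) picks which data points to share, masking the rest;
# the adversary (follower) observes what is shared and plays its best
# response: attack or pass. All numbers are invented for illustration.

SNPS = ["rs1", "rs2", "rs3", "rs4"]   # data points available to share
UTILITY_PER_SNP = 1.0                 # research value of each shared SNP
LOSS_IF_REIDENTIFIED = 10.0           # subject's cost of a successful attack
ATTACK_COST = 2.0                     # adversary's cost to mount an attack
ATTACK_GAIN = 6.0                     # adversary's payoff if the attack succeeds

def success_prob(shared):
    """Assumed model: each shared SNP leaks identifying signal,
    so re-identification gets likelier as more SNPs are shared."""
    return min(1.0, 0.25 * len(shared))

def adversary_attacks(shared):
    """Follower's best response: attack only if profitable in expectation."""
    return success_prob(shared) * ATTACK_GAIN > ATTACK_COST

def subject_payoff(shared):
    """Leader's payoff: data utility minus expected loss if attacked."""
    payoff = UTILITY_PER_SNP * len(shared)
    if adversary_attacks(shared):
        payoff -= success_prob(shared) * LOSS_IF_REIDENTIFIED
    return payoff

# The leader solves the game by computing the follower's best response to
# every possible sharing choice and keeping the one that maximizes payoff.
best = max(
    (set(c) for r in range(len(SNPS) + 1) for c in combinations(SNPS, r)),
    key=subject_payoff,
)
print("share:", sorted(best), "masked:", sorted(set(SNPS) - best),
      "payoff:", subject_payoff(best))
```

In this toy instance the subject’s best move is to share a single SNP: just enough masking that a rational adversary, weighing an expected gain of 1.5 against an attack cost of 2.0, declines to attack.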
“The goal of this research is to show how data holders in the real world — research teams and institutions, hospitals, government agencies, genetic genealogy companies — can use these methods to greatly improve the de-identification of genomic data entrusted to them by patients, research subjects and customers, shutting down the most likely sorts of attackers while optimizing the data’s usefulness under large-scale sharing for scientific research,” said Malin, Professor of Biomedical Informatics, Biostatistics and Computer Science.
Comparing the masking game to other data-sharing strategies, the paper examines data privacy and utility under a range of scenarios involving different data sets, attack models and levels of risk aversion, formally analyzing both real and simulated cases with prospective monetary payoffs for the game’s players. The subject’s payoff is optimized by suppressing only enough data to make attacks unprofitable, leaving attackers with no reason to participate.
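Under the same illustrative assumptions as the sketch above, a short comparison shows why this optimization matters: sharing everything invites a profitable attack, while sharing nothing, the worst-case-averse choice Wan describes below, forfeits all research utility. Again, the numbers are invented for illustration, not taken from the paper.

```python
# Toy comparison of sharing strategies under the same illustrative model
# as the sketch above (all parameters are assumptions, not paper values).
ATTACK_COST, ATTACK_GAIN, LOSS = 2.0, 6.0, 10.0
UTILITY = 1.0   # research value per shared data point
N = 4           # data points available

def payoff(k):
    """Subject's payoff when sharing k of N data points."""
    p = min(1.0, 0.25 * k)                    # assumed re-identification odds
    attacked = p * ATTACK_GAIN > ATTACK_COST  # rational adversary's choice
    return UTILITY * k - (p * LOSS if attacked else 0.0)

strategies = {
    "share everything": N,
    "share nothing (worst-case averse)": 0,
    "game-optimal masking": max(range(N + 1), key=payoff),
}
for name, k in strategies.items():
    print(f"{name}: share {k} points, payoff {payoff(k):+.1f}")
```

Running this prints a negative payoff for sharing everything, zero for sharing nothing and a positive payoff for the game-optimal mask, a toy analogue of the paper’s point that suppressing just enough data beats both extremes.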
“Many data managers are prone to assume the worst-case scenario, an attacker with unlimited capability and no aversion to financial losses,” said Wan, a research fellow in the Health Information Privacy Laboratory at VUMC, which Malin directs. “But that may not happen in the real world, so you would tend to overestimate the risk and not share anything. We developed an approach that gives a better estimate of the risk.”