“We need to stop pretending that pseudonyms protect privacy.”
Such was Cambridge University Prof. Ross Anderson’s message to assembled privacy professionals during Wednesday’s opening keynote at the IAPP Europe Data Protection Intensive here in London, UK.
“Above all, we need honesty,” he implored.
The topic is top-of-mind in the UK, where the National Health Service (NHS) is getting policy direction from the top that “every patient is a research patient,” Anderson observed. He predicted that “the fight in Britain is going to be about anonymization.”
The thing is, good data shows that 70 to 80 percent of UK citizens are happy to have their medical data used in research, provided they’re asked. “But if they’re not asked,” said Anderson, “they’re against it.” In fact, 53 percent of UK residents oppose or strongly oppose a central records database in the first place.
However, the Clinical Practice Research Datalink has been live since 2012, jointly funded by the NHS National Institute for Health Research and the Medicines and Healthcare Products Regulatory Agency. Already it’s been used in nearly 900 clinical reviews and papers.
“Now, in the U.S., you can just add to your shopping basket 100,000 UK diabetics,” Anderson said. “And there will soon be another system hoovering up GP stuff and making that for sale as well.”
The problem? As the data gets big, anonymization gets nearly impossible, he said, noting that if there are 33 data points on one individual, it’s a virtual certainty that person can be re-identified.
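The article does not give the derivation behind the 33 figure, but a back-of-the-envelope calculation shows why a number of that size is plausible. The sketch below assumes, purely for illustration, that each data point is an independent yes/no attribute split evenly across the population; the function names and population figures are hypothetical choices, not anything Anderson presented.

```python
def bits_to_single_out(population: int) -> int:
    """Smallest n such that 2**n >= population, i.e. the number of
    independent, evenly split yes/no attributes whose combinations
    are at least as numerous as the people being distinguished."""
    return (population - 1).bit_length()


def expected_others_sharing_profile(population: int, n_bits: int) -> float:
    """Expected number of OTHER people who match one person's profile
    of n_bits attributes, under the same simplifying assumptions."""
    return (population - 1) / 2 ** n_bits


# Illustrative population sizes (rough assumptions, not from the article).
WORLD = 7_000_000_000
UK = 63_000_000

print(bits_to_single_out(WORLD))                   # attributes needed worldwide
print(bits_to_single_out(UK))                      # attributes needed for the UK
print(expected_others_sharing_profile(WORLD, 33))  # below 1: profile is likely unique
```

Real attributes are neither binary nor independent, so the count needed in practice differs, but the direction of the argument holds: each added data point roughly halves the crowd a person can hide in.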
NHS Chief Scientific Advisor Mark Walport and National Director for Patients and Information Tim Kelsey “vigorously think all the data should be available,” Anderson said, but won’t acknowledge the risk to personal privacy.
“You can occasionally use anonymization, but it’s really, really hard and you can only use it under precise circumstances,” Anderson said. In particular, you need to be able to predict how the data is going to be used. The problem with a central repository from which you can buy data is that you have no idea how that data might be integrated or combined with other data, and any such combination can completely change the effectiveness of the anonymization.
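The risk of combining data sets can be made concrete with a minimal linkage sketch. Everything in it is invented for illustration: the records, the field names and the choice of postcode, birth year and sex as the linking fields are assumptions, not details from Anderson's talk.

```python
from collections import Counter

# Hypothetical "anonymized" health extract: names removed, other fields kept.
health_rows = [
    {"postcode": "CB1", "birth_year": 1961, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "CB1", "birth_year": 1984, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "N7", "birth_year": 1961, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical second data set that carries names alongside the same fields.
public_rows = [
    {"name": "Alice Example", "postcode": "CB1", "birth_year": 1961, "sex": "F"},
    {"name": "Bob Example", "postcode": "CB1", "birth_year": 1984, "sex": "M"},
]

LINK_FIELDS = ("postcode", "birth_year", "sex")


def key(row):
    """Tuple of the linking fields for one record."""
    return tuple(row[f] for f in LINK_FIELDS)


def link(health, public):
    """Return {name: diagnosis} for records whose linking fields are
    unique in both data sets, so the match is unambiguous."""
    health_counts = Counter(key(r) for r in health)
    public_counts = Counter(key(r) for r in public)
    names = {key(r): r["name"] for r in public}
    matches = {}
    for row in health:
        k = key(row)
        if health_counts[k] == 1 and public_counts.get(k) == 1:
            matches[names[k]] = row["diagnosis"]
    return matches


print(link(health_rows, public_rows))
```

Neither data set links a name to a diagnosis on its own; the join does. That is why the publisher of the first set cannot judge its anonymity without knowing what else it will be combined with.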
“Once it’s decentralized,” he said, “it’s only a matter of time until something unpleasant happens,” and “once you move away from episodic data to longitudinal data, then there are so many ways of identifying people.”
So, what should be done? Anderson said the NHS and other records databases should be architected the way Google would do it: “A company like Google would never dream of having a person walk out the door with eight million Gmail inboxes sitting on a laptop to go work at home for the weekend. Disabuse yourselves of the idea that you can download data like this to a laptop.”
For the NHS, or any privacy professional working with Big Data sets, he recommends “a proper number of controlled repositories” to which researchers can apply for access. “The world has to change,” Anderson insisted. “There are ways forward technically, but it involves doing research the way Google would do it rather than the way it’s done now.”