“Finding beavers from outer space was my very first job,” said Alex “Sandy” Pentland, presenting at a recent Center for Geographic Analysis conference. The audience chuckled. “Yeah, isn’t that crazy?”
Pentland now works at MIT and co-leads the World Economic Forum Big Data and Personal Data Initiatives. Somewhere in between beavers and the WEF, he helped develop the car monitoring systems for the Nissan Leaf, so he “knows a little about cars, too.”
But he didn’t come to the Harvard auditorium to talk about any of that. Instead, he explored what he called possibly the world’s first attempt to create a global “data commons.”
Recently, mobile carrier Orange released all of its data from Ivory Coast to a group of researchers. Together with some partners, including UN Global Pulse and the WEF, the company came up with a way of aggregating and anonymizing the data so they could “do things with it.”
As part of the Data for Development (D4D) Challenge, 86 universities across the globe examined this data from a privacy-threat standpoint and also used it for the public good. One group of researchers was able to improve the public transit system by 10 percent by adding a couple of small but critical bus routes. Another used the data to help stop the spread of HIV with campaigns directed at paths of transmission that were previously unknown.
There were also some more controversial findings. For example, with the war recently fought in Ivory Coast, “you can actually tell which side is right based on this data—which is going to cause problems,” Pentland noted.
Had they done enough, though, to mitigate any threats to privacy?
Elizabeth Bruce, executive director of MIT’s CSAIL Big Data Initiative, writes in the CSAIL blog that the research groups’ findings suggest “many of the privacy fears associated with the release of data about human behavior may be generally misunderstood…No path to re-identification was discovered even though several of the research groups studied this specific question.” This is attributed to advanced computer algorithms and a legal contract that set explicit rules for data usage, both of which are encompassed in legislation currently proposed across the globe.
“The way things are now, it’s not how they have to be,” said Pentland. We don’t have to strip data of all its value in order to protect privacy.
Companies like Orange, Telefonica and Verizon are steadily moving towards creating data commons from which a lot of public good can come, Pentland said, “and it looks like there’s a practical—although not perfect—way forward for doing that.”
There’s an idea out there that computer code is the same thing as law. “You can actually write really detailed rules for how things happen, and verify that they happen automatically, and that’s really an extension of policy,” Pentland noted, adding, “You have to be careful, because people can break the law.”
These data commons, said Pentland, may end up part of a two-tiered system that these Big Data companies are beginning to arrive at: data commons, which are freely available, and personal data stores, in which individuals control their data through informed opt-in and retraction. By combining the growing view that individuals should own the data that is about them with an appreciation for what researchers can accomplish with properly anonymized data collections, Pentland believes you can find the greatest public good. But he also understands the risk.
“I think there are ways to design things that are good, not perfect,” said Pentland. “And, it’s important we move towards the good, because right now we’re in the terrible.”
Editor’s Note: You can hear Pentland speak at Navigate, June 21, surrounded by artists and thought leaders brought together to inspire fresh ideas about privacy.