Today, Forbes’ Kashmir Hill reported on the work of a man going by the name of “Puking Monkey.” This creative electronics tinkerer hacked into his RFID-enabled toll booth pass—a great feature for travelers, especially for us up here in the Northeast who regularly must pass through countless toll booths—and configured it to alert him whenever it was being read. What he discovered was that all over New York City, his E-Z Pass was being tracked—and not just by toll booths.
Of course, the surveillance capabilities are concerning, but this also stuck out to me:
“This isn’t a part of the Lower Manhattan Security Initiative, the millions-dollar project emulating London’s Ring of Steel with extreme surveillance. It’s part of Midtown in Motion, an initiative to feed information from lots of sensors into New York’s traffic management center. A spokesperson for the New York Department of Transportation, Scott Gastel, says the E-Z Pass readers are on highways across the city, and on streets in Manhattan, Brooklyn and Staten Island, and have been in use for years. The city uses the data from the readers to provide real-time traffic information, as for this tool. The DoT was not forthcoming about what exactly was read from the passes or how long geolocation information from the passes was kept. Notably, the fact that E-Z Passes will be used as a tracking device outside of toll payment, is not disclosed anywhere that I could see in the terms and conditions.”
It’s an example of how data collection intended for one purpose can be surreptitiously used for another, without a user’s consent. The reason we use these passes is in their very name: E-Z Pass. Convenience. Efficiency. And I can understand why a municipality or public service would want to use anonymized traffic data to help gauge and manage traffic patterns. It’s potentially a good use of huge data sets for large swaths of people. But as Puking Monkey points out, “If NYDOT can put up readers … other agencies could as well.”
So: Ubiquitous data collection. Surreptitious data use. Anonymization. The bountiful benefits versus the privacy pitfalls of Big Data. These-are-some-of-my-favorite-things…
And these are some of the things discussed earlier this week at an event put together by the folks at the Future of Privacy Forum and Stanford Law School’s Center for Internet and Society. I had the pleasure of attending this content-packed event, which focused on reconciling the seemingly incongruent realities of Big Data and privacy.
What did I learn? Well, there was a lot. So I’ll try to keep it brief.
The Benefits v. the Harms
To start, how do we frame and quantify the benefits of Big Data versus privacy harms? Jules Polonetsky, CIPP/US, and Omer Tene have written extensively on this issue. As with measuring traffic patterns, the potential upside to Big Data innovation seems limitless. But, as with the secondary use of E-Z Passes to track drivers without their knowledge, surreptitious use of data is creepy. (Another thing I learned: you can make a drinking game during Big Data discussions. Drink whenever the words “innovation” or “creepy” are used. Fall down in about 30 minutes. Thank you, Washington University School of Law Prof. Neil Richards.)
Data Use v. Data Collection
The Center for Democracy & Technology’s Justin Brookman has co-written a compelling argument on why collection matters. Others, such as Information Accountability Foundation’s Marty Abrams, argue that in a Big Data world, collection is not the issue. It’s what you do with the data that matters. Marty describes a two-phase approach. Phase one is collection and discovery: the fun area where researchers can play around with the data and find areas for innovation (drink). Phase two, application, is what requires a careful assessment and balance of benefits versus harms, and it is where things can get creepy (drink, again).
Which brings me to my next takeaway.
Accountability, Trust and Transparency
In a Big Data world, some of the Fair Information Practice Principles shrink while others inflate. Take data minimization, for example. With countless sensors, cheap storage and smart algorithms, minimizing data collection isn’t always so realistic. But, there’s accountability. Things have gotten so complicated that, to reach truly informed consumer status, you’d have to spend all of your waking time understanding the complexities of online ad networks, encryption, and so on. However, putting the onus on the organization or business—cultivating trust—is one fairly popular option. But as Brookman pointed out, “Accountability can’t do anything about a rogue employee or about government access.” Plus, how can businesses be truly transparent about their data collection and use techniques? Obviously, as I wrote last week, good businesses will take steps to be transparent, but as UC-Berkeley Prof. Deirdre Mulligan points out, “you don’t want to dump a bunch of code on people.” To me, it would be the tech equivalent of long, unreadable legal privacy policies.
The Democratization of Big Data
So much time is spent discussing the top-down, all-inclusive side of Big Data, but that’s not the only angle here. Big Data tools can now be used by small organizations and individuals. Again, this can be good or bad. In thinking about the social effects of having Big Data tracking abilities, Rochester Institute of Technology Prof. Evan Selinger asked, “is our common law ready for that?” PhD candidate Karen Levy has done some interesting work on relational Big Data. She notes there are more tools now for parents to track their kids—which can be a good thing. But, as these tools proliferate, familial relationships may fundamentally change. And how about their use by jealous spouses? Regulators may want to keep watch. And what about those who are left out of the Big Data scheme? Only 16 percent of Africans have Internet access, points out U.S. Department of State Attorney-Advisor Jonas Lerman. Those who are poor may miss out on the many benefits brought on by Big Data.
Anonymization and Re-Identification
This debate is not going away. Having University of Ottawa Prof. Khaled El Emam debate it with graduate fellow Jonathan Mayer was a great way to dive into the often controversial subject. Perhaps the solution resides, as New York University Fellow Ira Rubenstein said, in prohibiting and regulating re-identification altogether?
Predictability, Inference and the Ethics of Algorithms
Perhaps the creepiest (drink) takeaway for me is the rapid, ineluctable rise of predictive analytics and automated decision-making. Rayid Ghani, who served as chief scientist of Obama’s 2012 reelection campaign, shared how his team used data to make inferences and predictions to achieve their outcomes. Deirdre Mulligan has co-written a compelling piece on classifications and fairness and points out that ethical decisions are made when algorithms are created. With the Internet of Things staring us square in the face, we’re entering a world that will be filled with sensors, automated decision-making machines and the ability to use data to make predictions and create systems that influence data points to carry out those predictions. It involves behavioral economics and social psychology. It’s going to be a huge issue not only for consumers, but also for companies wanting to employ such services. The ethics of Big Data and technology will be discussed a lot in the months and years ahead.
In the meantime, policymakers, industry reps, academics and privacy pros will continue to hash out ways of reconciling Big Data and privacy, perhaps without getting too intoxicated by innovation or creepiness (glug, glug). Maybe Puking Monkey got his name from playing this game?