By Sam Pfeifle
How bad is the situation for privacy notices? The National Science Foundation just used part of its largest grant program, a Frontier award of well over $1 million, to fund a team of researchers looking to fix them.
And, to be clear, “We try to look at society’s biggest challenges and the things that really matter,” said Lisa-Joy Zgorski, a spokesperson for the National Science Foundation, “things that affect lives and jobs, and we tackle these issues with the requests for proposals.”
The project in question, “Towards Effective Web Privacy Notice and Choice: A Multi-Disciplinary Perspective,” is led by researchers at Carnegie Mellon University and includes teams at both Fordham and Stanford. They hope to use advances in machine learning, crowdsourcing and graphic design to take the often-boilerplate privacy policies found on virtually every website nowadays and make them much more digestible and useful for the average web surfer.
“Nobody really understands these privacy policies,” said Nina Amla, one of the program managers at the National Science Foundation’s Computer and Information Science and Engineering Directorate. “They’re pages long, and the few people who do read them don’t come away with much information about how to make decisions about visiting that website or not.”
Of course, Amla is not the first to make those observations. Attendees at the IAPP’s Navigate event were treated to a presentation by Carnegie Mellon’s Jason Hong that showed you’d need to devote some 25 days a year to reading privacy notices if you actually read the notice on every site you, as an average surfer, visit.
“I think we all realize that very few people read these policies,” he said, “and even if you do read them, you can’t answer the most basic questions about them.”
While researchers like Lorrie Cranor, who’s on the team led by Carnegie Mellon’s Norman Sadeh, have looked at asking websites to use something more akin to a nutritional label, “we’re seeing that website operators are not necessarily keen to do much more than what they’ve already been doing,” Sadeh said.
With some background in machine learning and natural language studies, Sadeh started to wonder a while back whether those technologies might be applied to privacy notices, which tend to share a lot of similar language and patterns. The aim would be to automatically answer the questions about a notice that matter most to the consumers visiting the site.
Essentially, he asked, “can we take these policies in their ugliness and extract something meaningful out of them?”
The end result, he said, might be a browser plugin that displays a very simple color, or a letter grade, when someone visits a website that’s been evaluated by the program. It’s unlikely, said Sadeh, that what he’s envisioning could be done in real time, but sweeps of the kind done by privacy commissioners, for example, could be made much more efficient.
To that end, he’s assembled a team from the three universities with backgrounds in areas like legal research, public policy and human-computer interaction, in addition to privacy.
Once some research is done on which privacy questions consumers really care about, and on how best to gather an online crowd interested in being a source of information, the workflow might look something like this:
First, the text of the privacy notice is ingested by the software. It pulls out the portions of the policy it believes answer the five to seven questions most important to consumers and offers up what it believes are the answers. The crowd online confirms or corrects those answers, and then a score for the site is generated. That score is recorded and added to the database. Finally, when someone next visits that site with the plugin installed, the score is displayed and the consumer can decide whether to simply proceed or dive deeper into the answers to those important questions.
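For the technically inclined, that workflow can be sketched in a few lines of Python. This is only an illustration, not the researchers’ software: the three questions, the keyword cues standing in for machine learning, the majority-vote crowd step and the grading thresholds are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical questions and keyword cues (placeholders, not the project's list).
QUESTIONS = {
    "shares_with_third_parties": ("third party", "third parties", "share"),
    "collects_location": ("location", "gps"),
    "allows_deletion": ("delete", "deletion", "erase"),
}
# Assumed privacy-friendly answer for each question.
FRIENDLY = {
    "shares_with_third_parties": "no",
    "collects_location": "no",
    "allows_deletion": "yes",
}


@dataclass
class Candidate:
    question: str
    passage: Optional[str]  # sentence believed to address the question
    answer: str             # "yes", "no" or "unstated"


def extract(policy_text):
    """Step 1: pull out a passage per question and guess an answer.
    Naive keyword matching stands in for the machine-learning step."""
    sentences = [s.strip() for s in policy_text.split(".") if s.strip()]
    candidates = []
    for question, cues in QUESTIONS.items():
        hit = next(
            (s for s in sentences if any(c in s.lower() for c in cues)), None
        )
        if hit is None:
            answer = "unstated"  # the policy is silent on this question
        else:
            lowered = f" {hit.lower()} "
            answer = "no" if " not " in lowered or " never " in lowered else "yes"
        candidates.append(Candidate(question, hit, answer))
    return candidates


def crowd_review(candidate, votes):
    """Step 2: the crowd confirms or corrects; a majority vote wins.
    With no votes, the software's guess stands."""
    if not votes:
        return candidate.answer
    return Counter(votes).most_common(1)[0][0]


def grade(answers):
    """Step 3: turn reviewed answers into a letter grade.
    An unstated answer never counts as privacy-friendly."""
    friendly = sum(1 for q, a in answers.items() if a == FRIENDLY[q])
    fraction = friendly / len(answers)
    if fraction >= 0.9:
        return "A"
    if fraction >= 0.6:
        return "B"
    if fraction >= 0.3:
        return "C"
    return "F"


def evaluate_site(site, policy_text, crowd_votes, database):
    """Steps 1 to 4: extract, review, score and record the grade."""
    answers = {
        c.question: crowd_review(c, crowd_votes.get(c.question, []))
        for c in extract(policy_text)
    }
    database[site] = grade(answers)
    return database[site]


def lookup(site, database):
    """Step 5: what the plugin would show; None if the site is unevaluated."""
    return database.get(site)


if __name__ == "__main__":
    db = {}
    policy = (
        "We share your data with third parties. "
        "We do not collect location data. "
        "You may delete your account at any time."
    )
    evaluate_site("example.com", policy, {}, db)
    print(lookup("example.com", db))  # prints B
```

In this toy version, a policy that says nothing about a question is marked “unstated” and scored against the site, which mirrors Sadeh’s point that silence on a valid question is itself revealing.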
“Maybe we can find answers that matter to users,” said Sadeh, “though the answers may or may not be doable depending on what the policies do say. Some of them do a great job of never answering a question that you care about, and sometimes that’s very revealing. If they don’t make a statement about a valid question, then that’s an issue, and I can probably get a crowd to help me with pointing that out.”
At the end of the project’s three-and-a-half years, the hope, said Sadeh, is that “I can do this on a massive scale and we can start automating the sweeps … We might be able to see how policies evolve, or check how new regulations are being addressed, maybe even inform regulators and get them to impose various sanctions.
“We all realize that in many different domains,” he continued, “there’s been a rush to the bottom in terms of privacy practices, and the idea of self-regulation and that people would start competing on privacy policies, well, that was wishful thinking and that remains wishful thinking. But, if one day you can distill all this information so that it’s much easier for a user to digest, then maybe you find yourself with that actually happening.
“That’s the ultimate goal, I would think.”