By The Common Data Project
As part of its mission to educate the public about online privacy issues, the Common Data Project embarked on a project to discover how companies compete on the basis of privacy. CDP surveyed the online privacy policies of large Internet companies, social networks, and start-ups to develop a "how-to-read" guide for those unable to understand, or unwilling to wade through, the lengthy and complex privacy policies that are so common in today's marketplace. Using seven questions, the CDP set out to extract the information users need to understand the practices behind the policies and to participate in the public debate about online data collection.
The Common Data Project was created to encourage and enable the disclosure of personal data for public re-use through the creation of a technology and legal framework for anonymized data-sharing. Specifically, we think that means creating a new kind of institution called a datatrust, which is exactly what it sounds like: a trusted place to store and share sensitive, personal data.
So why did we spend a lot of time parsing the legalese of some excruciatingly long privacy statements?
We realize that most users of online services have never read, and never will read, the privacy policies so carefully crafted by teams of lawyers at large companies. And having read all of these documents (many times over), we're not convinced that anyone should read them, other than to confirm what they probably already know: a lot of data is being collected about them, and it's not really clear who gets to use that data, for what purpose, for how long, or whether any or all of it can eventually be connected back to them.
Yet people continue to use Google, Microsoft, Yahoo and other sites without giving much thought to the privacy implications of giving up their data to companies.
We at the Common Data Project know that for a datatrust to function properly, we can't rely on people to simply look the other way, nor do we want them to.
Data collection for Google and Microsoft users is incidental. People go to Google.com to search, not to give data. As long as they have a good search experience, the data collection is largely out of sight, out of mind.
A datatrust, on the other hand, will be a service explicitly designed around giving and sharing data. We know that to convince the public that the datatrust can indeed be trusted, a clear privacy story is absolutely necessary.
Below we offer a guided tour through the privacy policies of 15 online services, from established players to major retailers to Web 2.0 starlets and aspiring start-ups that hope to compete on superior privacy guarantees. Our goal was to identify where their policies were ambiguous or simply confusing.
The companies and organizations whose policies the CDP analyzed were chosen for being among the most trafficked sites, as well as for providing a range of services online.
- Search and Internet Portals: Google, Yahoo!, Microsoft, AOL
- Major Retailers: Amazon, eBay
- Online Communities and Social Networks: Facebook, Craigslist, Wikipedia, Photobucket
- Content Providers: NYT, WebMD
- Emerging Search Engines: Ask, Cuil, Ixquick
Privacy is not exclusively an online issue, even though the companies surveyed here all operate online. Many of the largest data breaches over the last 10 years have involved companies and agencies that actually operate exclusively offline, and the question of how to manage, store, and share large amounts of information is an important question for almost every business today. However, we chose to focus on online businesses and organizations because they have been among the most visible in illustrating the dangers and advantages of amassing great quantities of data.
For a graph visit www.commondataproject.org.
To guide the analysis, we used seven questions to pinpoint the issues most crucial to users' privacy.
- Who is collecting data on the site?
- How do they define "personal information"?
- What promises are being made about sharing information with third parties?
- What is their data retention policy and what does it say about their commitment to privacy?
- What privacy choices do they offer to the user?
- What input do users have into changes to the policy's terms?
- To what extent do they share the data they collect with users and the public?
1. Who is collecting data on the site?
When you walk into your neighborhood grocery store, you might not be surprised that the owner is keeping track of what is popular, what is not, and what items people in the neighborhood seem to want. You would be surprised, though, if you found out that some of the people in the store asking questions of the customers didn't work for the grocery store. You would be especially surprised if you asked the grocery store owner about it, and he said, "Oh, those people? I take no responsibility for what they do." But in the online world, that happens all the time. Obviously, when a user clicks on a link and leaves a site, he or she becomes subject to new rules. But even when a user doesn't leave a site, third-party advertisers may be collecting data the whole time.
In this section, we review how companies handle this in their privacy policies.
2. How do they define "personal information"?
In this section, we examine how companies approach this disclosure. Some companies categorize, others label, and still others use the disclosure to tout the fact that they collect no information whatsoever. This section also explores how companies approach the topic of re-identification.
3. What promises are being made about sharing information with third parties?
In addition to listing the types of data collected from you, most privacy policies will also list the reasons for doing so. The most common are:
- To provide services, including customer service
- To operate the site/ensure technical functioning of the site
- To customize content and advertising
- To conduct research to improve services and develop new services
They also list the circumstances in which data is shared with third parties, the most common being:
- To provide information to subsidiaries or partners that perform services for the company
- To respond to subpoenas, court orders, or legal process, or otherwise comply with law
- To enforce terms of service
- To detect or prevent fraud
- To protect the rights, property, or safety of the company, its users, or the public
- Upon merger or acquisition
This section examines how various companies communicate this message. We found that companies commonly use language that normalizes and euphemizes the practices being described.
For us at CDP, the issue isn't whether IP addresses are included in the "personal information" category or not. What we really want to see are honest, meaningful promises about user privacy. We would like to see organizations offer choices to users about how specific pieces of data about them are stored and shared, rather than simply make broad promises about "personal information," as defined by that company.
It may turn out that "personal" and "anonymous" are categories so difficult to define that we'll have to come up with new terminology that is more descriptive and informative. Or companies will end up having to do what Wikipedia does: simply state that it "cannot guarantee that user information will remain private."
4. What is their data-retention policy and what does it say about their commitment to privacy?
Data retention has been a controversial issue for many years, with American companies not measuring up to the European Union's more stringent requirements. But for us, the retention debate obscures what's really at stake and often confuses consumers.
For many privacy advocates, limiting the amount of time data is stored reduces the risk of exposure. The theory, presumably, is that sensitive data is like toxic waste, and the less of it lying around, the better off we are. But that theory, as appealing as it is, doesn't address the fact that our new abilities to collect and store data are incredibly valuable, not just to major corporations, but to policymakers, researchers, and even the average citizen. Focusing on this issue of data retention hasn't necessarily led to better privacy protections. In fact, it may be distracting us from developing better solutions.
This section explores how some companies handle (or don't) the topic of data retention in their privacy policies.
5. What privacy choices do they offer to the user?
Over the last year or two, there have been some interesting changes in the way some companies view privacy. They're starting to understand that people care not only about whether the telemarketer calls them during dinner, but also about whether that telemarketer already knows what they're eating for dinner.
This section explores this evolution and offers one option for taking it a significant step forward.
6. What input do users have into changes to the policy's terms?
Not surprisingly, none of the policies we looked at stated that users would have a say in changes to the privacy terms. In this section, we review what companies do say on this point, and we look at how Facebook's recent handling of policy changes did open the door for user input, why that is a good thing, and why the company had to do it.
7. To what extent do they share the data they collect with users and the public?
This section discusses companies' data-sharing activities and what they mean, or could mean, for consumers.
By our standards, none of the privacy policies we surveyed quite measures up. Most provide incomplete information on what "personal information" means. Many fail to make clear that they are actively sharing information with third parties. Even when companies change their policies on something like data retention to placate privacy advocates, the changes do little to provide real privacy. The legal right companies reserve to change their policies at any time reminds us that, right now, the balance of power is clearly in their favor. And when they do offer users choices, those choices fail to encompass all the ways online data collection implicates users' privacy.
But we don't believe that we are stuck with the status quo. In fact, there are many positive signs of companies making smart moves, because they're realizing they need buy-in from their users to survive in the long term.
Already, during recent Facebook controversies, we've seen users trying to determine how their data is shared. Google has created new tools that allow users a wider range of choices for controlling how their data is tracked. And every day, we see new examples of how data can be shared with users and customers as part of a service, rather than being treated just as a byproduct that is solely for the companies' use and enrichment.
We hope that our analysis will help push debate in the right direction. We hope that companies will see there can be real value and return in being more honest with their consumers. At the same time, we hope that as consumers and privacy advocates, we can work with companies towards useful solutions that balance privacy rights against the value of data for all of us.
About the Common Data Project