Microsoft researcher Kate Crawford got some press at this time last year with her “six myths of big data,” among which was this item: “Big Data Doesn’t Discriminate.”
Over the past year, this idea of big data analytics and its potential for discriminatory harm has only gained steam. Our own resident VP of Research and Education, Omer Tene, published with Future of Privacy Forum Executive Director Jules Polonetsky about the topic in a paper called “Judged by the Tin Man”; Michael Schrage wrote about it for Harvard Business Review; and the White House reports on big data highlighted potential discrimination as an issue to watch, among numerous other examples.
Perhaps it’s not surprising then that the two IAPP prize-winning papers at this year’s recently completed Privacy Law Scholars Conference both deal with big data, and the increasingly complicated systems that employ it.
Solon Barocas and Andrew Selbst’s paper, “Big Data’s Disparate Impact,” still in draft form, examines discrimination law in the U.S. and whether it can adequately handle the issues raised by big data. Danielle Citron and Frank Pasquale’s “The Scored Society,” published in Washington Law Review, looks at the legal complications created by systems using data and algorithms to include and exclude people from various programs.
Selbst, currently with Public Citizen and about to embark on a clerkship with a judge on the Third Circuit, said the idea for their paper arose from a conversation he had on a New York City street with Barocas, currently with the Center for Information Technology Policy at Princeton.
With an interest in how new technology changes previous understandings of civil liberties, Selbst was primed to engage with Barocas’ dissertation work on the effects of big data on populations. “He was discovering that there are all these unintentional ways that discrimination could creep in that people wouldn’t think of right away, and I just said, ‘I think this breaks anti-discrimination law. I’m not sure the law can possibly handle that.’”
Barocas said he’s been working on big data’s indirect impacts since his master’s work in 2004, and then continued with his dissertation to look into data analysis, machine learning and the work scientists have been doing on non-discriminatory data mining models. “A lot of my work now is to translate these technical details into policy and philosophy,” he said. “It’s a really rich area.”
Working with Selbst, he said, “brought my insights to a more legal analysis.”
And all the activity surrounding the issues, including the White House reports, has been “really encouraging,” Barocas said. “We happened to finish our paper at the exact right moment.”
Really, the only open question is their conclusion: “We’re not sure how truly pessimistic to be,” Selbst said. Their initial thinking was fairly pessimistic, but “I have come to believe there is more reason for optimism, and we won’t know where we come down” until the paper is finished later this summer, incorporating feedback from PLSC and elsewhere.
Citron and Pasquale, for their part, are fairly pessimistic, if not dire. “Individuals should be granted meaningful opportunities to challenge adverse decisions based on scores miscategorizing them,” they write. “Without such protections in place, systems could launder biased and arbitrary data into powerfully stigmatizing scores.”
The pair work together teaching law at the University of Maryland, where both are doing work on “black boxes,” those closed systems where data goes in and a decision comes out and it’s unclear, or certainly opaque, just how that decision was arrived at.
Citron first wrote about this issue in 2007, in a paper called “Technological Due Process,” which focused on a health and human services system in Colorado that was “a disaster.” Programmers coded in bad policy, with incorrect decision tables, and people were denied benefits like Medicaid and food stamps.
With no programmer notes, crashing systems, no audit trails, “there was no way to trace why a decision was made,” Citron said. “It was a failure of due process, with no chance to be heard.” Essentially, it was unauthorized rulemaking, with programmers doing real-life damage without even knowing it.
The question she raised was this: How do we update our understanding of due process for the 21st century?
So, when Washington Law Review asked her to write about artificial intelligence and the law, she naturally thought of Pasquale, with whom she had previously examined U.S. federal “fusion centers” and who was exploring these black box issues.
They decided to turn their gaze toward credit-scoring systems, combining Citron’s ideas about due process with Pasquale’s ideas about lack of insight into systems. “There are hundreds of ways that entities are scoring us in ways that we find very troubling,” she said.
“She was the person who got me into privacy law,” Pasquale said. “I’d never really written in the area until we worked on fusion centers.”
Instead, his black box work started out with his examination of Google’s search algorithms back in 2005/2006. People were complaining about their place in search results, and saying it was unfair. “Google would always say, ‘It’s about the quality of the user experience,’” Pasquale said, “and then I found that nobody could really get to the bottom of it. People just believed whatever Google said.” Back then, “people said, ‘You’re out of your mind. They’re not that powerful. Why are you even talking about this?’” Pasquale said. Now his work seems a little more relevant to folks.
Then he turned his eye to credit scores, a subject that was particularly relevant during the housing crisis: “It seemed that so many times the credit score was setting up unfair games for people.”
Such is the nature of their paper. It’s particularly interesting to see the issue in the context of the original credit bureaus, Pasquale said, which would report on things like “effeminate gestures” or a “messy yard.” The Fair Credit Reporting Act (FCRA) was meant to stop this kind of “disgusting insinuating innuendos,” he said. It was supposed to create a more scientific model for credit evaluation. In the process, however, it created a more opaque box.
Unfortunately, Citron and Pasquale will not be available to present their paper at the IAPP Privacy Academy in San Jose this fall (Citron is on a book tour; Pasquale is hosting a big data conference of his own at the University of Maryland), but Barocas will be on hand for the event, speaking solo because of Selbst’s clerkship.