These days, everyone seems to be talking about “big data.” Engineers, researchers, lawyers, executives and self-trackers all tout the surprising insights they can get from applying math to large data sets. The rhetoric of big data is often overblown, exaggerated and contradictory, but there’s an element of truth to the claim that data science is helping us to know more about our world, our society and ourselves.
Data scientists use big data to deliver personalized ads to Internet users, to make better spell checkers and search engines, to predict weather patterns, perform medical research, learn about customers, set prices and plan traffic flow patterns. Big data can also fight crime, whether through the use of automated license-plate readers or, at least theoretically, through the collection of vast amounts of “metadata” about our communications and associations by the National Security Agency.
Big data allows us to know more, to predict and to influence others. This is its power, but it’s also its danger. The entities that can harness the power of math applied to large sets of personal information can do things that used to be impossible. Many of these new uses are good, but some of them aren’t. For example, if our “personalized prices” can be based on our race or sex, or if our college admissions are based on things like ZIP code or car ownership, we might want to think more deeply about the kinds of big decisions our big data can be used for. We’re creating a society based on data, and we need to make sure that we create a society that we want to live in.
The values we build or fail to build into our new digital structures will define us. If we don’t balance the human values that we care about — such as privacy, confidentiality, transparency, identity and free choice — with the compelling uses of big data, our society risks abandoning them for the sake of mere innovation or expediency.
We think the answer lies in a conversation about the ethics of big data. What should we allow it to do for us, and why? Big data has allowed the impossible to become possible, and it has outpaced our legal system’s ability to control it. This is understandable, as our elected officials don’t pass laws to regulate things that aren’t possible. We need to talk about big data ethics, and we think four facts should guide our discussion.
Big data ethics
First, when we talk about decisions based upon personal data, we need to realize that privacy rules are necessary. Some people might argue that privacy is dead in an age of information, but nothing could be further from the truth. Privacy isn’t just about keeping things hidden; it’s about the rules we use to govern information. Look at the “privacy policies” of even big data companies: these tell you not just what information gets collected about you, but how it is used and when it can be destroyed.
Second, we need to realize that even shared personal information can be protected. When you go to see doctors or lawyers, you don’t expect that the information you give them is theirs to use any way they want. The information is confidential: We confide in them so they can help us, and it’s the promise of confidentiality that lets us trust them enough to tell them everything they need to know, even if it’s embarrassing or sensitive. This essential trust is backed up by laws as well as professional rules of ethics. We don’t think of this information as “public” or “nonprivate,” and we can think about much of the data gathered about us the same way, whether it’s the websites we visit, the books we read or the places we go that our digital devices track automatically. Amazon, Apple or our ISP or mobile phone carrier might need to know this information to help us go about our days, but that doesn’t mean this data is “public” or that it should be beyond our control.
Third, big data requires transparency. If important decisions are being made about us based on an algorithm and data, we have a right to know how the algorithm works and what data is being used. It’s outrageous that while big data has allegedly eliminated privacy, many of the ways it’s used are themselves shrouded in secrecy. This has things entirely the wrong way around. If we’re to build a society through decisions based upon data, we should know how they work, especially when those decisions will affect our daily lives, privacy and social opportunities.
Finally, we should recognize that big data can compromise identity, our right to decide who we are. If we’re constantly sorted and nudged by big-data-based decisions in areas from our choice of books to our voting habits, we risk letting the powerful entities in our lives determine who we are before we even know ourselves. We need to think imaginatively about the kinds of data inferences and data decisions we will allow. We must regulate or prohibit ones we find corrosive, threatening or offensive, just as we’ve long protected decisions surrounding voting and contraception and prohibited invidious decisions made upon criteria such as race, sex or gender.
A new framework
How should we make sure that big data ethics gets built into our digital future? Law should certainly be part of the answer, and despite the claims of some technologists, law can work here. For example, the federal Fair Credit Reporting Act effectively regulates the credit reporting agencies’ use of big data to generate consumer credit reports and calculate consumer credit scores. In fact, the FCRA has regulated growing uses of big data in this context since 1970. (Some kinds of big data are really old.) As big data’s analytical tools become more common in our society, we should extend similar legal protections to other essential areas as well.
But law alone cannot solve these problems. As a society, we need to talk about big data ethics. Are we comfortable using race or proxies for race to price goods or allocate government benefits such as school funding or welfare payments? What about using big data inferences to decide college admissions or lawsuits, to investigate crimes or impose criminal sentences? As scholars, we certainly have our own moral views on these questions (as do many data scientists), but if we’re building a society in which data science is deployed more often, we need to talk as a society about what we will allow and what we won’t. In this respect, the White House’s initiative to study the technological, legal and ethical implications of big data is a good first step. But we need to do more.
We need to establish social norms for the use of data to make decisions about people, and for the rights that people have for understanding and disputing those decisions, just as we established norms for safe working conditions in the wake of the Industrial Revolution and norms for the allocation of government services and benefits at the dawn of the welfare state. When we do this, software designers and engineers need to be at the center of the conversation. Individual users certainly have a duty to behave responsibly when their data is at stake, but users alone can’t bear the whole burden. We need to build structures that encourage ethical data usage rather than merely incentivizing individual consumers into sharing as much as possible for as little as possible in return.
We must build these structures, such as in-house ethicists or review boards, into government and private entities that use big data. Such proposals might seem far-fetched, but they are already starting to become widespread. For decades, university scientists wishing to perform experiments (whether physical or data-based) on human subjects have had to submit their research projects to institutional review boards, in-house panels that ensure that scientific tools are deployed ethically and for the benefit of human beings. And many leading corporations have started to take steps along these lines, such as the widespread growth of chief privacy officers as senior corporate executives or experiments such as Google’s ethical review board or ethicists-in-residence. If we’re building a data-science revolution, let’s make sure it’s a revolution we want — one that makes society better as well as making companies richer.
Big data ethics begins as a state of mind before it becomes a set of mandates. While engineers in particular must embrace the idea of big data ethics, in an information society that cares about privacy, confidentiality, transparency and identity, we must all be part of the conversation, and part of the solution. Big data ethics is for everyone.
This op-ed is adapted from "Big Data Ethics," an essay by Neil M. Richards and Jonathan H. King in the Wake Forest Law Review (forthcoming 2014). You can access the paper online here.