Big Data & Ethics


Big Data, The Moving Parts: Fast Data, Big Analytics, and Deep Insight (Photo credit: Dion Hinchcliffe)


For those not familiar with the phrase, “Big Data” is used to describe the acquisition, storage and analysis of large quantities of data. The search giant Google was one of the pioneers in this area and it has developed into an industry worth billions of dollars. Big Data and its uses also raise ethical concerns.

One common use of Big Data is to analyse customer data so as to make predictions that would be useful in conducting targeted ad campaigns. Perhaps the most infamous example of this is Target’s pregnancy targeting. This Big Data adventure was a model of inductive reasoning. First, an analysis was conducted of Target customers who had signed up for Target’s new baby registry. The purchasing history of these women was analysed to find patterns of buying that corresponded to each stage of pregnancy. For example, pregnant women were found to often buy lots of unscented lotion at the start of the second trimester. Once the analysis revealed the buying patterns of pregnant women, Target then applied this information to the buying patterns of women customers. Oversimplifying things, they were essentially using an argument by analogy: inferring that women not known to be pregnant who had X, Y, and Z buying patterns were probably pregnant because women known to be pregnant had X, Y, and Z buying patterns. The women who were tagged as probably pregnant were then subject to targeted ads for baby products and this proved to be a winner for Target, other than some public relations issues.
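For readers who think in code, the argument by analogy can be sketched in a few lines of Python. This is only a toy illustration of the reasoning, not Target’s actual method: the function names, the customers, the threshold values, and every product other than unscented lotion are made up for the example.

```python
# Toy sketch of the argument by analogy. All data, names, and thresholds
# here are hypothetical; this is not how any real retailer's model works.

def learn_pattern(known_baskets, min_share=0.6):
    """Return the items bought by at least min_share of the known group."""
    counts = {}
    for basket in known_baskets:
        for item in set(basket):
            counts[item] = counts.get(item, 0) + 1
    needed = min_share * len(known_baskets)
    return {item for item, n in counts.items() if n >= needed}


def similarity(basket, pattern):
    """Share of the pattern's items that appear in this customer's basket."""
    if not pattern:
        return 0.0
    return len(set(basket) & pattern) / len(pattern)


def flag_probable(customers, pattern, threshold=0.75):
    """Names of customers whose purchases resemble the known group's pattern."""
    return sorted(name for name, basket in customers.items()
                  if similarity(basket, pattern) >= threshold)


# Step 1: customers known to be pregnant (for example, via the baby registry).
registry_baskets = [
    ["unscented lotion", "vitamins", "cotton balls", "bread"],
    ["unscented lotion", "vitamins", "cotton balls", "milk"],
    ["unscented lotion", "vitamins", "soap"],
]
pattern = learn_pattern(registry_baskets)

# Step 2: customers whose status is unknown are compared to that pattern.
other_customers = {
    "A": ["unscented lotion", "vitamins", "cotton balls", "eggs"],
    "B": ["beer", "chips", "vitamins"],
}
flagged = flag_probable(other_customers, pattern)
```

The inference is inductive throughout: customer A is flagged only because her purchases resemble those of the known group, so the conclusion is probable at best and the threshold determines how often it will be mistaken.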

One interesting aspect of this method is that it does not follow the usual model of predicting a person’s future buying behavior from his/her past buying behavior. An example of predicting future buying behavior based on past behavior would be predicting that I would buy Gatorade the next time I went grocery shopping because I have bought it consistently in the past. The analysis used by Target and other companies differs from this model by making inferences about the future behavior of customers based on their similarity to customers whose past buying behavior is known. For example, a store might see shifts in someone’s buying behavior that match other data from people starting to get into fitness and thus predict the person was getting into fitness. The store might then send the person (and others like her) targeted ads featuring Gatorade coupons because their models show that such people buy more Gatorade.

This method also has an interesting Sherlock Holmes aspect to it. The fictional detective was able to use inductive logic (although he was presented as deducing) to make impressive inferences from seemingly innocuous bits of information. Big Data can do this in reality and make reliable inferences based on what appears to be irrelevant information. For example, likely voting behavior might be inferred from factors such as one’s preferred beverage.

Naturally, Big Data can be used to sell a wide variety of products, including politicians and ideology. It also has non-commercial applications, such as law enforcement and political uses. As such, it is hardly surprising that companies and agencies are busily gathering and analyzing data at a relentless and ever growing pace. This certainly is cause for concern.

One ethical concern is that the use of Big Data can impact the outcome of elections. For example, by analyzing massive amounts of data, a campaign can acquire information that allows ads to be effectively crafted and targeted. Given that Big Data is expensive, the data advantage would tend to go to the side with the most money, thus increasing the influence of money on the outcome of elections. Naturally, the influence of money on elections is already a moral concern. While more spending does not assure victory, there is a clear connection between spending and success. To use but one obvious example, Mitt Romney was able to beat his Republican competitors in part by being able to outlast them financially and outspend them.

In any case, Big Data adds yet another tool and expense to political campaigning, thus making it more costly for people to run for office. This, in turn, means that those running for office will need even more money than before, thus making money an even greater factor than in the past. This, obviously enough, increases the ability of those with more money to influence the candidates and the issues.

On the face of it, it would seem unreasonable to require that campaigns go without Big Data. After all, it could be argued that this would be tantamount to demanding that campaigns operate in ignorance. However, the concerns about big money buying Big Data to influence elections could be addressed by campaign finance reform, which would be another ethical issue.

Perhaps the biggest ethical concern about Big Data is the matter of privacy. First, there is the ethical worry that much of the data used in Big Data is gathered without people knowing how the data will be used (and perhaps that it is even being gathered). For example, the customers at Target seemed to be unaware that Target was gathering such data about them to be analyzed and used to target ads.

While people might know that information is being collected about them, knowing this and knowing that the data will be analyzed for various purposes are two different things. As such, it can be argued that private data is being gathered without proper informed consent and this is morally wrong.

The obvious solution is for data collectors to make clear what the data will be used for, thus allowing people to make an informed choice regarding their private information. Of course, one problem that will remain is that it is rather difficult to know what sort of inferences can be made from seemingly innocuous data. As such, people might think that they are not providing any private data when they are, in fact, handing over data that can be used to make inferences about private matters.

If a business claims that they would be harmed because people would not hand over such information if they knew what it would be used for, the obvious reply is that this hardly gives them the right to deceive to get what they want. However, I do not think that businesses have much to worry about—Facebook has shown that many people are quite willing to hand over private information for little or nothing in return.

A second and perhaps the most important moral concern is that Big Data provides companies and others with the means of making inferences about people that go beyond the available data and into what might be regarded as the private realm. While this sort of reasoning is classic induction, Big Data changes the game because of the massive amount of data and processing power available to make these inferences, such as whether women are pregnant or not. In short, the analysis of seemingly innocuous data can yield inferences about information that people would tend to regard as private—or at the very least, information they would not think would be appropriate for a company to know.

One obvious counter to this is to argue that privacy rights are not being violated. After all, as long as the data used does not violate the privacy of individuals, then the inferences made from this data cannot be regarded as violating people’s privacy, even if the inferences are about matters that people would regard as private (such as pregnancy). To use an analogy, if I were to spy on someone and learn from this that she was an alcoholic, then I would be violating her privacy. However, if I inferred that she is an alcoholic from publicly available information, then I might know something private about her, but I have not violated her privacy.

This counter is certainly appealing. After all, there does seem to be a meaningful and relevant distinction between directly getting private information by violating privacy and inferring private information using public (or at least legitimately provided) data. To use an analogy, if I get the secret ingredient in someone’s prize recipe by sneaking a look at the recipe, then I have acted wrongly. However, if I infer the secret ingredient by tasting the food when I am invited to dinner, then I have not acted wrongly.

A reasonable reply to this counter is that while there is a difference between making an inference that yields private data and getting the data directly, there is also the matter of intent. It is, for example, one thing to infer the secret ingredient simply by tasting it, but it is quite another to arrange to get invited to dinner specifically so I can get that secret ingredient by tasting the food.  To use another example, it is one thing to infer that someone is an alcoholic, but quite another to systematically gather public data in order to determine whether or not she is an alcoholic. In the case of Big Data, there is clearly intent to infer data that customers have not already voluntarily provided. After all, if the data had been provided, there would be no need to undertake an analysis in order to get the desired information. Thus, while the means do not involve a direct violation of privacy rights, they do involve an indirect violation—at least in cases in which the data is private (or at least intended to be private).

The solution, which would probably be rather problematic to implement, would involve setting restrictions on what sort of inferences can be made from the data on the grounds that people have a right to keep that information private, even if the means used to acquire it did not involve any direct violations of privacy rights.



  1. This is an area in which I actually run a business in the application of Predictive Analytics on the Big Data coming out of Mission Critical Networks, such as the Smart Grid.

    My main ethical concerns are less about predictive consumer analysis based on data you could choose not to offer up than about analysis based on data you don’t offer up but which is allowed to be gathered without your consent. That’s a “means” good/bad issue.

    If that data is used to serve “good” ends, and they are delivered, a consequentialist might find utility in it.

    I have my doubts about such a utilitarian argument though, since it is hard to justify ethically if those good ends do not materialize (and in complex/chaotic human problems that happens more often than the analysts would like to admit). Furthermore, there is the question of who gets to define the “good” of the utility function. This is a potential “tyranny of the masses” issue.

    If the analytic results are used to restrict a person’s choices and opportunities then it seriously becomes an issue for me.

    Consider if you are rejected for a university education because you are considered a bad “economic bet” compared to others in being able to use that education effectively. Or if you are assigned demeaning employment in a Brave New World of alphas, betas… etc.

    Or if Predictive Analytics on Big Data is used to predict your tendency to being “anti-social” or likely to commit a crime. Such “Predictive Policing” is a real application being deployed. It smacks of a “Minority Report” like dilemma. Shall we gaol or medically intervene with pre-crime data analysed prospects to prevent crimes waiting to happen? If we don’t, and the bad happens, are we culpable?

    Technology is a great thing when used ethically and a useful tool for the tyrant who seeks to control us.

  2. Norman Hanscombe

    There’s nothing very new in data being used for such aims, and the role of money is hardly a ‘shock-horror’ novel factor.
    In a world where ‘top’ researchers seem oblivious to such basic distinctions as that between correlation and causation, perhaps we should be more concerned about the role of cognitive dissonance in helping keep both lay and ‘expert’ dabblers in philosophy blissfully happy with whatever set of ‘truths’ make them feel comfortable?
    In 1962 I encountered problems ethics faced associated with ideas raised by philosophers such as A.J. Ayer and Stevens. Unfortunately the implications threatened to encroach on one of philosophy’s main fields, so (like Basil Fawlty) the policy became “Don’t mention emotive bases of ethics.”
    In light of increasing numbers of students finding the once routinely accepted requirements of basic Philosophy I courses too hard, one can understand this — but?

  3. I was amazed but not shocked to discover that the agreement to use the Amazon Kindle contains the right to use all your reading data for their research. So when you underline (highlight) a section they get to know about that. When you read a book, how long it takes you to read it and whether you finish it are known to them. All possible behaviours in relation to your reading become known to them and this sort of information is pure gold for the opportunist publisher who will design a book to suit the greatest number of readers.

    Personally I use the superior ereader from Sony, who do none of this, I am assured, and being a simple person I believe them. Expect more Grey novels and sexed up Austen, Pickwick with whips….

  4. I agree with Michael. I think the burden rests on the consumer to sift through the user agreements to see if there is something that they would object to, e.g. using your reading data for such purposes. If you are willing to read those hefty, jargon-infused novellas, and you find something in there you don’t like, then exercise your consumer power of not using that particular product.

    If you don’t read the agreements, then you’ve only yourself to blame.

    But if a company or product is not giving you the opportunity to decline their service or product on this basis, then I think some issues will arise.

  5. Ben,

    Good points. However, one concern is that the agreements are often incredibly long (see, for example, the iTunes EULA). While it can be justly said that a person should read before accepting, there is the expectation that the agreement will be kept to a manageable length.

    Another concern, as you note, is that the data is often garnered without explicit consent.

    I happened to see the movie In Time recently. While the movie was…not good, it did have the clever device of making time the currency of the world. It would be interesting to see a well done sci-fi movie about a future in which private data is the new currency.

  6. Great and informative post.
