Twitter Mining

In February 2014, Twitter made all its tweets available to researchers. As might be suspected, this massive body of data is a potential treasure trove for researchers. While one might picture researchers going through the tweets for the obvious content (such as what people eat and drink), this data can be mined in some potentially surprising ways. For example, the spread of infectious diseases can be tracked via an analysis of tweets. This sort of data mining is not new: some years ago I wrote an essay on the ethics of mining data and used, as an example, Target’s analysis of data to determine when customers were pregnant (so as to send targeted ads). What is new is that all the tweets are now available to researchers, thus providing a vast heap of data (and probably a lot of crap).

As might be imagined, there are some ethical concerns about the use of this data. Some might suspect that this creates a brave new world for ethics, but this is not the case. While the availability of all the tweets is new and the scale is certainly large, this scenario is old hat for ethics. First, tweets are public communications that are morally on par with yelling statements in public places, posting statements on physical bulletin boards, putting an announcement in the paper and so on. While the tweets are electronic, this is not a morally relevant distinction. As such, researchers delving into the tweets is morally the same as a researcher looking at a bulletin board for data or spending time in public places to count the number of people who go to a specific store.

Second, tweets can (often) be linked to a specific person and this raises the stock concern about identifying specific people in the research. An example would be identifying Jane Doe as being likely to have an STD based on an analysis of her tweets. While Twitter provides another context in which this can occur, identifying specific people in research without their consent seems to be well established as being wrong. For example, while a researcher has every right to count, from public spaces, the number of people going to a strip club, to publish a list of the specific individuals visiting the club in her research would be morally dubious at best. As another example, a researcher has every right to count the number of runners observed in public spaces. However, to publish their names without their consent in her research would also be morally dubious at best. Engaging in speculation about why they run and linking that to specific people would be even worse (“based on the algorithm used to analyze the running patterns, Jane Doe is using her running to cover up her affair with John Roe”).

One counter is, of course, that anyone with access to the data and the right sorts of algorithms could find out this information for herself. This would simply be an extension of the oldest method of research: making inferences from sensory data. In this case the data would be massive and the inferences would be handled by computers, but the basic method is the same. Presumably people do not have a privacy right against inferences based on publicly available data (a subject I have written about before). Speculation would presumably not violate privacy rights, but could enter into the realm of slander, which is distinct from a privacy matter.

However, such inferences would seem to fall under privacy rights with regard to the professional ethics governing researchers. That is, researchers should not identify specific people without their consent, whether they are making inferences or not. To use an analogy, if I infer that Jane Doe and John Roe’s public running patterns indicate they are having an affair, I have not violated their right to privacy (assuming this right also covers affairs). However, if I were engaged in running research and published this inference in a journal article without their permission, then I would presumably be acting in violation of research ethics.

The obvious counter is that as long as a researcher is not engaged in slander (that is, intentionally saying untrue things that harm a person), then there would be little grounds for moral condemnation. After all, as long as the data was publicly gathered and the link between the data and the specific person is also in the public realm, then nothing wrong has been done. To use an analogy, if someone is in a public park wearing a nametag and engages in specific behavior, then it seems morally acceptable to report that. This would be similar to the ethics governing journalism: public behavior by identified individuals is fair game. Inferences are also fair game, provided that they do not constitute slander.

In closing, while Twitter has given researchers a new pile of data, the company has not created any new moral territory.

  1. M-ree BiPolar

    Concerning the last analogy, there’s another counter. In a park, you CAN wear a nametag, but you also can NOT wear it.

    On a (physical) public bulletin board, you can just hang the ad without signing it.

    On Twitter… you cannot post anonymously. Because these name tags are forced on users, using them to link tweets to specific people is somewhat more morally dubious.

    Of course, it can also be countered that no one forces anyone to use Twitter (or some forum board, or whatever else) at all, but then the same can be said about speaking at all, too.
