Psychohistory & Big Data

Isaac Asimov introduced the fictional science of psychohistory in his Foundation universe. In that setting, psychohistory predicts the future by analyzing data and drawing inductive inferences from it using various algorithms and formulas. Its predictions are not about specific individuals, but rather about broad events. For example, the science could predict the fall of the Empire, but it could not predict which specific person would be the emperor at that time.

Not surprisingly, real thinkers have been striving to make such predictions for quite some time and have met with some success at making statistical predictions involving large numbers of people. For example, the number of traffic accidents that will occur in a year can be predicted with a fair degree of accuracy, as can the number of births. Making the sort of predictions made in the Foundation series, however, has so far been beyond the reach of the social sciences. This might change.

Psychohistory would, in many ways, work like weather prediction: data needs to be collected, analyzed and used to create mathematical models. Ideally, the model would be a perfect duplicate of reality and time could be accelerated in the model to see what will happen. Of course, making such a model is rather challenging.

One major restrictive factor has been that of data. After all, the ideal would be a perfect reconstruction of the world and to the degree that the available data falls short, the model becomes less than accurate.

While humans have been gathering and storing information since the advent of writing, we are currently gathering and storing more information than ever before. In fact, gathering, storing and analyzing data is now a standard business practice that goes by the name “Big Data.” Google was one of the pioneers of modern Big Data, but other companies and organizations have gotten into the game. Some are involved because it is an industry worth billions while others are involved for other reasons (such as law enforcement). In any case, significant effort is being expended to gather data that would be useful in predicting human behavior, whether the goal is to sell more baby products or fight terrorism. People are, of course, contributing to this process by handing over massive amounts of data via social networking sites and in other ways, such as trading private information for “free” stuff. As such, there is now a massive quantity of data that would be very useful in modeling the future.

The data will, of course, always be less than complete. In addition to the practical limits, there is also the problem of “limited” omniscience—knowing everything that is and was. Unlimited omniscience would include knowing everything, including what will be (assuming that can be known). Given human limitations, we will never have that complete information. As such, the epistemic limits will certainly prevent a perfect model because there will presumably always be past things that we do not know (and perhaps there are unknowable things) and hence they will not be in the data.

But, perhaps there is a way around this. If a suitably awesome machine could be built, perhaps it could predict everything from a single truth—a Cartesian machine of sorts. This leads to a second restrictive element.

A second restrictive factor has been a matter of logic. To be specific, there is the problem of creating the “software” to analyze the massive amounts of data so as to make predictions. Much of this involves inductive reasoning. After all, the goal is to make an inference from what is known (the sample) to what is not known (the target). This sort of reasoning is, of course, essentially philosophical. As such, it is hardly surprising that Leibniz was one of the first to explicitly propose creating a model of reality using symbols. Hobbes also believed that the social sciences could be “real” sciences and took geometry as his model.

While the “software” is still not quite up to psychohistory standards, there have been some impressive results in the business world in the field of predictive analysis. Of course, some of these successes have created concern, such as Target’s infamous use of such results to predict pregnancies and thus engage in targeted marketing to women who were statistically likely to be pregnant based on their buying behaviors.

As might be imagined, metaphysics becomes a factor with regard to predictive software. One important matter is whether or not humans have free will. After all, if humans do have free will in the classic sense, then predicting human behavior will always be limited by that factor. Of course, it can be argued that even if people do have that mysterious free will, people still behave in ways that are subject to statistical analysis. So, X% of people will freely do Y, while Z% of people will freely not. Though they are all free, the general patterns of behavior would remain predictable. After all, we already engage in effective statistical predictions and if these are compatible with our (alleged) free will, then it seems reasonable that the same would apply to other large scale predictions as well. As such, psychohistory would be consistent with free will. That said, perhaps free will could be a factor that could “break” some predictions, perhaps in very important ways. The “breakage” caused by free will would seem to depend on how much impact individual choice has on the behavior of the whole.
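The claim that individually free choices can still add up to a predictable pattern can be illustrated with a small simulation. This is only a sketch under loud assumptions: each simulated person chooses Y independently with the same probability, the 30% figure is arbitrary, and the function names `simulate_choices` and `fraction_choosing` are made up for this illustration. None of it comes from Asimov or from any real predictive system.

```python
import random


def simulate_choices(population, p_choose, seed):
    """Return one True/False choice per simulated person.

    Assumption: every person chooses Y independently, with the same
    probability p_choose. Real people influence one another, so this
    is a deliberate simplification.
    """
    rng = random.Random(seed)
    return [rng.random() < p_choose for _ in range(population)]


def fraction_choosing(choices):
    """Fraction of the population that chose Y."""
    return sum(choices) / len(choices)


if __name__ == "__main__":
    p = 0.3  # arbitrary illustrative value for "X% of people do Y"
    for n in (10, 1_000, 100_000):
        # Re-run the same "society" five times with different seeds.
        fractions = [
            fraction_choosing(simulate_choices(n, p, seed))
            for seed in range(5)
        ]
        spread = max(fractions) - min(fractions)
        print(f"population={n:>7}  spread across runs={spread:.4f}")
```

In a sketch like this, no single choice can be known in advance, yet the spread between runs should shrink as the population grows. The independence assumption is the weak point: it is exactly what would fail if a few individual choices had a large impact on the behavior of the whole.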

A second important matter is, obviously enough, whether reality is determined or not. If we live in a deterministic world, this would seem to make definitive predictions easier (if that even makes sense to say in a deterministic universe). After all, there would be no random chance or free will to complicate matters. Of course, even if we live in a random universe, predictions would still be possible. They would, of course, lack the certainty that would be theoretically possible in a deterministic universe, but such is life in a random universe.

A third important matter is whether or not reality can be adequately modeled. This involves concerns about the nature of reality as well as the capability of humans to develop a means of modeling reality. It seems reasonable to believe that our models will always fall short of reality, thus ensuring that predictions will always potentially be in error.

A third restrictive factor is processing power. Before computers, data analysis was done by humans and this placed a rather serious limit on the volume of data processed and the speed at which it could be done. While modern computers lack human intelligence, they are well suited to data analysis, at least once they have been properly programmed by humans. While the industry is starting to run into the limits imposed by physics when it comes to improvements in processors, creating massive networks has provided a means to work around this, at least for a while.

It is, of course, probably impossible to build a machine with enough processing power to recreate the world, even in a virtual way and even if it is assumed that the data is complete and completely accurate. As such, this will also limit the efficacy of predictions.

Perhaps someday we will be able to predict the future so as to know whether or not we need to wear shades.

  1. Mike: there are several reasons why such projects are doomed to failure.

    1/. Any computational machine big enough to analyse the entire universe would have to be
    (a) not in the universe when it did it, or problems of recursions arise
    (b) as big as the universe.
    (c) Even if (a) is satisfied, since no part of it including its output could be allowed to enter the known universe, the answer would not in any case be available.

    Looking at more realistic partial approximations we come up against – even there – more problems.

    We may be able to reduce the universe’s supposed causal change to partial or complete time differential equations, as we do with e.g. Newton’s laws, but the existence of e.g. the three body problem shows that the time integral of such derivatives, no matter how simple, is both extraordinarily complex and subject to extreme sensitivity. That is, simple formulae when integrated over time can and do lead to massively chaotic results.

    Finally, we have no way of knowing whether such derivative formulae that we may derive are in fact correct, or merely local approximations to the actual behaviour (as e.g. Einstein’s corrections to Newtonian mechanics showed).

    Thus there is no hope of a total solution of any value that can be relied on.

    That doesn’t mean that no solutions of any value can be produced.

    One of my grandmother’s aphorisms was that ‘it takes three generations to make a gentleman’, by which she meant that only the great grandchildren of a peasant would be so inculcated with the correct behavioral norms that they were indistinguishable from the extant aristocracy. That does suggest some kind of psychological lag, and indeed I have seen such lags over my lifetime with issues such as racism and homophobia, which in my teenage years were so common as to pass unremarked but are now more or less rare in educated people.

    So whilst the idea has merit, it will never be as complete a science as to make it more than a better guess IMHO.

    Occasionally it does have merit. Way back in 2008 I predicted that the global financial crisis would last 15-20 years, for no other reason than it would take at least a generation of ~zero growth before people accepted it as the ‘new norm’ and adapted to it psychologically and culturally.

    5 years on I see no sign that they are, nor any reason to change that prediction.

  2. But this is just the beginning:,0,409336.story
    and it will never be as predicted
    Why would you say that Asimov was not a ‘real thinker’?

  3. In my opinion free will, determinism and the fundamental nature of reality are largely irrelevant to this question. Even if reality is perfectly predictable in principle for a predictor with perfect knowledge of the present and unlimited processing power, that is of little use to us, since we will never have that capability. Our predictions will always be subject to unforeseen contingencies.

    In many respects humans and societies are chaotic systems, where small contingent events (or differences in initial state) can lead to greatly different outcomes. Even the weather will never be predictable very far ahead because of chaos. Such prediction is subject to diminishing returns: a large improvement in models and processing power will yield only a modest increase in forecasting range.

    In the case of society, a single particle of cosmic radiation can cause a cancer that kills a person who, if she’d lived, might have had a major effect on the course of history. I think Foundation-style psychohistory will always be out of reach because that kind of historical development is too dependent on unforeseeable contingencies.

    Sorry to be a spoilsport. 😉

  4. Very enjoyable read. I have some other thoughts on big data and psychohistory.
