November 2, 2011

Physics and biology are sometimes called the hard sciences. In these fields we use carefully controlled experiments and rigorous statistical methods to study natural phenomena. The social sciences, on the other hand, are known as the soft sciences. Until recently it was difficult to generate sufficient quantities of reliable data, and to apply rigorous statistical methods, to study how different aspects of culture change and evolve.

However, things are starting to change. Google’s attempt to digitize all books ever published has provided a treasure trove of data. Indeed, a team of researchers from Harvard led by Jean-Baptiste Michel and Erez Lieberman Aiden recently analyzed about 5% of all books ever published. In this vast body of text they detected trends that offer a glimpse of how language and culture have changed over time.

The richness and variety of the insights are amazing. For instance, the team tracked how frequently different verb forms appear in print and observed how irregular verbs become regular. Over the last 400 years, many irregular verbs have become regular: in 1800 we “chid” unruly children, but in the year 2000 we “chided” them. The more frequently a verb is used, the more resistant it is to regularization: “spoke” will not turn into “speaked” for a long time. However, “sped” is giving way to “speeded,” a process that started around 1920 and is still going on today. Linguists have long known that verbs change in this way, but the Google Books data offered detailed insight into this process of transmutation.

One can do much more with the data. The team looked at how frequently famous people are mentioned over time. Typically, fame reaches a peak about 75 years after a person’s birth and declines thereafter. However, people today rise to fame much faster than they did in the past, and they become more famous than celebrities of earlier eras. But we also forget them more quickly.

One can also see how Soviet dissidents abruptly disappeared from the pages of books as they fell out of favor with the government. The same happened to famous Jews when the Nazis came to power in Germany. Indeed, the researchers were able to identify victims of Nazi repression by checking how frequently they were mentioned in print during the years of Hitler’s rule. Because the Harvard team analyzed culture much as biologists analyze the genomes of animals, they called their approach “culturomics.”

The field of “culturomics” has since taken off. Statistical analysis has been used to show how the Arab Spring could have been predicted simply by tracking the overall “tone” of news coverage of current events in the region. Looking at a single newspaper may not tell us much. However, an analysis of thousands of articles from hundreds of newspapers can detect even subtle trends in public sentiment. Given well-designed algorithms, computers are good at sifting through such vast amounts of data.

This looks like an incredibly promising beginning for “culturomics.” However, blindly mining large quantities of data can easily lead to misleading results. Relying only on published records is likely to introduce a number of biases. The further back in time we go, the higher the cost of producing a book. Hence, books were published either by the well-to-do or by those with rich patrons. In either case, they likely do not offer an accurate picture of either the general culture or the common language of the time. Moreover, books have traditionally been used to capture information of a more enduring nature, as judged by the writer. The snapshots of culture they provide will differ from the picture we get from newspapers or from street conversations.

But such biases exist in all our attempts to uncover the past. We look at history through the prism of the objects and documents that have survived over the ages, and we interpret the information we obtain using many assumptions and guesses. Social scientists frequently claim that human nature has not changed much over time. Ancient people, they say, were driven largely by the same needs, desires and fears as we are. However, this is frequently not only a conclusion but also an assumption made in order to understand the past. This is why we would likely never be able to understand the culture of an alien civilization from its remains. No matter how hard we tried, we would make assumptions about the aliens, assumptions that would be very human, and likely very wrong.

Even 20 years ago it may have been hard to imagine that society and culture could be studied in great detail and with great accuracy. The influx of data and the increase in computational power are allowing us to quantify cultural and linguistic evolution. However, even well-designed statistical analyses can give misleading results. To truly interpret and understand the results of these analyses we will always need expert historians and linguists, since only they truly understand the data within its historical and cultural context.
