Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the statistical analysis of digitized texts. Researchers mine large collections of digitized text to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article, "Quantitative Analysis of Culture Using Millions of Digitized Books", co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden.
Michel and Aiden helped create the Google Labs project Google Ngram Viewer, which uses n-grams to analyze the Google Books digital library for cultural patterns in language use over time.
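The idea of tracking an n-gram over time can be illustrated with a minimal Python sketch. It is not the Ngram Viewer's implementation: the function names, the whitespace tokenisation and the toy corpus below are all assumptions made for illustration. The sketch computes, for each year, the share of that year's n-grams that match a given phrase.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return an iterator over every run of n consecutive tokens, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def relative_frequency_by_year(corpus, phrase):
    """Map each year to the share of its n-grams that equal `phrase`.

    `corpus` is a hypothetical structure: {year: [text, text, ...]}.
    Tokenisation is a plain lowercase whitespace split (an assumption).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Toy data, invented purely for illustration.
toy_corpus = {
    1900: ["the steam engine drives the mill", "the steam engine is loud"],
    2000: ["the search engine finds the page"],
}
trend = relative_frequency_by_year(toy_corpus, "steam engine")
```

Dividing by the total number of n-grams in each year, rather than reporting raw counts, keeps years with very different amounts of text comparable.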
Because the Google Ngram data set is not an unbiased sample and does not include metadata, there are several pitfalls when using it to study language or the popularity of terms. Medical literature accounts for a large, but shifting, share of the corpus, which does not take into account how often the literature is printed or read.
A 2012 paper by Alexander M. Petersen and co-authors found a "dramatic shift in the birth rate and death rates of words": deaths have increased and births have slowed. The authors also identified a universal "tipping point" in the life cycle of new words: at about 30 to 50 years after their origin, they either enter the long-term lexicon or fall into disuse (Christopher Shea, "The New Science of the Birth and Death of Words", Wall Street Journal, March 16, 2012).
Culturomic approaches have been taken in the analysis of newspaper content in a number of studies by I. Flaounas and co-authors. These studies showed macroscopic trends across different news outlets and countries. In 2012, a study of 2.5 million articles suggested that gender bias in news coverage depends on topic and that the readability of newspaper articles is also related to topic. A separate study by the same researchers, covering 1.3 million articles from 27 countries, showed macroscopic patterns in the choice of stories to cover. In particular, countries made similar choices when they were related by economic, geographical and cultural links. The cultural links were revealed by the similarity in voting for the Eurovision Song Contest. This study was performed on a vast scale using statistical machine translation, text categorisation and information extraction techniques.
The feasibility of detecting public opinion by analysing Twitter content was demonstrated in a study by T. Lansdall-Welfare and co-authors.
In a 2013 study by S. Sudhahar and co-authors, the automatic parsing of textual corpora enabled the extraction of actors and their relational networks on a vast scale, turning textual data into network data. The resulting networks, which can contain thousands of nodes, are then analysed using tools from network theory to identify the key actors, the key communities or parties, and general properties such as the robustness or structural stability of the overall network, or the centrality of certain nodes.
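The step from text to network can be sketched in Python. The sketch assumes that parsing has already produced subject-verb-object triples; that representation, the function names and the sample triples are illustrative assumptions, not the study's actual pipeline. It builds an undirected network of actors and ranks them by degree centrality, one simple centrality measure among many.

```python
from collections import defaultdict


def build_network(triples):
    """Build an undirected adjacency map from (subject, verb, object) triples.

    The triples are assumed to come from an upstream parser that is not
    shown here; the verb is ignored in this simplified network.
    """
    neighbours = defaultdict(set)
    for subject, _verb, obj in triples:
        if subject != obj:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    return neighbours


def degree_centrality(neighbours):
    """Return, per node, the fraction of the other nodes it is linked to."""
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Invented triples, purely for illustration.
triples = [
    ("Senator A", "criticises", "Governor B"),
    ("Senator A", "endorses", "Mayor C"),
    ("Governor B", "meets", "Mayor C"),
    ("Senator A", "debates", "Candidate D"),
]
centrality = degree_centrality(build_network(triples))
key_actor = max(centrality, key=centrality.get)
```

In this toy network the actor linked to every other actor receives the highest score. Networks with thousands of nodes would more plausibly be analysed with a dedicated graph library, and measures such as betweenness or community detection would address the other properties mentioned above.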
In a 2014 study by T. Lansdall-Welfare and co-authors, 5 million news articles were collected over 5 years.
In 2015, a study revealed the bias of the Google Books data set, which "suffers from a number of limitations which make it an obscure mask of cultural popularity," and called into question the significance of many of the earlier results.
Culturomic approaches can also contribute towards conservation science through a better understanding of human-nature relationships, with the first research published by McCallum and Bury in 2013. This study revealed a precipitous decline in public interest in environmental issues. In 2016, a publication by Richard Ladle and colleagues highlighted five key areas where culturomics can be used to advance the practice and science of conservation, including recognizing conservation-oriented constituencies and demonstrating public interest in nature, identifying conservation emblems, providing new metrics and tools for near-real-time environmental monitoring and to support conservation decision making, assessing the cultural impact of conservation interventions, and framing conservation issues and promoting public understanding.
In 2017, a study correlated joint pain (arthralgia) with Google search activity and temperature. While the study observed higher search activity for hip and knee pain (but not arthritis) during higher temperatures, it did not (and could not) control for other relevant factors such as physical activity. Mass media misinterpreted the result as "myth busted: rain does not increase joint pain", while the authors speculated that the observed correlation was due to "changes in physical activity levels".