The word data is the plural of the Latin datum, which originally meant "something given." Data is the raw facts, before they are processed into useful information. This chapter examines the unintended consequences, both positive and negative, of digital data on human culture and society. Human cultures have traditionally stored data and information as symbolic language in written form, from notches on a stick, to coded markings on clay tablets, to written records. Ancient astronomers recorded the data of planets and stars in order to find patterns in the universe that might be useful information for growing crops or performing rituals. Today's global networked computer systems scan a virtual universe of bits and bytes for patterns in online behavior. In many cases this information is used to make predictions about people. Unlike the data recorded by ancient astronomers, however, digital data is often captured, stored and processed without human intervention. The amount of digital data has also become so large and complex that it has overwhelmed many of the traditional methods that humans use to make sense of phenomena.
Big data refers to data sets so massive that they can only be made sense of with the assistance of a computer. On the one hand, big data makes possible predictions that can help people live happier, safer and healthier lives. Google collects data about forest fires, earthquakes and virus outbreaks just by aggregating and interpreting search terms. But search terms can also provide aggregated data about personal details in network behavior. On August 4, 2006, AOL released a sample log of 20 million search queries, exposing the personal searches of 650,000 AOL users to the world. Their identifying names were removed, but the search terms revealed very private information about users' sex lives, fears, desires and, in some cases, criminal intentions.
An individual U.S. citizen produces an enormous amount of public data as a byproduct of just living in a networked world. Searches, purchases, geographical movements, and preferences in food and media entertainment might seem like insignificant raw data on the individual level, but when aggregated and merged with data from other individuals, the resulting data set can reveal patterns of behavior that produce useful information. "Millennial purchasing history" is a query that might scan vast data sets over a period of time and then return useful patterns for companies trying to market to millennials.
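As a rough illustration of how individually insignificant records become a pattern once aggregated, here is a minimal Python sketch. The purchase records, the `is_millennial` birth-year cutoff, and the `top_categories` helper are all assumptions made up for this example, not data or methods from any real company.

```python
from collections import Counter

# Hypothetical purchase records: (birth_year, product_category).
purchases = [
    (1985, "streaming"),
    (1992, "streaming"),
    (1990, "coffee"),
    (1994, "streaming"),
    (1970, "gardening"),
    (1988, "coffee"),
    (1965, "gardening"),
]


def is_millennial(birth_year):
    # Assumed cutoff for illustration only: born 1981 through 1996.
    return 1981 <= birth_year <= 1996


def top_categories(records, predicate, n=2):
    """Count categories among records whose birth year satisfies predicate."""
    counts = Counter(cat for year, cat in records if predicate(year))
    return counts.most_common(n)


millennial_top = top_categories(purchases, is_millennial)
print(millennial_top)
```

No single row says much about anyone, but the counts across rows begin to describe a group. A real query would run over millions of records and many more attributes.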
Algorithms also capture and process personal data to make consequential decisions about individual lives, such as whether someone should get a loan, keep their health care, or be released on bail. What are the implications and consequences of this type of automated surveillance and analysis? How and when is the big data collected about human behavior an invasion of individual privacy? How might computer systems, which lack the nuance and subtlety of human-to-human interaction, make harmful decisions in the interest of pattern prediction and efficiency? How can humans trust digital data when it is quite easy for AI to create "fake" data?
There are no easy answers to these questions. Big data promises to make the world more efficient and safer, but it also introduces new technological processes that are largely invisible to humans and therefore unpredictable by humans.
Data analytics is the discipline of turning raw data into information and drawing conclusions from that information. The techniques of data analytics are often automated with programmed algorithms that collect, process, clean and organize vast raw data sets so that patterns can be detected and results presented in a form that humans can analyze. Because big data analysis often results in numeric values and complex, abstract relations between these values, a data analyst typically presents findings as a data visualization or infographic accompanied by an interpretive story.
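The collect, clean, organize and summarize steps described above can be sketched in a few lines of Python. This is only a toy example under stated assumptions: the temperature readings are invented, and the `clean`, `organize` and `summarize` function names are hypothetical labels for the stages, not part of any particular analytics tool.

```python
from statistics import mean

# Hypothetical raw readings as collected: (city, temperature as text).
raw_rows = [
    ("Portland", "18.5"),
    ("portland ", "19.5"),
    ("Seattle", "n/a"),
    ("Seattle", "16.0"),
    ("", "20.0"),
]


def clean(rows):
    """Normalize city names and drop rows that are missing or unparseable."""
    cleaned = []
    for city, temp in rows:
        city = city.strip().title()
        try:
            value = float(temp)
        except ValueError:
            continue  # skip values such as "n/a"
        if city:
            cleaned.append((city, value))
    return cleaned


def organize(rows):
    """Group the cleaned values by city."""
    groups = {}
    for city, value in rows:
        groups.setdefault(city, []).append(value)
    return groups


def summarize(groups):
    """Reduce each group to its mean value."""
    return {city: mean(values) for city, values in groups.items()}


summary = summarize(organize(clean(raw_rows)))
for city, avg in sorted(summary.items()):
    print(f"{city}: {avg:.1f}")
```

Each cleaning rule is a human decision: the row with no city name and the "n/a" reading are silently discarded here, and a different analyst might have chosen to handle them differently.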
Given the exponential increase of digital data, data analytics and data visualization are growing fields that will impact almost all areas of human life. While the interpretive stories and visualizations that come out of data analysis can give fresh insights into the world's problems, it is important to remember that humans write the algorithms that frame the results. The collection and interpretation of any kind of data is prone to human fallibility.
Machine learning is a method of data analysis that gives the computer the task not only of sorting and processing big data, but also of building analytical models. As a branch of artificial intelligence, machine learning gives systems the power to learn from data by identifying patterns and to make informed decisions with almost no human involvement.
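To make "model building" concrete, the sketch below fits a straight line to a handful of data points and then uses it to predict an unseen value. It is among the simplest possible cases of learning from data, and real machine learning systems are far more elaborate. The data points and the `fit_line` and `predict` names are assumptions invented for this illustration.

```python
# Hypothetical training data: the model is never told the rule y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(xs, ys)
prediction = predict(model, 5.0)
print(model, prediction)
```

The slope and intercept are derived from the examples rather than written by a programmer, which is the sense in which the system "learns." The choice of a straight line as the model, however, is still made by a human.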
Web search began in 1994, when the World Wide Web was in its infancy. Search has since become an enormous business based purely on the data a search engine extracts from its users' queries. Google delivers ads, and packages of data sets are sold to other companies, based on keywords in search terms.
Personal data is more than a record of search queries. It is the aggregation of all the data that trails a person with a mobile phone and/or home computer: geolocation, purchases, likes and favorites, keywords in emails. The following documentary presents some of the questions around privacy in the era of big data. Who owns the data of a person's life?
There are some things humans can't do, such as finding a facial match in a database of millions of images within seconds. Because of such powerful abilities, big data technologies can extend research and creativity into new areas. Below are examples of just some of the ways in which big data is being used in the arts, sciences and humanities.
What does it mean to be human in a world with big data technologies making decisions about health, law, transportation, entertainment, social interaction, desire and taste? Below are some of the most prominent voices today warning of the dangers of trusting big data technologies to solve problems without critical reflection on what we are handing over to automation.
From the article How to Protect Your Digital Privacy | nytimes.com
Bridle, James, New Dark Age: Technology and the End of the Future (London ; Brooklyn, NY: Verso, 2018)
Buckland, Michael, Information and Society (Cambridge, Massachusetts: The MIT Press, 2017)
Holmes, Dawn E., Big Data: A Very Short Introduction, 1st edition (Oxford, United Kingdom: Oxford University Press, 2014)
Kelleher, John D., and Brendan Tierney, Data Science (Cambridge, Massachusetts: The MIT Press, 2018)
Kernighan, Brian W., D Is for Digital: What a Well-Informed Person Should Know about Computers and Communications (S.l.: CreateSpace Independent Publishing Platform, 2011)
Lanier, Jaron, Who Owns the Future?, Reprint edition (New York: Simon & Schuster, 2014)
O'Neil, Cathy, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Reprint edition (New York: Broadway Books, 2017)
Simanowski, Roberto, Data Love: The Seduction and Betrayal of Digital Technologies, trans. by Brigitte Pichon, Dorian Rudnytsky, and John Cayley, Reprint edition (New York: Columbia University Press, 2018)