Data is eating the world
Data is eating the world and winning Nobel Prizes, hiding behind better marketing terms like ‘artificial intelligence’ or ‘AI’. The 2024 Nobel Prize in Physics was awarded to Geoffrey Hinton (the ‘godfather of AI’) and John Hopfield for their work on ‘machine learning with artificial neural networks’. Half of the Chemistry Prize went to Demis Hassabis and John Jumper of DeepMind for AlphaFold, a program that learned to predict the structure of proteins, also with the help of artificial neural networks.
Both prizes were awarded for what used to be called “computational statistics”: complex statistical methods that identify patterns in large amounts of data, and a process for developing a computer program that can find the same patterns or abstractions in new data. Both prizes celebrate the recent success of the ‘connectionist’ approach to developing computer programs for ‘artificial intelligence’.
Artificial neural network-based “connectionism,” which emerged in the mid-1950s, was overshadowed until about a decade ago by “symbolic AI,” another approach that emerged around the same time. The proponents of symbolic AI dismissed the statistical analysis favored by the connectionists as “alchemy,” because they believed in the power of (their) human intelligence and its ability to devise “rules” for logical thinking that could then be programmed to make computers think, reason and plan. This was the dominant ideology in computer science: the fervent belief, embodied in “expert systems” (as one branch of symbolic AI was called), that experts (computer scientists, AI researchers and developers) could convert human knowledge into computer code.
When Jonathan Rosenfeld, co-founder and CTO of Somite.ai, recently explained his AI Scaling Laws to me, he mentioned Rich Sutton’s The Bitter Lesson in the context of why he (Rosenfeld) wants to “do better than experts.” Reviewing AI breakthroughs in chess, Go, speech recognition and natural language processing, Sutton concluded that “the only thing that matters in the long run is leveraging computation.” All that matters is the decreasing cost of a unit of computation, or ‘Moore’s Law’.
Sutton drew two lessons from this bitter lesson. One is that what experts think about thinking, and the rules they come up with, don’t matter much, because it is futile to look for “simple ways to think about the contents of the mind.” The other is that number crunching wins (eventually) every time because, unlike experts, it scales: “the power of general purpose methods, of methods that continue to scale with more computing power even as the available computing power becomes very large.”
Sutton noted that “the two methods that seem to scale arbitrarily in this way are search and learning,” methods that have served as the basis for recent AI breakthroughs such as image classification, AlphaFold and LLMs. But while the cost of computation has been declining rapidly and steadily for decades, these breakthroughs have only occurred in the past decade. Why?
Arguably, Sutton highlighted an important factor in the recent triumph of artificial neural networks (or deep learning, or computational statistics): the falling cost of computation. Writing in 2019, however, he should have acknowledged another important contributor to the sudden triumph of connectionism: the availability of vast amounts of data.
When Tim Berners-Lee invented the World Wide Web thirty-five years ago, he (and the many inventors who followed him) created a massive data repository accessible to billions of Internet users around the world. Coupled with new tools (primarily the smartphone) for creating and sharing data in multiple forms (text, images, video), the Internet provided the critical ingredient that enabled the recent success of the old-new approach to “AI.”
Falling computational costs and the discovery that GPUs were the most efficient way to perform the computations needed to find patterns in large amounts of data were not, by themselves, what made the 2012 breakthrough in image classification possible. The main contribution to that breakthrough was the availability of labeled images taken from the Internet and collected in 2009 into ImageNet, an organized database. Similarly, the invention of a new type of statistical model for processing and analyzing text in 2017 contributed significantly to the current ChatGPT bubble, but ‘generative AI’ could not have happened without the enormous amount of text (and images and videos) available (with or without permission) on the Internet.
Why is the declining cost of computation, or ‘Moore’s Law’, so central to descriptions and explanations of the trajectory of computers in general and the advancement of ‘AI’ in particular? Why have computer industry observers, since the 1990s, been missing the most important trend driving technological innovation: the explosion of data?
The term ‘data processing’ was coined in 1954. “It’s not that all industry participants ignored data,” I wrote in 2019. “But data was seen as the consequence and not the cause, the result of faster and faster devices processing data faster and faster, and of increasingly larger containers (also made possible by Moore’s law) to store it.”
‘Data’ is an elusive concept, difficult to define and quantify, unlike processing power, whose progress we could see firsthand in the rapid shrinkage of computers. The highly successful marketing powerhouse Intel also helped drive the focus on processing rather than data.
‘Data’ had a brief PR success between about 2005 and 2015, with terms like ‘Big Data’ and ‘Data Science’ becoming the talk of the town. But these were soon overshadowed by the most successful marketing and branding campaign ever, that of ‘artificial intelligence’. Yet data continues to eat the world, for better or for worse. Ultimately, it even received – without explicit recognition – two Nobel Prizes.