Progress and hype in AI research
It is quite possible that people in the future will wonder why so many people back in 2019 thought that playing Go and other fixed games in simulated environments after long training had much to do with intelligence.
Intelligence is more about applying/transferring old knowledge to a new task (playing Quake Arena well enough without any training after mastering Doom) than it is about compressing experience into heuristics to predict outcomes and choosing actions that maximize the predicted outcome value in a given state (playing Quake Arena well enough after a million games after mastering Doom).
Human intelligence is about the ability to adapt to the physical/social world. Playing Go is a particular adaptation performed by human intelligence, developing an algorithm that learns to play Go is a more performant adaptation, and developing a mathematical theory of how to play Go might be even more performant.
It makes more sense to compare a human and an AI not by the effectiveness/efficiency of the end product of adaptation (in games played between a human and an agent) but by the effectiveness/efficiency of the process of adaptation (in games played between a human-coded agent and a machine-learned agent after limited practice).
Dota 2, StarCraft 2, Civilization 5 and probably even GTA 5 might be solved in the not so distant future, but the ability to play any new game at human level with no prior training would be far more significant.
The second biggest issue with AI is lack of robustness in a long tail of unprecedented situations (including critical ones in healthcare, self-driving vehicles, finance) which at present can't be handled with accuracy even close to acceptable [7a] [7b] [7c] [7d] [7e] [7f].
Complex models exploit any patterns that relate input to output variables but some patterns might not hold for cases poorly covered by training data [section "progress"] [8a] [8b] [8c]. >99% of healthcare applications use simple models like logistic regression (domain knowledge is converted into code to compute statistics as features) so as to avoid spurious correlations and gain more robustness on outliers [9a] [9b] [9c].
For an agent in a simulated environment like Go or Quake, the true model of the environment is either known or available, so the agent can generate any amount of training data in order to learn how to act optimally in any situation. Finding correlations in that data isn't intelligent: for real-world problems, discovering the true model is key [10a] [10b] [10c].
For an organism, the real world is not a fixed game with known environment and rules such as Go or Quake but a game with environment and rules largely unknown and always changing. It has to adapt to unexpected changes of environment and rules including changes caused by adversaries. It has to be capable of wide autonomy as opposed to merely automation necessary to play some fixed game.
It might turn out to be impossible to have self-driving vehicles and humanoid robots operating alongside humans without training them to obtain human-level adaptability to the real world. It might turn out to be impossible to have personal assistants substituting for humans in key aspects of their lives without training them to obtain human-level adaptability to the social world.
Knowledge vs intelligence
Knowledge is some information, such as data from observations or experiments, compressed and represented in some computable form, such as text in natural language, mathematical theory in semi-formal language, program in formal language, weights of artificial neural network or synapses of brain.
Knowledge is about a tool (theory, algorithm, physical process) for solving a problem. Intelligence is about applying (transferring) and creating (learning) knowledge. There is knowledge of how to solve a problem (an algorithm for a computer, instructions for a human), then there is the process of applying knowledge (a computer executing a program, a human interpreting and executing instructions), and then there is the process of creating knowledge (inductive inference/learning from observations and experiments, deductive reasoning from inferred theories and learned models).
Alpha(Go)Zero is far closer to knowledge of how to solve a particular class of problems than to an intelligent agent capable of applying and creating knowledge. It is a search algorithm like IBM Deep Blue, with heuristics that are not hardcoded but tuned during game sessions. It can't apply learned knowledge to other problems, not even to playing on a smaller Go board. It can't create abstract knowledge useful to humans, not even a simple insight into Go tactics.
TD-Gammon from 1992 is considered by many to be the biggest breakthrough in AI. Note that TD-Gammon didn't use Q-learning: it used TD(λ) with online on-policy updates. TD-Gammon's author used a variation of it to learn IBM Watson's wagering strategy.
Alpha(Go)Zero is also roughly a variation of TD(λ). TD-Gammon used a neural network trained by Temporal Difference learning, with target values calculated using tree search with depth of no more than three and using outcomes of games played to the end as estimates of leaf values. Alpha(Go)Zero used a deep neural network trained by Temporal Difference learning, with target values calculated using Monte-Carlo Tree Search with much bigger depth and using estimates of leaf values and policy actions calculated by the network without playing games to the end.
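As a rough illustration of the TD(λ) update mentioned above, here is a minimal tabular sketch in Python. It is not the implementation of TD-Gammon or Alpha(Go)Zero: both used neural networks and tree search, while this sketch uses a lookup table on a hypothetical five-state random walk (a common textbook toy problem). The function name, the parameters and the environment are assumptions made purely for illustration.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.01,
                          lam=0.8, gamma=1.0, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical toy environment: a walker starts in the middle of a row of
    n_states cells and steps left or right at random. Falling off the right
    end gives reward 1, falling off the left end gives reward 0.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # value estimate per state
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:             # left terminal
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:   # right terminal
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # One-step temporal-difference error for the current transition.
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0           # mark the current state as eligible
            for s in range(n_states):
                # Every recently visited state shares in the correction,
                # weighted by its decaying trace.
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam
            if done:
                break
            state = next_state
    return values


values = td_lambda_random_walk()
```

For this toy problem the true value of state i (counting from 0) is (i + 1) / 6, so the learned values should rise from left to right. Setting `lam=0` reduces the update to one-step TD, while `lam=1` makes it behave like learning from the full game outcome.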
Qualitative differences between Backgammon and Go as problems and between TD-Gammon and Alpha(Go)Zero as solutions (scale of neural network and number of played games being major differences) are not nearly as big as qualitative differences between perfect information games such as Go and imperfect information games such as Poker (Alpha(Go)Zero and Libratus can't be used correspondingly for Poker and Go).
IBM Watson, by far the most advanced question answering system in 2011, is not an intelligent agent. It is knowledge represented as hundreds of thousands of lines of hand-crafted logic for searching and manipulating sequences of words and for generating hypotheses and supporting arguments, plus a few hundred parameters tuned with linear regression to weigh different pieces of knowledge for each supported type of question and answer [16a] [16b]. Conceptually it is not that different from database engines, which use statistics of data and hardcoded threshold values to construct a plan for executing a given query by selecting and pipelining a subset of implemented algorithms for manipulating data.
What is intelligence
Biologists define intelligence as ability to find non-standard solutions for non-standard problems and distinguish it from reflexes and instincts defined as standard solutions for standard problems [19a] [19b]. Playing Go can't be considered a non-standard problem for AlphaGo after playing millions of games. Detecting new malware can be considered a non-standard problem with no human-level solution so far.
Necessity to adapt/survive provides optimization objectives for organisms to guide self-organization and learning/evolution. Some organisms can set up high-level objectives for themselves after being trained/evolved to satisfy low-level objectives.
Most AI researchers focus on the top-down approach to intelligence, i.e. defining an objective for a high-level problem (such as maximizing the expected probability of a win by Alpha(Go)Zero) and expecting the agent to learn good solutions for low-level subproblems [20a]. This approach works for relatively simple problems like games in simulated environments, but it requires a huge number of training episodes (several orders of magnitude more than an agent can experience in the real world) and leads to solutions incapable of generalization (Alpha(Go)Zero trained on a 19x19 board is useless for a 9x9 board without training from scratch). The hardest high-level problems which can be solved by humans are open-ended: unlike AlphaGo, humans don't search in a fixed space of possible solutions [20b]. Being informed and guided by observations and experiments in the real world, humans come up with good subproblems, e.g. special and general relativity.
A few AI researchers [section "possible directions"] focus on the bottom-up approach, i.e. starting with low-level objectives (such as maximizing the ability to predict the environment or the effect on the environment), then adding higher-level objectives for intrinsic motivation (such as maximizing learning progress or maximizing available options), and only then adding high-level objectives for problems of interest (such as maximizing game score). This approach is expected to lead to more generalizable and robust solutions for high-level problems, because learning with such low-level objectives might lead an agent to also learn self-directing and self-correcting behavior that is helpful in non-standard or dangerous situations about which the high-level objective effectively provides zero information. It is quite possible that some set of universal low-level objectives might be derived from a few equations governing the flow of energy and information, so that optimization with those objectives might lead to intelligence of computers in a way analogous to how evolution of the Universe governed by laws of physics leads to intelligence of organisms [section "possible directions"].
While solving high-level problems in simulated environments such as Go has had successes, solving low-level problems such as vision and robotics is yet to have such successes. Humans can't learn to play Go without first learning to discern the board and to place stones. Computers can solve some high-level problems without the ability to solve low-level ones when humans abstract the high-level problems away from the low-level subproblems. It is the low-level problems which are more computationally complex for both humans and computers, although not necessarily more complex as mathematical or engineering problems. It is the low-level problems which are prerequisites for commonsense reasoning, i.e. estimating the plausibility of an arbitrary hypothesis from obtained or imagined observations and from all previously acquired knowledge, which is necessary for a machine to adapt to an arbitrary environment and to solve an arbitrary high-level problem in that environment [22a] [22b].
The biggest obstacle to applications in real-world environments as opposed to simulated ones seems to be underconstrained optimization objectives, since the real world is a far more complex environment than any video game due to the number of moving parts and interactions as well as the number of adapting agents. Any sufficiently complex model trained with an insufficiently constrained objective will exploit any pattern found in training data that relates input to target variables, but spurious correlations won't necessarily generalize to testing data [section "progress"] [23a] [23b] [23c]. Even a billion examples don't constrain optimization sufficiently and don't lead to major performance gains in image recognition [24a] [24b] [24c]. Agents find surprising ways to exploit simulated environments to maximize objectives that are not constrained enough to prevent exploits [25a] [25b].
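The failure mode described above can be shown with a deliberately tiny sketch in Python. Everything here is a made-up assumption for illustration: the dataset is synthetic, the "model" is just a rule that picks whichever single feature fits the training set best, and the names (`make_examples`, `fit_stump`) are hypothetical. One feature is causally tied to the label but noisy; the other is a perfect shortcut in training that stops holding at test time.

```python
def flipped(i, period):
    """True on every period-th example; None means the feature is never flipped."""
    return period is not None and (i + 1) % period == 0


def make_examples(n, causal_period, spurious_period):
    """Build ((causal, spurious), label) examples with binary features."""
    examples = []
    for i in range(n):
        label = i % 2
        causal = 1 - label if flipped(i, causal_period) else label
        spurious = 1 - label if flipped(i, spurious_period) else label
        examples.append(((causal, spurious), label))
    return examples


def accuracy(examples, feature_index):
    """Accuracy of predicting the label by copying one feature."""
    hits = sum(features[feature_index] == label for features, label in examples)
    return hits / len(examples)


def fit_stump(examples):
    """The objective only asks for training accuracy, so it picks the best-fitting feature."""
    return max(range(2), key=lambda i: accuracy(examples, i))


# Training data: causal feature is right 80% of the time, shortcut 100%.
train = make_examples(20, causal_period=5, spurious_period=None)
# Test data: causal feature still 80%, shortcut is now a coin flip.
test = make_examples(20, causal_period=5, spurious_period=2)

chosen = fit_stump(train)
train_acc = accuracy(train, chosen)
test_acc = accuracy(test, chosen)
causal_test_acc = accuracy(test, 0)
```

The objective is satisfied perfectly on the training set by the shortcut feature, and nothing in it tells the learner that the noisier feature is the one that keeps working once the distribution shifts.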
One way to constrain optimization sufficiently in order to avoid non-generalizable and non-robust solutions is more informative training data, such as using the physics of the real world or the dynamics of the social world as sources of signal, as opposed to simulated environments with artificial agents, which are not nearly as complex and not representative of corner cases in the real/social world. Another way is a more complex optimization objective, such as learning to predict not only statistics of interest (such as future cumulative rewards conditional on the agent's next actions) but also dynamics, i.e. some arbitrary future properties of the environment conditional on some arbitrary hypothetical future events including the agent's next actions [26a] [26b]. States and rewards correspond to the agent's summaries of its interactions with the environment, while dynamics corresponds to the agent's knowledge about how the environment works. Progress in learning to predict the dynamics of the environment [section "possible directions"] [28a] [28b] as well as progress in creating options to influence the environment [section "possible directions"] [28c] [28d] might be some of the most powerful kinds of intrinsic motivation for an agent and are further ways to constrain optimization.
The second biggest obstacle seems to be an enormous gap between the complexity of simulated environments available to present computers and the complexity of real-world environments available to present robots, so that an agent trained in a simulated environment can't be transferred to a robot in a real-world environment with acceptable performance and robustness. The Boston Dynamics team never used machine learning to control their robots: they use real-time solvers of differential equations to calculate dynamics and optimal control for models of robots and environments which are not learned from data but specified manually [30a] [30b]. MIT researchers didn't use machine learning to control their robot in the DARPA Robotics Challenge 2015, and their robot was the only one which didn't fall or need physical assistance from humans [31a]. It might be impossible to encode all useful information into the parameters of a neural network, and some cases might not admit separating data by any number of its decision hyper-planes. A tail event might not be learnable statistically and might require some non-statistical inference, i.e. building a valid program/model/theory for it and drawing hypotheses from the built abstractions. For that, not only statistics but also dynamics of phenomena have to be calculated by the model, and the model might have to be programmed and/or trained to simulate the dynamics of phenomena [31b].
It's quite possible that the only way to train/evolve an agent with intelligence sufficient for hard problems in the real world (such as robotics) and in the social world (such as natural language understanding) might turn out to be:
(1) to train/evolve the agent in an environment which provides as many constraints for optimization as the real and social worlds (i.e. the agent has to be a robot operating in the real world alongside humans);
(2) to train/evolve the agent on problems which provide as many constraints for optimization as the hardest problems solved by organisms in the real world (i.e. the robot has to learn to survive without any assistance from humans) and solved by humans in the social world (i.e. the agent has to learn to reach goals in the real world using communication with humans as its only tool).
Arguably, during the Deep Learning renaissance there hasn't been progress in real-world problems such as robotics and language understanding nearly as significant as in fixed games running in simulated environments.
Opinions on progress of AI research from some of the most realistic researchers:
Deep Learning methods are very non-robust in image understanding tasks [papers on generalization and adversarial examples below] [38a] [38b] [38c] [38d] [38e].
Deep Learning methods haven't come even close to replacing radiologists [39a] [39b] [39c].
Deep Learning methods are very non-robust in text understanding tasks [papers on generalization and adversarial examples below] [40a] [40b].
Deep Learning methods can't pass the first levels of the hardest Atari game.
"Approximating CNNs with Bag-of-local-Features Models Works Surprisingly Well on ImageNet"
"Measuring the Tendency of CNNs to Learn Surface Statistical Regularities"
"Excessive Invariance Causes Adversarial Vulnerability"
"Do Deep Generative Models Know What They Don't Know?"
"Do ImageNet Classifiers Generalize to ImageNet?"
"Do CIFAR-10 Classifiers Generalize to CIFAR-10?"
"Deep Learning for Segmentation of Brain Tumors: Impact of Cross‐institutional Training and Testing"
"Confounding Variables Can Degrade Generalization Performance of Radiological Deep Learning Models"
"Natural Adversarial Examples"
"One Pixel Attack for Fooling Deep Neural Networks"
"A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations"
"Semantic Adversarial Examples"
"Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations?"
"The Elephant in the Room"
"Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects"
"Universal Adversarial Triggers for Attacking and Analyzing NLP"
"Semantically Equivalent Adversarial Rules for Debugging NLP Models"
"Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference"
"Probing Neural Network Comprehension of Natural Language Arguments"
"Data becomes temporarily interesting by itself to some self-improving, but computationally limited, subjective observer once he learns to predict or compress the data in a better way, thus making it subjectively simpler and more beautiful. Curiosity is the desire to create or discover more non-random, non-arbitrary, regular data that is novel and surprising not in the traditional sense of Boltzmann and Shannon but in the sense that it allows for compression progress because its regularity was not yet known. This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and artificial systems."
Intelligence can be viewed as compression efficacy: the more one can compress data, the more one can understand it. An example of increasing compression efficacy: (1) raw observations of planetary orbits, (2) geocentric Ptolemaic epicycles, (3) heliocentric ellipses, (4) Newtonian mechanics, (5) general relativity, (6) ? Under this view, compression of data is understanding, improvement of the compressor is learning, and progress of improvement is intrinsic reward. To learn as fast as possible about a piece of data, one should decrease as rapidly as possible the number of bits one needs to compress that data. If one can choose which data to observe or create, one should interact with the environment in a way that obtains data maximizing the decrease in bits (the compression progress) of everything already known.
"Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes"
"Formal Theory of Creativity, Fun, and Intrinsic Motivation"
"Curiosity Driven Reinforcement Learning for Motion Planning on Humanoids"
"PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem"
"Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization"
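The compression-progress view described above can be sketched in a few lines of Python. This is only a toy under strong assumptions: the "compressor" is a hypothetical Laplace-smoothed symbol-frequency model, far cruder than anything in the papers listed, and the function names are made up for illustration. The intrinsic reward is the number of bits saved on the same history after the model has learned from it.

```python
import math
from collections import Counter


def code_length_bits(data, counts, alphabet_size):
    """Bits needed to encode `data` under a Laplace-smoothed frequency model."""
    total = sum(counts.values()) + alphabet_size
    return sum(-math.log2((counts[symbol] + 1) / total) for symbol in data)


def compression_progress(history, alphabet):
    """Bits saved on `history` once the model has learned its symbol frequencies."""
    old_model = Counter()          # before learning: every symbol equally likely
    new_model = Counter(history)   # after learning: observed frequencies
    before = code_length_bits(history, old_model, len(alphabet))
    after = code_length_bits(history, new_model, len(alphabet))
    return before - after


ALPHABET = "abcd"
regular = "a" * 15 + "b"   # has a regularity this compressor can pick up
balanced = "abcd" * 4      # looks like noise to a frequency-only compressor

progress_regular = compression_progress(regular, ALPHABET)
progress_balanced = compression_progress(balanced, ALPHABET)
```

The regular sequence yields a large positive reward because the model found a regularity it did not know before, while the balanced sequence yields none: to this compressor it is indistinguishable from noise, so there is nothing to learn and nothing "interesting" about it. A stronger compressor would find the repeating pattern in the second sequence, which is the sense in which improving the compressor is itself the learning.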
"Intelligent system needs to optimize future causal entropy, or to put it in plain language, maximize the available future choices. Which in turn means minimizing all the unpleasant situations with very few choices. This makes sense from evolutionary point of view as it is consistent with the ability to survive, it is consistent with what we see among humans (collecting wealth and hedging on multiple outcomes of unpredictable things) and generates reasonable behavior in several simple game situations."
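A crude way to see what "maximize the available future choices" could mean in practice is the Python sketch below. It does not compute causal entropy; as a simplifying assumption it just counts how many distinct cells an agent could reach within a few steps and moves toward the option that keeps the most of them open. The grid layout and the function names are hypothetical.

```python
from collections import deque

# '#' is a wall, '.' is free space: a dead-end corridor on the left
# opens into a small room on the right.
GRID = [
    "#########",
    "#####...#",
    "#.......#",
    "#####...#",
    "#########",
]
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))


def neighbors(grid, cell):
    """Free cells one step away (the grid is fully enclosed by walls)."""
    row, col = cell
    for d_row, d_col in MOVES:
        nxt = (row + d_row, col + d_col)
        if grid[nxt[0]][nxt[1]] != "#":
            yield nxt


def reachable(grid, start, horizon):
    """All cells reachable from `start` in at most `horizon` steps (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        cell, depth = frontier.popleft()
        if depth == horizon:
            continue
        for nxt in neighbors(grid, cell):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def best_move(grid, cell, horizon):
    """Pick the neighboring cell that leaves the most future cells reachable."""
    return max(neighbors(grid, cell),
               key=lambda nxt: len(reachable(grid, nxt, horizon)))


start = (2, 4)                      # at the mouth of the corridor
choice = best_move(GRID, start, horizon=2)
options_left = len(reachable(GRID, (2, 3), 2))
options_right = len(reachable(GRID, (2, 5), 2))
```

With no reward signal at all, the agent steps out of the corridor and into the room, simply because that keeps more futures available.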
"By solving a more general problem of physical prediction (to distinguish it from statistical prediction), the input and label get completely balanced and the problem of human selection disappears altogether. The label in such case is just a time shifted version of the raw input signal. More data means more signal, means better approximation of the actual data manifold. And since that manifold originated in the physical reality (no, it has not been sampled from a set of independent and identically distributed gaussians), it is no wonder that using physics as the training paradigm may help to unravel it correctly. Moreover, adding parameters should be balanced out by adding more constraints (more training signal). That way, we should be able to build a very complex system with billions of parameters (memories) yet operating on a very simple and powerful principle. The complexity of the real signal and wealth of high dimensional training data may prevent it from ever finding "cheap", spurious solutions. But the cost we have to pay, is that we will need to solve a more general and complex task, which may not easily and directly translate to anything of practical importance, not instantly at least."
"Rebooting AI — Postulates"
"Intelligence Confuses The Intelligent"
"Intelligence Is Real"
"AI And The Ludic Fallacy"
"The Peculiar Perception Of The Problem Of Perception"
"Statistics And Dynamics"
"Reactive Vs Predictive AI"
"Learning Physics Is The Way To Go"
"Predictive Vision In A Nutshell"
"Unsupervised Learning from Continuous Video in a Scalable Predictive Recurrent Network"
"Fundamental principles of cortical computation: unsupervised learning with prediction, compression and feedback"
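The remark in the quote above that the label can be "just a time shifted version of the raw input signal" can be made concrete with a small Python sketch. The signal, the helper names and the one-parameter linear predictor are all assumptions chosen to keep the example tiny; the predictive-vision work listed above uses far richer models on video.

```python
def time_shifted_pairs(signal, horizon=1):
    """Pair each sample with the sample `horizon` steps later (horizon >= 1).

    No human labelling is involved: the future of the signal is the label.
    """
    return list(zip(signal[:-horizon], signal[horizon:]))


def fit_linear_predictor(pairs):
    """Least-squares fit of next = coef * current (no intercept)."""
    numerator = sum(current * future for current, future in pairs)
    denominator = sum(current * current for current, _ in pairs)
    return numerator / denominator


# Hypothetical "physical" signal: a quantity decaying by 10% per time step.
signal = [0.9 ** t for t in range(50)]

pairs = time_shifted_pairs(signal)
coef = fit_linear_predictor(pairs)
```

The fitted coefficient recovers the decay rate of the underlying process, i.e. a small piece of its dynamics, and every additional sample of the signal adds training signal for free.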
"The primary problem in computing today is that computers cannot organize themselves: trillions of degrees of freedom doing the same stuff over and over, narrowly focused rudimentary AI capabilities. Our mechanistic approach to the AI problem is ill-suited to complex real-world problems: machines are the sum of their parts and disconnected from the world except through us, the world is not a machine. Thermodynamics drives the evolution of everything. Thermodynamic evolution is the missing, unifying concept in computing systems. Thermodynamic evolution supposes that all organization spontaneously emerges in order to use sources of free energy in the universe and that there is competition for this energy. Thermodynamic evolution is second law of thermodynamics, except that it adds the idea that in order for entropy to increase an organization must emerge that makes it possible to access free energy. The first law of thermodynamics implies that there is competition for energy."
"All systems perform computations by means of responding to their environment. In particular, living systems compute, on a variety of length- and time-scales, future expectations based on their prior experience. Most biological computation is fundamentally a nonequilibrium process, because a preponderance of biological machinery in its natural operation is driven far from thermodynamic equilibrium. Physical systems evolve via a sequence of input stimuli that drive the system out of equilibrium and followed by relaxation to a thermal bath."
"The free energy principle seems like an attempt to unify perception, cognition, homeostasis, and action. Free energy is a mathematical concept that represents the failure of some things to match other things they’re supposed to be predicting. The brain tries to minimize its free energy with respect to the world, ie minimize the difference between its models and reality. Sometimes it does that by updating its models of the world. Other times it does that by changing the world to better match its models. Perception and cognition are both attempts to create accurate models that match the world, thus minimizing free energy. Homeostasis and action are both attempts to make reality match mental models. Action tries to get the organism’s external state to match a mental model. Homeostasis tries to get the organism’s internal state to match a mental model. Since even bacteria are doing something homeostasis-like, all life shares the principle of being free energy minimizers. So life isn’t doing four things – perceiving, thinking, acting, and maintaining homeostasis. It’s really just doing one thing – minimizing free energy – in four different ways – with the particular way it implements this in any given situation depending on which free energy minimization opportunities are most convenient."
"The Free-Energy Principle: A Unified Brain Theory?"
"Exploration, Novelty, Surprise, and Free Energy Minimization"
"Action and Behavior: a Free-energy Formulation"
"Computational Mechanisms of Curiosity and Goal-directed Exploration"
"Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop"
Solving many problems in science/engineering might not require the computer intelligence described above, as long as computers continue to be programmed by humans to solve non-standard problems, as they are today. But some very important (and most hyped) problems such as robotics (truly unconstrained self-driving) and language understanding (a truly personal assistant) might remain unsolved without such intelligence.