This post is based on my paper “Unpredictability of AI.”
Over the last decade, as the capabilities of artificial intelligence have increased, a significant number of researchers have realized the importance of not only creating capable intelligent systems, but also making them safe and secure [1-6]. Unfortunately, the field of AI Safety is very young, and researchers are still working to identify its main challenges and limitations. Impossibility results are well known in many fields of inquiry [7-13], and some have now been identified in AI Safety [14-16].
In this post, we concentrate on the poorly understood concept of unpredictability of intelligent systems [17], which limits our ability to understand the impact of the intelligent systems we are developing and is a challenge for software verification and intelligent system control, as well as AI Safety in general.
What is Unpredictability of AI?
In theoretical computer science, and in software development in general, many impossibility results are well established, and some of them are strongly related to the subject of this post. For example, Rice’s Theorem states that no computationally effective method can decide if a program will exhibit a particular non-trivial behavior, such as producing a specific output [18]. Similarly, Wolfram’s Computational Irreducibility states that complex behaviors of programs can’t be predicted without actually running those programs [19]. Any physical system which could be mapped onto a Turing Machine will similarly exhibit Unpredictability [20, 21].
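As a small illustration of the irreducibility idea (an illustration only, not a proof), consider Wolfram's Rule 30 cellular automaton. The sketch below computes the center column of the pattern grown from a single live cell by running every intermediate step; to my knowledge no general shortcut formula for that column is known. The function names `rule30_step` and `center_column` are hypothetical helpers chosen for this example.

```python
def rule30_step(cells):
    """One Rule 30 update: new cell = left XOR (center OR right).

    Cells beyond either edge of the list are treated as 0.
    """
    padded = [0] + list(cells) + [0]
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def center_column(steps):
    """Return the center cell at steps 0..steps-1, found by simulating each step."""
    width = 2 * steps + 1      # wide enough that the edges never affect the center
    cells = [0] * width
    cells[width // 2] = 1      # start from a single live cell in the middle
    column = []
    for _ in range(steps):
        column.append(cells[width // 2])
        cells = rule30_step(cells)
    return column


print(center_column(8))  # [1, 1, 0, 1, 1, 1, 0, 0]
```

The update rule is one line long, yet the only way this sketch learns the value at step n is by computing all n steps before it, which is the sense in which the behavior is predicted only by running the program.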
Unpredictability of AI, one of many impossibility results in AI Safety, also known as Unknowability [22] or Cognitive Uncontainability [23], is defined as our inability to precisely and consistently predict what specific actions an intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. It is related to, but not the same as, unexplainability and incomprehensibility of AI. Unpredictability does not imply that better-than-random statistical analysis is impossible; it simply points out a general limitation on how well such efforts can perform, and is particularly pronounced with advanced generally intelligent systems (superintelligence) in novel domains. In fact, we can present a proof of unpredictability for such superintelligent systems.
Proof. This is a proof by contradiction. Suppose that unpredictability is false and it is possible for a person to accurately predict the decisions of a superintelligence. Then that person can make the same decisions as the superintelligence, which makes them as smart as the superintelligence. That is a contradiction, as a superintelligence is defined as a system smarter than any person. Hence our initial assumption was false, and unpredictability holds.
The amount of unpredictability can be formally measured via the theory of Bayesian surprise, which measures the difference between the posterior and prior beliefs of the predicting agent [24-27]. “The unpredictability of intelligence is a very special and unusual kind of surprise, which is not at all like noise or randomness. There is a weird balance between the unpredictability of actions and the predictability of outcomes.” [28]. A simple heuristic is to estimate the amount of surprise as proportional to the difference in intelligence between the predictor and the predicted agent. See Yudkowsky [29, 30] for an easy-to-follow discussion of this topic.
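To make the measure concrete, here is a minimal sketch that assumes the common formulation of Bayesian surprise as the Kullback-Leibler divergence from the prior to the posterior over the predictor's hypotheses. The two hypotheses, their prior weights, and the likelihood values are invented purely for illustration, and `bayes_update` and `bayesian_surprise` are hypothetical helper names.

```python
import math


def bayes_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation (Bayes' rule)."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def bayesian_surprise(prior, posterior):
    """KL divergence KL(posterior || prior), in bits."""
    return sum(
        q * math.log2(q / prior[h])
        for h, q in posterior.items()
        if q > 0
    )


# Illustrative (made-up) beliefs of a predictor about an agent's strategy.
prior = {"conventional": 0.9, "novel": 0.1}

# Probability of the observed move under each hypothesis (also made up).
expected_move = {"conventional": 0.5, "novel": 0.5}     # equally likely either way
unexpected_move = {"conventional": 0.05, "novel": 0.6}  # far likelier if "novel"

for name, likelihoods in [("expected", expected_move), ("unexpected", unexpected_move)]:
    posterior = bayes_update(prior, likelihoods)
    print(name, round(bayesian_surprise(prior, posterior), 3), "bits")
```

An observation that every hypothesis explains equally well leaves the beliefs unchanged and yields zero surprise, while an observation that forces a large shift in beliefs yields a large value. Under this assumed formulation, that shift is the quantity the heuristic above ties to the intelligence gap between predictor and predicted agent.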
What Does This Mean for AI Safety?
Unpredictability is practically observable in current narrow-domain systems with superhuman performance. Developers of famous intelligent systems such as Deep Blue (Chess) [31, 32], IBM Watson (Jeopardy) [33], and AlphaZero (Go) [34, 35] did not know what specific decisions their AI would make on every turn. All they could predict was that it would try to win using any actions available to it, and win it did. AGI developers are in exactly the same situation; they may know the ultimate goals of their system, but they do not know the actual step-by-step plan it will execute, which of course has serious consequences for AI Safety [36-39].
There are infinitely many paths to every desirable state of the world. A great majority of them are completely undesirable and unsafe, most with negative side effects. In harder cases, and in most real-world cases, even the overall goal of the system may not be precisely known or may be known only in abstract terms, such as “make the world better.” While in some cases the terminal goal(s) could be learned, even if you can learn to predict an overall outcome with some statistical certainty, you cannot learn to predict all the steps to the goal that a system of superior intelligence would take. Lower intelligence can’t accurately predict all decisions of higher intelligence, a concept known as Vinge’s Principle [40]. “Vinge's Principle implies that when an agent is designing another agent (or modifying its own code), it needs to approve the other agent's design without knowing the other agent's exact future actions.” [41].
Unpredictability is an intuitively familiar concept; we can usually predict the outcome of common physical processes without knowing the specific behavior of particular atoms, just as we can typically predict the overall behavior of an intelligent system without knowing its specific intermediate steps. Rahwan and Cebrian observe that “… complex AI agents often exhibit inherent unpredictability: they demonstrate emergent behaviors that are impossible to predict with precision—even by their own programmers. These behaviors manifest themselves only through interaction with the world and with other agents in the environment. … In fact, Alan Turing and Alonzo Church showed the fundamental impossibility of ensuring an algorithm fulfills certain properties without actually running said algorithm. There are fundamental theoretical limits to our ability to verify that a particular piece of code will always satisfy desirable properties, unless we execute the code, and observe its behavior.” [42].
Others have arrived at similar conclusions.
“Given the inherent unpredictability of AI, it may not always be feasible to implement specific controls for every activity in which a bot engages.” [43].
“As computer programs become more intelligent and less transparent, not only are the harmful effects less predictable, but their decision-making process may also be unpredictable.” [44].
“The AI could become so complex that it results in errors and unpredictability, as the AI will be not able to predict its own behavior.” [45].
“… the behavior of [artificial intellects] will be so complex as to be unpredictable, and therefore potentially threatening to human beings.” [46].
We can conclude that Unpredictability of AI will forever make 100% safe AI an impossibility, but we can still strive for Safer AI, because we are able to make some predictions about AIs we design.
You can follow me on Twitter for more information on AI safety.
1. Yampolskiy, R.V., Artificial Intelligence Safety and Security. 2018: Chapman and Hall/CRC.
2. Callaghan, V., et al., Technological Singularity. 2017: Springer.
3. Baum, S.D., et al., Long-term trajectories of human civilization. Foresight, 2019. 21(1): p. 53-83.
4. Duettmann, A., et al., Artificial General Intelligence: Coordination & Great Powers.
5. Charisi, V., et al., Towards Moral Autonomous Systems. arXiv preprint arXiv:1703.04741, 2017.
6. Brundage, M., et al., The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228, 2018.
7. Fischer, M.J., N.A. Lynch, and M.S. Paterson, Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 1985. 32(2): p. 374-382.
8. Grossman, S.J. and J.E. Stiglitz, On the impossibility of informationally efficient markets. The American Economic Review, 1980. 70(3): p. 393-408.
9. Kleinberg, J.M. An impossibility theorem for clustering. in Advances in neural information processing systems. 2003.
10. Strawson, G., The impossibility of moral responsibility. Philosophical studies, 1994. 75(1): p. 5-24.
11. Bazerman, M.H., K.P. Morgan, and G.F. Loewenstein, The impossibility of auditor independence. Sloan Management Review, 1997. 38: p. 89-94.
12. List, C. and P. Pettit, Aggregating sets of judgments: An impossibility result. Economics & Philosophy, 2002. 18(1): p. 89-110.
13. Dufour, J.-M., Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica: Journal of the Econometric Society, 1997: p. 1365-1387.
14. Yampolskiy, R.V., What are the ultimate limits to computational techniques: verifier theory and unverifiability. Physica Scripta, 2017. 92(9): p. 093001.
15. Armstrong, S. and S. Mindermann, Impossibility of deducing preferences and rationality from human policy. arXiv preprint arXiv:1712.05812, 2017.
16. Eckersley, P., Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064, 2018.
17. Yampolskiy, R.V. The space of possible mind designs. in International Conference on Artificial General Intelligence. 2015. Springer.
18. Rice, H.G., Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 1953. 74(2): p. 358-366.
19. Wolfram, S., A New Kind of Science. Vol. 5. 2002: Wolfram Media, Champaign.
20. Moore, C., Unpredictability and undecidability in dynamical systems. Physical Review Letters, 1990. 64(20): p. 2354.
21. Moore, C., Generalized shifts: unpredictability and undecidability in dynamical systems. Nonlinearity, 1991. 4(2): p. 199.
22. Vinge, V. Technological singularity. in VISION-21 Symposium sponsored by NASA Lewis Research Center and the Ohio Aerospace Institute. 1993.
23. Cognitive Uncontainability, in Arbital. Retrieved May 19, 2019: Available at: .
24. Itti, L. and P. Baldi. A principled approach to detecting surprising events in video. in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). 2005. IEEE.
25. Itti, L. and P.F. Baldi. Bayesian surprise attracts human attention. in Advances in neural information processing systems. 2006. MIT Press.
26. Storck, J., S. Hochreiter, and J. Schmidhuber. Reinforcement driven information acquisition in non-deterministic environments. in Proceedings of the international conference on artificial neural networks, Paris. 1995. Citeseer.
27. Schmidhuber, J., Simple algorithmic theory of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. Journal of SICE, 2009. 48(1).
28. Yudkowsky, E., Expected Creative Surprises, in Less Wrong. October 24, 2008: .
29. Yudkowsky, E., Belief in Intelligence, in Less Wrong. October 25, 2008: Available at: .
30. Yudkowsky, E., Aiming at the Target, in Less Wrong. October 26, 2008: Available at: .
31. Vingean Uncertainty, in Arbital. Retrieved May 19, 2019: Available at: .
32. Campbell, M., A.J. Hoane Jr, and F.-h. Hsu, Deep blue. Artificial intelligence, 2002. 134(1-2): p. 57-83.
33. Ferrucci, D.A., Introduction to “this is watson”. IBM Journal of Research and Development, 2012. 56(3.4): p. 1: 1-1: 15.
34. Yudkowsky, E., Eliezer Yudkowsky on AlphaGo’s Wins, in Future of Life Institute. March 15, 2016: .
35. Silver, D., et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018. 362(6419): p. 1140-1144.
36. Pistono, F. and R.V. Yampolskiy, Unethical Research: How to Create a Malevolent Artificial Intelligence. arXiv preprint arXiv:1605.02817, 2016.
37. Yampolskiy, R.V., What to Do with the Singularity Paradox?, in Philosophy and Theory of Artificial Intelligence. 2013, Springer Berlin Heidelberg. p. 397-413.
38. Babcock, J., J. Kramar, and R. Yampolskiy, The AGI Containment Problem, in The Ninth Conference on Artificial General Intelligence (AGI2016). July 16-19, 2016: NYC, USA.
39. Majot, A.M. and R.V. Yampolskiy. AI safety engineering through introduction of self-reference into felicific calculus via artificial pain and pleasure. in IEEE International Symposium on Ethics in Science, Technology and Engineering. May 23-24, 2014. Chicago, IL: IEEE.
40. Vinge's Principle, in Arbital. Retrieved May 19, 2019: Available at: .
41. Vingean Reflection, in Arbital. Retrieved May 19, 2019: Available at: .
42. Rahwan, I. and M. Cebrian, Machine Behavior Needs to Be an Academic Discipline, in Nautilus. March 29, 2018: Available at: .
43. Mokhtarian, E., The Bot Legal Code: Developing a Legally Compliant Artificial Intelligence. Vanderbilt Journal of Entertainment & Technology Law, 2018. 21: p. 145.
44. Bathaee, Y., The artificial intelligence black box and the failure of intent and causation. Harvard Journal of Law & Technology, 2018. 31(2): p. 889.
45. Turchin, A. and D. Denkenberger, Classification of global catastrophic risks connected with artificial intelligence. AI & SOCIETY, 2018: p. 1-17.
46. De Garis, H., The Artilect War. Available at , 2008.