The video and transcript of the talk I gave at the BEACON Congress. Many thanks to Charles Ofria for his invitation, and to the audience for their warm reception, comments and questions about the broader context, limits, and possible extensions of this work!
Hi. I’m Lana Sinapayen, a researcher at Sony Computer Science Labs in Kyoto, and for this year’s BEACON Congress I will talk about my ideas and my research on harnessing failure in predictive algorithms. This talk will be in 3 parts. First I will talk about the renaissance of prediction: how the idea of prediction propagated to different fields, and the limits of applying prediction to everything. Then I will present my work on visual illusions in neural networks: why I chose to work on illusions, and how this work can be generalized to give us a kind of roadmap for reproducing biological failures in machines. Finally I will talk about predictive algorithms and the detection of life: how prediction and complexity are related, and how prediction and agency might be related. This is the most speculative part of my talk.
First a brief self-introduction: I’m Lana, I work mostly on Artificial Life and Artificial Intelligence. I started my research with very traditional types of AI, by putting mathematical models into small drones for search and rescue. Then I got bored of normal AI and joined a different lab, where I worked on models of rat neurons, cellular automata, open-endedness, etc. Now my work focuses mostly on prediction in the brain and in artificial networks, and the idea of failure.
I also work a lot on open science and I would like to take just one slide to plug my current dream Open Science project. So I’m working on this platform called Mimosa. It’s an open collaboration platform; what that means, concretely, is that you reduce your research to the smallest possible unit, for example a research question or a hypothesis or an experiment; you publish that, and then you get feedback from the community, which can contribute anything to your research, including other hypotheses, experiments, or data, so that everyone can contribute to making a project better. The platform is entirely open source and I’m currently looking for beta testers.
So if you’re interested please contact me at email@example.com. Let’s talk about the rise of Prediction. Ideas about the brain making predictions were around way before the current prediction boom, but it’s really Karl Friston who formalized these ideas and generalized them to any kind of system. He proposed that the main function of the brain is to generate predictions. That was kind of a new definition of intelligence that we hadn’t heard before. So Friston published a series of papers in the 2000s on this idea called the “Free Energy Principle.” Basically it’s the idea that your brain is trying to minimize something called free energy, and that the best way to minimize this free energy is to minimize surprise and therefore to generate good predictions of the world. Papers about the free energy principle are so complex that very few people really understand them. But even if you don’t understand everything that is written, there are a lot of parts that can be used. So the field of neuroscience was just too happy to finally have something that felt like a mathematical framework, with a few formulas that you could apply to many things; then AI also got interested, because AI researchers like me love nothing more than a new definition of intelligence that promises to change things and take the field out of stagnation. So suddenly, a lot of things in AI were about prediction and the Free Energy Principle. But even the early papers of Friston were not just about the brain: they were about any kind of biological organism. And then the Free Energy Principle became more and more general, applied to all kinds of things, culminating in a paper that literally proposed a theory of everything. And then something happened. More and more people were looking at the papers and trying to use them and finding that they had limits. 
One paper found that the original, first published free energy lemma was actually wrong, and in a recent interview Karl Friston, who created the Free Energy Principle, said this: “The Free Energy Principle cannot be falsified. It cannot be disproven. In fact, there’s not much you can do with it”, which is an admission that the Free Energy Principle is not actually scientific: if you cannot falsify it, then it’s not science. But the number of papers that are still being published about the Free Energy Principle seems to disagree with Friston when he says that you can’t use it for anything. In fact it’s more popular than ever. My opinion, as someone who really tried to understand the Free Energy Principle and never really wanted to use it, is that its contribution is not really in applications, but in making the idea of prediction popular. You don’t have to agree with the Free Energy Principle, and you don’t have to use it, to think that this idea of prediction in nature and prediction in the brain is interesting. In fact until recently one of the biggest debates in perception science was about the place of prediction in your daily life. Is your brain mostly reacting to input in real time, or are you kind of detached from the real world and perceiving your own predictions most of the time? Or is it something in between? Depending on what you believe, you will interpret the video that I’m going to show in a different way.
So you can see that most people going up the stairs are tripping. Some of them are just not looking and some of them are staring at their own feet; and for those who are looking at their feet, either they are processing the real inputs in real time and just making a mistake, or they are acting on expectations based on all the previous steps that they have climbed. And so people set out to find out how much of your actions are based on predictions, and how much are based on actual inputs from the real world. The debate is still raging, but the current consensus, by the way, is that there is a lot of prediction involved. One way to investigate this kind of opposing hypotheses is to make models, so basically build AI, and try to find which hypothesis is right. And before the prediction boom, everything in AI was basically about classification: is this a dog or a cat? Is it a picture of me, or a picture of my brother? The idea being that humans are very good at classification, therefore classification is a measure of intelligence, or even a definition of intelligence, and we should build machines that are very good at classification. Now, confusingly, at the time classifications were all called “predictions” although they had nothing to do with predicting stuff in time. So we made machines that were very very good at classification, there were lots of applications… and then this new idea of intelligence as prediction came around and we started building machines that could predict stuff, hoping to find out whether that really is the definition of intelligence.
And that’s basically always how the AI field operates: we find something that humans are really really good at (or we think are good at), we say “this must be the definition of intelligence,” we make machines that can do the same task, and then because the brain and the machine are doing the same thing we basically assume that they are doing it the same way… which should already make you feel a bit suspicious, because there are tons of examples of things that have the same function, that play the same role, but do it completely differently. And this method of imitating successes is already what we did for classification, and it worked great: we have neural networks that are very good at classifying things… But isn’t that also weird? If you truly believe that the definition of intelligence is related to prediction, then why did this method of imitating successes work with classification? How will you know if the brain is mainly a classification machine, or mainly a prediction machine, if both of your artificial models are very performant? But then you can even wonder, did it really work for classification? Well for some time we thought it worked very well. We had algorithms that could classify dogs and cats and all kinds of animals, and some of these algorithms were even said to be better than humans! And then something unexpected happened that got a lot of focus: the discovery of adversarial images. Adversarial images are normal images on which you add a little bit of noise, such that humans cannot really see a difference in the picture, but an artificial neural network will now fail to classify the picture. So here you have an example of my dog, which is correctly classified as a miniature schnauzer. Then we add a very specific kind of noise and now you can successfully convince the neural network that this dog is something else entirely. 
And that gave people a reason to pause, because if the brain and the algorithm fail in completely different ways, what are the chances that they actually work the same way? So it turned out that failures were richer in information than successes. There is a concept that I really like, called the “law of leaky abstractions”, which was introduced by Joel Spolsky, and which basically says that as long as the system is working perfectly you can’t really know how it works, but when the system breaks, you get to peek inside at what is really happening. And that is not a new idea at all! In neuroscience, in psychology, in psychophysics, all these fields really value failure cases. If somebody has a kind of brain damage and finds themselves unable to do something, it teaches you about how the brain does what it does in other people. And so that’s why I’m interested in illusions, because illusions break our perception in ways that we didn’t expect: on a meta level you might know what you’re supposed to see, or hear, or feel, but that doesn’t prevent you from perceiving it wrong. For example you know that the two lines are the same length but you still see one as smaller, or you see different shades of gray when you know it’s actually the same gray. So my idea is this: how do we know if perception is related to prediction? We use illusions. If we can make an AI that is based on prediction and perceives the same illusions as humans, there is a chance that human perception is based at some level on prediction. But we cannot stop there: we also have to show that this artificial model can generate new illusions that humans also see. Because otherwise you could have an AI that sees the same illusions as humans do but also sees all kinds of irrelevant illusions that we don’t see, and you want to avoid that. So we want to reduce the target and the model to something that has approximately the same borders between success and failure, correct perception and illusion.
In 2018 Watanabe and colleagues found such an AI that could see illusions, and what was interesting is that that AI was not trained on illusions. Instead it was trained on videos of people doing everyday activities, and in this case videos of people going to Disneyland. So you train a network to predict the video: you show it several frames of the video and it has to predict the next frames. And then after training, you can show this network a type of illusion called a motion illusion, where you have a static image in which you perceive some kind of illusory motion. The most famous of such illusions is called the Rotating Snakes, it’s here on the screen, and I think that up to 75 percent of people perceive a motion in this illusion. So you take this static image, you give the same image again and again to the network, and then you ask it to predict what happens next. And the network predicts that the image will start rotating. What is even more interesting is that this network predicts that the image will be rotating in the same direction as the motion perceived by humans. So now you have a network that is trained to predict the world, and as a side effect, ends up seeing visual illusions. But that’s not all. If you break the illusion by swapping some of the colors, like here on the bottom, the network stops predicting motion. So there is a really good match between what humans see, and what the network sees. And it seems to suggest that maybe the reason why we see these kinds of illusions is because our brain is trained to predict the world. Now, a way to make this hypothesis even stronger is to make the network generate new illusions. So that’s what I did: I took the trained network, and I used a genetic algorithm to generate new images. First you generate a few images, and then you show these images to the network, and you take the ones that the network says are moving the most (even though they are static).
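For readers who want to see the shape of this kind of evolutionary loop (score candidates, keep the ones with the most predicted motion, modify them a little, repeat), here is a minimal sketch. It is not the actual system from the talk. The candidate "image" is reduced to a ring of gray levels, and `predicted_motion` is a purely hypothetical placeholder standing in for the trained predictive network; the names `mutate`, `evolve` and all parameter values are likewise assumptions made for illustration.

```python
import random


def predicted_motion(ring):
    """Hypothetical stand-in for the trained predictive network.

    The real system would feed the static image to the network and measure
    how much motion it predicts. Here we just use an arbitrary asymmetry
    score on a ring of gray levels so that the loop has something to optimize.
    """
    n = len(ring)
    return abs(sum((ring[(i + 1) % n] - ring[i]) ** 3 for i in range(n)))


def mutate(ring, rng, rate=0.1, scale=0.2):
    """Return a copy of the ring with a few gray levels slightly perturbed."""
    return [
        min(1.0, max(0.0, v + rng.gauss(0, scale))) if rng.random() < rate else v
        for v in ring
    ]


def evolve(score, size=32, population=20, keep=5, generations=50, seed=0):
    """Simple elitist genetic algorithm: keep the best, mutate them, repeat."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(size)] for _ in range(population)]
    history = []  # best score seen at each generation
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:keep]  # candidates with the most "predicted motion"
        history.append(score(parents[0]))
        children = [mutate(rng.choice(parents), rng) for _ in range(population - keep)]
        pop = parents + children
    return max(pop, key=score), history


best, history = evolve(predicted_motion)
print("first generation best score:", history[0])
print("final best score:", predicted_motion(best))
```

Because the best candidates are carried over unchanged, the best score can only stay the same or increase from one generation to the next. Swapping the placeholder for a real network's motion estimate is where all the actual difficulty lies.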
You take these images, you modify them a little bit and you repeat the procedure to just amplify this illusory motion. Then you end up with a system that can generate new images that the network thinks should be moving. So the next step is to show these images to humans, and ask if they see any kind of motion, which is the part that I haven’t done yet, because human experiments are really hard, especially if you have to do them online. So my evidence is purely anecdotal: I take the best images and I ask people if they see any kind of motion and in which direction, and I check if it matches with the network. So these are the kinds of images that you get. At first the network was not very successful, because there is more to illusions than just bad predictions: for example, all of your bad predictions should roughly go in the same direction while also being slightly different from each other. That’s why circles work really well, because if you have motion vectors on the circle, they kind of go in the same direction and kind of are different from each other. If you have vectors that go in completely random directions, typically humans don’t see any kind of illusion. I should also add that I’m forcing the network to make circles because it’s the fastest way to get it to have ordered vectors, so the system is only choosing the distribution of the colors and shapes. But of course you don’t get illusions just by putting some random colors and shapes. So in this image some people perceive a shrinking motion, which is good because that’s what the network that generated it predicts: a slight rotation with a lot of shrinking. Here are some other examples:
you have two rotating illusions and one expanding illusion. Now although I don’t have proper data except me just asking people, there is at least one strong indication that the network is not generating random stuff. On this slide, on the left you have illusions that were created by humans. And by the way, as long as we don’t know the mechanism behind visual illusions, the way that people create new illusions is just by trial and error: test some random colors and random combinations and see if it has an effect on yourself. So the first illusions that were created on purpose were all really kind of weak, like this famous Fraser-Wilcox wheel. If you really pay attention, you can kind of feel a motion. And it was one of the first, if not the first published illusion. Then below that you have a more modern illusion called the Medaka illusion, medaka being a type of small fish. The top one is producing a rotating motion, and the bottom one alternates between right and left motion. And then on the right of the screen, you have illusions that were generated by my system (which by the way is called EIGen, for Evolutionary Illusion GENerator) which are basically replications of these two other illusions. So here is a replication of Fraser-Wilcox, with the same kind of black and white gradient, and on the bottom you have a kind of circling medaka. What I am hoping is that more people will do failure studies in synthetic sciences, for example artificial life or artificial intelligence, based on the idea that systems that fail the same way likely work the same way. And a roadmap to do this kind of study would be: first, find an interesting biological failure, which I define as a sudden change in performance; then replicate this failure in a hypothesis-driven artificial system. So first you must have a hypothesis about how the biological system works, and you build an artificial system based on this hypothesis.
Now that your system can replicate the failure, that is not enough: the next step is to find new failures in the artificial system and then verify whether your newfound failures are also relevant to the biological system. Because, again, the goal is to make sure that your system has the same limits as the biological system. Next I want to talk about everything that is around prediction. I personally don’t think that you can have a theory of everything based on prediction; I also think that for a definition to be useful, it has to exclude some things. So some things have to not be based on prediction. I’m interested in the fact that prediction is not an end, and it’s also not a beginning. So how do you become a predictive system, and then what happens after you become a predictive system? My idea is that prediction emerges from action, and that classification abilities are based on prediction. So if you are an agent in the world, your first need is a need for behavior: you need to be able to change the world to your advantage. Once you can do that, the next step is to be able to know the consequences of your actions, so that you can choose the right actions. Once you can do that, the next step is generalization: now that you’re able to predict the consequences of your actions, you also want to be able to generalize these predictions. Once you are able to act, your actions are improved by predictions; once you are able to predict, your predictions are improved by classification. Based on this idea I built a new kind of neural network called the Epsilon Network, where the first very basic idea was to get classes from prediction, on the principle that two things that lead to the same predictions belong to the same class. I will not go into details, but the Epsilon Network works by first increasing its number of neurons and weights until it’s able to predict data with a given accuracy, and then it shrinks by fusing neurons that have the same predictions.
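The principle that "two things that lead to the same predictions belong to the same class" can be illustrated with a toy example. The sketch below is not the Epsilon Network and involves no neurons at all. It is a simplified, assumed stand-in that works on a string of symbols: it estimates what comes next after each short history, then fuses histories whose one-step predictions agree within a tolerance. The function names, the history length `k` and the tolerance `tol` are all hypothetical choices for illustration.

```python
import random
from collections import Counter, defaultdict


def next_symbol_distributions(seq, k):
    """Empirical distribution of the next symbol after each length-k history."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    alphabet = sorted(set(seq))
    return {
        hist: tuple(c[s] / sum(c.values()) for s in alphabet)
        for hist, c in counts.items()
    }


def merge_states(dists, tol=0.1):
    """Fuse histories whose predictions are the same (within tol)."""
    states = []  # list of (representative distribution, member histories)
    for hist, dist in sorted(dists.items()):
        for rep, members in states:
            if max(abs(a - b) for a, b in zip(rep, dist)) <= tol:
                members.append(hist)  # same prediction: same class
                break
        else:
            states.append((dist, [hist]))  # new prediction: new class
    return [members for _, members in states]


def count_states(seq, k=2, tol=0.1):
    """Number of distinct predictive classes: a crude complexity proxy."""
    return len(merge_states(next_symbol_distributions(seq, k), tol))


rng = random.Random(0)
noise = "".join(rng.choice("01") for _ in range(20000))

print(count_states("0" * 200))   # constant sequence: 1
print(count_states("01" * 100))  # period-2 sequence: 2
print(count_states(noise))       # coin flips: every history predicts ~50/50
```

With these settings the coin-flip sequence should collapse to a single class, because every history leads to roughly the same 50/50 prediction. That is the sense in which pure noise does not count as complex for this kind of measure. Note that this toy version only looks one step ahead, so it can fuse histories that a full treatment would keep apart.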
So you get a network that grows and shrinks based on the data that you put in, and hopefully the network is also able to classify things. When I built this network it wasn’t yet called the Epsilon Network, but somebody pointed out a paper about something called an Epsilon Machine. The Epsilon Machine was published way before my work, but some of the ideas were really similar: the idea that you would have a lot of states for data that is hard to predict, and the idea of fusing some of the states if they have the same predictions. An interesting thing is that the Epsilon Machine was mainly presented as a way to measure complexity: you have some data, you’re not sure if it’s complex or not, you can use an Epsilon Machine to measure the complexity of this data. And two properties of the Epsilon Machine are that it doesn’t care so much about noise, noise is not considered as complex, but it is sensitive to chaos, chaos being the ultimate complexity. So because they share some ideas you can also use the Epsilon Network to measure the complexity in data. And the thing is although the Epsilon Machine is a very powerful framework, there is no perfect way to build an epsilon machine. It’s a bit of a perfect theoretical concept that you try to get close to. So I tried to transform the Epsilon Network into something that would give you accurate Epsilon Machines from data, so that if you have simple data you would have a simple network and if you have complex data it would have a complex network. And it turned out that the Epsilon Network was pretty good at measuring complexity: here you can see the results of using the network on a simple video versus a complex video, here simple meaning a video with a lot of repetition and the complex video had no repetition and a lot of changes of scenes. At the beginning of the experiment, both networks had the same number of neurons, but as time passes they both shrink and you end up with a bigger network for the complex video. 
On the other hand if you just add noise to a video and you use the network on both the clean video and the noisy video you end up with two networks that basically have the same number of neurons. So the Epsilon Network is an okay measure of complexity based on the idea of prediction. As long as the network is surprised, it increases the number of neurons and once it’s able to predict things it shrinks. The idea of using this network for Alien Life detection came from the concept of the edge of chaos. There is this idea that many interesting complex systems exist at the edge of chaos, which is the border between a system that is complex and a system that is chaotic. So in order of complexity, you have systems that don’t change which are the simplest possible systems; then you have systems that are ordered, possibly with some cycles, which are very predictable; and you have complex systems which are a mix of randomness and order; finally you have chaos, which is unpredictable. And it is said that maybe Life needs to be on the edge of chaos to exist, as well as many other adaptive systems, for example the brain. So with my colleagues at NASA and Caltech, we wondered, “is it possible to use this concept of complexity at the edge of chaos to detect Life?” Because on a dead planet, you won’t have very complex processes: if nothing changes or if the changes are very predictable, the chances of finding Life might be low. But if you have a planet with a lot of interesting complexity, then maybe you want to look at it twice. So, it’s purely speculative, but we thought it was worth a try. The first thing that we did was take data from Earth (because the only planet that we know that has Life is Earth). So we took electromagnetic data streams from Earth, basically like a long video, and we reduced the amount of information that we had to one pixel, so it’s like looking at Earth from a very far away place. 
Then we used Epsilon Machines to measure the complexity of these data streams and started comparing this base complexity with different real and made up planets. For example if you only take the data from an Earth desert, with no water and no forest like this, the measured complexity is very low. And so by combining data from different parts of Earth, you can make different planets and see which ones are the most complex. So having water but no clouds places you higher in terms of complexity, but having clouds really helps, and then having water and clouds and deserts and forests really gives you the highest complexity. Then we also compared Earth with Jupiter, which doesn’t have Life, and we also found that Earth has a measured complexity that is higher than Jupiter’s. So this is interesting, but not necessarily conclusive. And what I really wanted to know is “can you even use an algorithm to detect life? Is that possible?” Any kind of algorithm, not necessarily Epsilon Machines. We know that humans are good at this task, but not perfect, and it’s this “not perfect” that obviously interests me. And with my colleague Olaf Witkowski, we thought of organizing this competition, the Fake Life Recognition Contest, where an algorithm has to tell the difference between life and non-life just using trajectories. The ultimate goal being that if you can have an algorithm that detects life, the next competition would be about trying to fool this algorithm into thinking that something is life even if it’s not. So what was the first competition like? We gave people data without telling them what kind of data it was. Half of the data was from living systems, like fish, and spiders, and birds, and then half of the data was from non-living systems, like robots and simulated chemistries. It was all 2D trajectories, we normalized the space and time so that you couldn’t just use time scales or space, and then we asked people to make algorithms that would divide this data into two.
And the most important condition of the competition was that your algorithm could not be a black box, so no AI, no learning systems, just your theory and a model. Then we took the algorithms that did the best at this task, and tested them again on secret data to see if they could generalize. And we did have a winner: an algorithm that was correct almost all the time, based on the theory that real living systems exhibit self-organization of order, but in a way that is continually surprising. So that was really exciting for me, because even if the contest didn’t say anything about prediction or complexity, the winner still used a definition of surprise, which is something that defies prediction, and they said “surprising but ordered”, which means unpredictable but still not random noise, which you might recognize as something that the Epsilon Machine is really good at. This is the end of my talk. I have three main messages. The first is that prediction is a productive framework, but failing to predict is even more interesting in my opinion; whether it’s simple failures like saying “this static image is moving” even if it’s not moving, or more complex failures for things that are genuinely impossible to predict. The second message is “failures are more information rich than successes”: it’s not just interesting, it’s more informative. And the final message is that synthetic approaches like ALife and AI should really focus more on failure replication because you have more chance to stay on target. Thank you for your attention!