Darius at Leela AI
Published in The Startup
3 min read · May 1, 2020


Which Line Would You Pull To Catch The Fish? Our Deep Learning Networks Don’t Know.

You may have seen something similar to this image on a paper placemat at a restaurant. It is a simple puzzle intended for kids: using their fingers, young children can trace each line and “catch the fish”. Can a computer do this? Absolutely. Most programmers could propose an efficient algorithm to solve it. The real question is whether any Deep Learning software can solve it. Deep Learning uses enormous amounts of data and layers of artificial neurons to perform certain tasks as well as or better than humans. These programs are very effective in a multitude of fields, yet they are stumped by this image and by similar puzzles that children can complete. How is it possible that these AI frameworks can correctly pick faces out of a crowd, but cannot tell fishing lines apart by their paths?
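As a rough illustration of the kind of algorithm a programmer might propose, here is a minimal sketch in Python. It assumes a simplified version of the puzzle that is not taken from the image: the drawing is encoded as a small character grid, the lines never touch or cross, letters on the top row mark the rods, and `F` marks the fish. The names `PUZZLE`, `trace`, and `winning_rods` are hypothetical. The trace is a plain breadth-first search that follows one connected pixel at a time.

```python
from collections import deque

# Hypothetical toy puzzle (an assumption, not the placemat image):
# letters on the top row are rods, '#' is a line pixel,
# 'F' is the fish, '.' is empty paper. Lines never touch or cross.
PUZZLE = [
    "A..B..C",
    "#..#..#",
    "#..##.#",
    "#...#.#",
    "##..#.#",
    ".#.##.#",
    ".#.#..#",
    ".#.F..#",
]


def trace(grid, start):
    """Follow connected non-empty pixels from `start`; True if a fish is reached."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "F":
            return True
        # Visit the four direct neighbours, one step at a time.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and grid[nr][nc] != ".":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def winning_rods(grid):
    """Return the names of the rods whose line ends at a fish."""
    rods = {ch: (0, c) for c, ch in enumerate(grid[0]) if ch.isalpha()}
    return [name for name, pos in rods.items() if trace(grid, pos)]


print(winning_rods(PUZZLE))  # prints ['B']
```

Real placemat puzzles let the lines cross, which this sketch does not handle; a fuller version would also have to track the direction of travel through each crossing. Even so, the basic approach is a step-by-step walk along one line.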

The issue lies in the architecture of contemporary machine vision software. These systems use deep neural networks (DNNs) to recognize objects, trajectories, faces, and other patterns. Without a doubt, deep networks have become proficient at many tasks. Facebook uses facial recognition software based on these networks to help you tag your friends in posts. Maybe you find this helpful, maybe slightly unnerving, but most users are not concerned about the implications of this software. If Facebook confuses you for your sibling, it’s not a big deal. In settings like this, Deep Learning seems impressively accurate and more or less harmless. The stakes are higher when you look at other, more complicated tasks being “solved” by DNNs. Is the current framework strong enough to safely drive a car in a crowded city? Can we trust it to identify suspected criminals in CCTV footage? As a culture, our level of trust in Deep Learning has been steadily increasing, but the general public still does not know about the limitations of DNNs in computer vision.

DNNs are clearly proficient at specific, narrowly defined tasks. We start to see problems when we use multilayer neural networks in open-world applications. What are the implications of failure? A widely discussed example is the fatal accident involving a Tesla Model S running on Autopilot in 2016. The car’s vision system failed to distinguish the white side of a semi truck crossing its path from the bright sky behind it, and the car drove directly into the trailer. This accident was a result of the fragility of current DNN object recognition. It is not directly analogous to our fishing game, but the underlying issue is the same: we need to develop new methods of image understanding. It appears we cannot skip the developmental phase of intelligent systems. What we are looking for is the evolution of visual common sense, something which is sorely lacking in our monolithic neural networks.

Let’s go back to the fishing game. This problem is extremely difficult for DNNs because they process the world by analyzing features of objects in the scene without taking into account semantic relations (Are two objects connected? Is one contained within the other? Are they similar?). These “common sense” questions require some kind of serial processing, which is not how current deep network approaches work. Basically, a program needs to process elements of an image in a specific order, rather than analyzing everything at the same time (the latter is known as “parallel processing”). If we genuinely want AI we can trust, we need to move beyond the current multilayer perceptron networks (i.e., “Deep Learning”). These children’s puzzles are only one example of dangerous blind spots in computer vision. Would you trust your safety to an autonomous vehicle that cannot perform a task that a two-year-old could? We need a system that can solve not only this fishing game but any similar task, and that can apply the same process to its understanding of the human world. It is a difficult challenge, but it is necessary for the future of AI.

If you would like to learn more about computer vision and progress in common sense AI, please read Henry Minsky’s blog post at Leela AI.

Leela AI (https://www.leela.ai/) is developing new ways to move beyond multilayer neural networks, creating truly resilient intelligence. Sign up for our newsletter to receive more information as the technology develops.
