New reports on the progress of computer vision highlight some of the abstractions and deeper questions about how well we can train computers to “see.”
Reports today at The Verge go over what writer James Vincent calls the ‘hammer on a bed’ problem, showing examples from the ObjectNet dataset to illustrate how easy it is to fool computers when it comes to object recognition.
Here’s some of the context – over the past few years, engineers have made rapid progress with convolutional neural networks: networks with multiple processing layers that slide learned filters across an image to pick out visual patterns.
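To make “layers that scan an image” concrete, here is a minimal NumPy sketch of the convolution operation at the heart of such networks. This is an illustrative toy, not any particular library’s implementation, and the filter weights are hand-picked for clarity; a real convolutional network learns its weights from data:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel across an image, computing one dot product per
    position (valid padding, stride 1): the core op of a conv layer."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny "image": a bright 2x2 square on a dark 6x6 background.
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

# A hand-crafted vertical-edge filter (bright-left / dark-right).
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])

response = conv2d(image, edge_filter)
# The response is strongly positive along the square's right edge and
# strongly negative along its left edge; stacking many such filtered
# layers (plus nonlinearities) is what a CNN does at scale.
```

The loop above is deliberately naive; production libraries compute the same sliding dot product with heavily optimized kernels.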
Some of the notable past successes involve feature-rich images – for example, computers have learned to tell apart Chihuahuas and blueberry muffins, both of which contain dark, bulbous orbs, whether they’re eyes or berries.
What Vincent’s reporting on MIT scientists’ AI vision efforts shows us is that there’s a limit to what computers can intuitively see, even with the most cutting-edge algorithms.
As Vincent points out, the chairs and other objects featured in this report appear in poses and contexts that don’t show up in typical training data.
“These systems have a limited understanding of how objects in the real world work,” Vincent writes. “AI systems can’t easily extrapolate from items they’ve seen before, to imagine how they might appear from different angles in different lighting and so on.”
You might also note that the kinds of objects cited as difficult are not feature-rich, but instead are based on common geometric shapes.
With that in mind, it might not be a surprise that computer algorithms would struggle to identify something like a hammer on a bed.
In fact, although Vincent suggests that humans would have an easy time recognizing the images, we all know how easily one object can resemble another, especially in low light or in a confusing, colorful juxtaposition.
Meanwhile, Valerie Shchutskaya at Indata Labs suggests that deep learning networks are paving the way for future big advances in how computer systems can track objects, including in video, writing at KDNuggets:
Deep learning algorithms, on the other hand, learn about the task at hand through a network of neurons that map the task as a hierarchy of concepts. Each complex concept is defined by a series of simpler concepts. And all of this the algorithms can do by themselves. In the context of computer vision, this means identifying light and dark areas first, then categorizing lines, then shapes before moving towards a full picture recognition.
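The hierarchy Shchutskaya describes, simple concepts composed into more complex ones, can be sketched in a few lines of NumPy. Again a toy, with hand-picked filters standing in for learned weights:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)  # keep only positive responses

# A bright 2x2 square on a dark 6x6 background.
image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0

# Layer 1: simple concepts, i.e. light/dark transitions.
v_edge = np.array([[1.0, -1.0], [1.0, -1.0]])   # bright-left / dark-right
h_edge = np.array([[1.0, 1.0], [-1.0, -1.0]])   # bright-top / dark-bottom

# Layer 2: a complex concept defined by simpler ones: a bottom-right
# corner is "a right edge AND a bottom edge at the same location".
corner = relu(conv2d(image, v_edge)) * relu(conv2d(image, h_edge))
```

Only the position of the square’s bottom-right corner lights up in `corner`; a real deep network stacks many such layers, and every weight is learned from data rather than written by hand.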
Deep learning algorithms also keep improving as you give them more data, whereas the performance of traditional machine learning algorithms tends to plateau.
Stay tuned for more on VR/AR, image processing and computer vision, and how these technologies are poised to change industries.