We consider any image to be a collection of a finite number of discrete features. This is a novel approach to images – until now they were always thought of as continuous. Of course, when we talk about a discrete set of elements, we don’t mean pixels. Pixels represent discreteness which has nothing to do with the content of the image; it simply is a property of the display or image acquisition device. If the same object is photographed by two digital cameras with different resolutions, different orientations in space, etc., we get two very different pixel-based descriptions. It would be useless to try to compare them pixel by pixel – the pixel values are not going to match. To the human eye it is immediately obvious that the two photographs have the same content. Why? Because, we believe, our brain records an image not as an array of pixels, but as a discrete set of features – like the tip of the nose, a fork on a branch, an end of a stick, etc. The existence of these is not affected by the resolution of the camera or its orientation.
Embracing Digital Technology.
If we can fully describe an image as a discrete collection of features, we can easily solve the image recognition problem. Unlike pixel values, features have to match. If they do, we declare that the two images have the same content. Of course, if the first image contains a single face and the second one is a group photo with the same face and some others, only some of the features in the two descriptions are going to match. Still, something has to match.
Is it possible to determine if two images have the same content without describing them as discrete sets of features? Of course. A purely continuous approach relies on the fact that if two images have the same content, they are going to match pixel for pixel after one of them is translated, rotated, scaled, distorted and have its brightness and contrast adjusted in a certain specific way. Of course, you don’t know in advance the transformation (if any) that is going to make these two images identical. Finding this transformation is one of the tasks of the image recognition algorithm. A considerable number of such algorithms have been developed, mostly based on the Neural Networks approach.
Step Backs and Possible Flaws.
There is one problem, however, inherent in any such algorithm. Let’s suppose we want to develop a Web search engine based on image content. The user submits, let’s say, a picture of a face. The engine has to find all the images on the Web containing this same face. The problem is that a meaningful image contains hundreds of thousands of pixel values, and there are millions of images on the Web. No matter how efficient the image comparison algorithm is, it has to perform at least a few operations per pixel (realistically, more than just a few), so we are talking about many trillions of operations – every time somebody requests a search.
Let us compare this with text-based search engines. There are also millions of text strings, but each string is a small discrete and ordered a set of characters. If the first character does not match (and roughly speaking, in 25 cases out of 26, it does not), the whole string does not match. Thus on average it takes just a little more than one operation to compare two text strings. This is very manageable even with millions and hundreds of millions of text strings to compare to. Image-based Web search can only become a reality if we create for images a situation similar to the one that exists for text strings: that is, we convert pixel-based image descriptions into the feature-based descriptions mentioned above. The latter will be small and discrete, easy to compare. Of course, this conversion will require accessing all the images on the Web in their regular pixel-based form and running a complicated algorithm on them – and this, as mentioned above, is slow. However, this needs to be done only once, not every time somebody wants to do a search.
The same is true in any situation where an image has to be checked against a large database of images. Dealing with all these images on the pixel level every time a search is needed is horrendously inefficient.
How do we know that it is possible to describe an image as a discrete collection of features fully? That a function can be described by a finite set of parameters is not the sort of a statement that you can encounter in your calculus textbook. In fact, you will encounter exactly the opposite statement there. And yet in our everyday lives, we always describe images as finite sets of features. We perceive the letter ‘S’ as having two bends, while to a mathematician it is a curve – an infinite set of points – who’s curvature changes continuously along it. We can verbally describe the shape of an unknown object by enumerating its bumps, dents, and other such features. It will only require a finite number of words, and it will be useful when trying to describe this object to another person who has never seen it. Is it possible to create a mathematical algorithm that converts pixel-based description into such a description enumerating similar feature?
Next Step in Technological Innovation.
Having such advanced technology that could recognize the face pattern from a picture was not even imaginable twenty or even ten years ago. Let’s not forget that a single computer unit with the same specs of basic phone device was ten times larger in 1960’s than it was now. We’ve progressed so far regarding technological innovation in the past few years that we are close to hologram technology and much other technologies that was considered unimaginable before. Finding information online would be so much easier if we could perfect this image recognition system. Imagine the web as a bit social network that can tag you wherever you appear online. And with every system connected to another, cities, towns, and even countries would be connected like never before. Being within reach of this technology also implies that we may also see similar ideas appearing anytime soon. If an image trace to find a similar photo online is about to happen, why not do the same with smell. We all know what Shazam app is, using the same principle we may find a way to find similar aromas around us. No one knows exactly what the future brings for human civilization, but one thing is certain, our technology will improve exponentially year after year.