Flo Control - Page 3

The image-recognition algorithm works by comparing the captured image with the standard image of Flo's profile (left). The result of this comparison is the similarity number, which in the current version ranges from 0 to 80 (the more similar the images, the greater the number). If this number is 40 or more, the software opens the latch by sending codes to the serial port connected to the control box. If it is less than 40, nothing happens. Below you see the 8 most recent captures with the similarity number printed on the images.
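The decision step can be sketched in a few lines. This is a minimal illustration, not the device's actual code: the `b"OPEN"` codes are a placeholder (the real codes sent to the control box are device-specific), and the image comparison that produces the similarity number is not shown here.

```python
# Minimal sketch of the latch decision, assuming a similarity score in 0..80.
OPEN_THRESHOLD = 40  # the latch opens at 40 or more

def should_open_latch(similarity: int) -> bool:
    """True when the capture is similar enough to the standard image."""
    return similarity >= OPEN_THRESHOLD

def handle_capture(similarity: int, send_serial) -> bool:
    """Send the (placeholder) open codes to the control box if the score passes."""
    if should_open_latch(similarity):
        send_serial(b"OPEN")  # hypothetical codes; the real protocol is not documented here
        return True
    return False
```

In a real setup `send_serial` would be a write to the serial port; passing it in as a callable keeps the decision logic testable on its own.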

In all of these instances Flo was allowed in, appropriately so. The vast majority of captured images are like these: just Flo by herself. She goes in and out 5-10 times a day, so we get a lot of these. Cases where the latch does not open are much rarer, especially now, when there are not many animals for Flo to catch. Still, she occasionally tries to bring something in, and we also get other unauthorized visitors: skunks and even birds. Below are some of the cases when the latch did not open.

The first five are certainly Flo with something in her mouth, the next two must be a skunk, and the last one looks like a bird.

As mentioned on the theory page, before comparing the images we convert them into records describing discrete features. This is a simple case: the record for the standard image of Flo at the top of this page contains just one feature, the round tip that includes the nose, the mouth and the chin. The image certainly contains other features, like the bump on the forehead, but we simply ignore them. When a new image is captured, all we care about is whether its feature list contains that single feature. If Flo does not have anything in her mouth, it does. If she has something in her mouth, the round tip feature gets destroyed. And, of course, we won't find this feature if it's not Flo at all. The similarity number reflects the degree of similarity between the tip feature record of the standard image and the record of the most prominent feature of the currently captured image, whatever that feature is.
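The record-comparison idea can be sketched as follows. The field names (`curvature`, `width`, `depth`) and the distance-based scoring are illustrative assumptions, not the device's actual feature representation; the point is only that two small records are compared and the result is mapped onto the 0..80 scale.

```python
# Hedged sketch: a feature is reduced to a small record of shape measurements,
# and the standard "round tip" record is compared against the record of the
# most prominent feature of a new capture.
from dataclasses import dataclass

@dataclass
class FeatureRecord:
    curvature: float   # how sharply the outline turns at the tip
    width: float       # extent of the feature along the profile
    depth: float       # how far the tip protrudes

STANDARD_TIP = FeatureRecord(curvature=0.8, width=1.0, depth=0.6)

def similarity(a: FeatureRecord, b: FeatureRecord) -> int:
    """Map the distance between two records onto the 0..80 scale (80 = identical)."""
    d = (abs(a.curvature - b.curvature)
         + abs(a.width - b.width)
         + abs(a.depth - b.depth))
    return max(0, round(80 * (1.0 - d)))  # very dissimilar records score 0
```

A capture of Flo with an empty mouth yields a record close to `STANDARD_TIP` and scores above the threshold; a mouse in the mouth, a skunk, or a bird yields a very different most-prominent feature and scores low.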

The feature-based approach offers two significant advantages in this case. First, and this is typical for most image-recognition problems, we don't know exactly where Flo's head will be when the image is captured. As you can see from the upper set of 8 pictures, there is considerable variation in the position of the head. The device tests for the presence of Flo 20 times a second (which is about the highest rate possible, since the camera itself runs at 30 fps), but a cat can move a significant distance even in 1/20 of a second. As it is, we barely manage to keep the snapshots within the field of view of the camera. The feature description is not at all affected by the position of the feature in the image.
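The position-independence claim is easy to demonstrate in miniature: if the measurements in a feature record are taken relative to the feature itself (here, relative to its centroid, which is one simple choice), then translating the outline anywhere in the frame leaves the record unchanged. This is an illustrative sketch, not the actual descriptor used by the device.

```python
# Sketch: a centroid-relative record of a contour fragment is invariant to
# where that fragment sits in the image.
def centered_record(points):
    """Describe a contour fragment by offsets from its own centroid."""
    n = len(points)
    cx = sum(x for x, y in points) / n
    cy = sum(y for x, y in points) / n
    return [(round(x - cx, 6), round(y - cy, 6)) for x, y in points]
```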

The second advantage is that, from the point of view of the continuous image description, what we see in these snapshots is not a complete object. The outline of Flo's body is cut off at some arbitrary point (different in each case) by the edge of the round light. Normally this situation is much more difficult to deal with than the case where the complete object is within the image, surrounded on all sides by featureless background. However, for the feature-based description this makes no difference whatsoever. The tip feature record is affected only by the shape of the profile in the vicinity of the nose, mouth and chin. Even though the outline of this profile continues uninterrupted into the area of the neck, what happens to it there does not influence the tip feature record. Without such separation we would have to deal with the shape of Flo's body, which is a horribly complicated problem because the body is soft and can assume a great variety of different shapes. The tip feature record is only affected by the shape of the skull, which is rigid.
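The locality argument can also be sketched: if the tip record is computed only from outline points near the tip, then changes anywhere else on the outline (the neck, the body, or the point where the edge of the light cuts it off) cannot affect it. The window radius and the tip-picking rule below are assumptions made for the illustration, not the device's actual method.

```python
# Sketch of locality: only points near the tip enter the record, so two
# outlines that agree around the tip produce identical records no matter
# how their bodies differ.
def tip_record(outline, radius=2.0):
    """Record the outline's rightmost point and its near neighbors, tip-relative."""
    tip = max(outline, key=lambda p: p[0])  # assume the profile faces right
    near = [p for p in outline
            if (p[0] - tip[0]) ** 2 + (p[1] - tip[1]) ** 2 <= radius ** 2]
    return sorted((round(x - tip[0], 6), round(y - tip[1], 6)) for x, y in near)
```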