For each of the 500 images in the Accuracy Evaluation part of the study, every tag from the image recognition engines was judged on whether or not it was accurate. This was a simple "yes, no, or I'm not sure" decision (only 1.2% of tags were marked "not sure").
The key distinction here is that a tag could be judged accurate even if a human would be unlikely to use it in describing the image. For example, a picture of an outdoor scene might get tagged by the engine as "panorama" and be perfectly accurate, yet still not be one of the tags a user would think of to describe the image.
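To make that scoring concrete, here is a minimal sketch (in Python) of how judgments like these could be tallied into a per-engine accuracy score. The record layout, the field names, and the choice to exclude "not sure" tags from the denominator are all assumptions for illustration, not the study's actual code.

```python
# Hypothetical record layout: one entry per tag judgment. Field names and
# the choice to exclude "not sure" tags from the denominator are assumptions.
judgments = [
    {"engine": "Google Vision", "tag": "panorama", "verdict": "yes"},
    {"engine": "Amazon AWS Rekognition", "tag": "tree", "verdict": "no"},
    {"engine": "Microsoft Azure Computer Vision", "tag": "sky", "verdict": "not sure"},
]

def overall_accuracy(records, engine):
    """Share of an engine's tags judged accurate ("not sure" excluded)."""
    decided = [r for r in records
               if r["engine"] == engine and r["verdict"] != "not sure"]
    if not decided:
        return None
    return sum(r["verdict"] == "yes" for r in decided) / len(decided)
```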
With that in mind, here is the summary data with the overall score for each engine, across all of the tags they returned:
The clear winner here is Google Vision, with Amazon AWS Rekognition coming in second.
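For context on where these tags come from, here's a minimal sketch of querying one of the engines directly, using the AWS SDK for Python (boto3) and Rekognition's detect_labels call. The image path is a placeholder and AWS credentials are assumed to be configured; the other engines expose similar tagging APIs.

```python
import boto3

# Assumes AWS credentials are configured in the environment.
client = boto3.client("rekognition")

with open("photo.jpg", "rb") as f:  # placeholder image path
    response = client.detect_labels(
        Image={"Bytes": f.read()},
        MaxLabels=10,        # cap on the number of tags returned
        MinConfidence=50.0,  # engine-side confidence floor
    )

# Each label comes back with a confidence score between 0 and 100.
for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')
```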
The overall scores above are across all tags returned by each engine. However, each engine also attaches a confidence score to every tag it returns, which allows it to offer more speculative tags alongside safer ones. Here is a summary of the confidence scores each engine provided:
It's interesting to look more closely at the tags the engines feel most certain about. Here is a look at all the tags where the engines reported a confidence level of 90% or higher:
What's fascinating about this data is that on a pure accuracy basis, three of the four engines (Amazon, Google, and Microsoft) scored higher than human tagging for tags with greater than 90% confidence.
Let's see how this changes when we lower the confidence threshold to 80%:
At this level, we see that the scores for 'human hand-tagged' are essentially equivalent to those for Amazon AWS Rekognition, Google Vision, and Microsoft Azure Computer Vision.
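Reproducing these threshold cuts is straightforward. Here's a sketch that extends the earlier judgment records with an assumed numeric "confidence" field (0 to 100):

```python
def accuracy_at_threshold(records, engine, min_confidence):
    """Accuracy over only the tags at or above a confidence cutoff
    (e.g. min_confidence=90 or 80 for the charts above)."""
    kept = [r for r in records
            if r["engine"] == engine
            and r["verdict"] != "not sure"
            and r.get("confidence", 0) >= min_confidence]
    if not kept:
        return None
    return sum(r["verdict"] == "yes" for r in kept) / len(kept)
```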
One would expect that tags given a low confidence level would be less accurate, and that proves to be the case:
In the next few charts, we'll look at accuracy for each image recognition engine across several confidence-level classes.
Amazon AWS Rekognition:
Microsoft Azure Computer Vision:
Across all engines, we can see that they do significantly better with the tags to which they have assigned higher confidence scores.
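For the curious, here's a rough sketch of how the banding behind charts like these could be computed, again reusing the assumed record layout; the exact band edges the study used are an assumption as well.

```python
from collections import defaultdict

# Confidence bands roughly mirroring the charts; the exact band edges the
# study used are an assumption here.
BANDS = [(0, 60), (60, 70), (70, 80), (80, 90), (90, 101)]

def accuracy_by_band(records, engine):
    """Accuracy within each confidence band for a single engine."""
    totals = defaultdict(lambda: [0, 0])  # band -> [accurate, decided]
    for r in records:
        if r["engine"] != engine or r["verdict"] == "not sure":
            continue
        for low, high in BANDS:
            if low <= r.get("confidence", 0) < high:
                totals[(low, high)][0] += r["verdict"] == "yes"
                totals[(low, high)][1] += 1
                break
    return {band: hits / n for band, (hits, n) in totals.items() if n}
```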
2. How Well Do the Image Recognition Engines Match Up With What Humans Think?
The difference in the Matching Human Descriptions Evaluation is that we presented users with the top five highest-confidence tags from each engine for each image, without telling them which image recognition engine produced them.
We then asked users to select and rank the top five tags that they felt best described the images. We did this across 2,000 total images. Unlike the prior data set, the focus here is on best matching what a human thinks. The goal of this evaluation was to see which of the engines came closest to doing that.
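For reference, selecting an engine's top five tags for an image is simple once each tag carries the confidence score discussed earlier. A sketch, with the label layout assumed:

```python
def top_five_tags(engine_labels):
    """Pick an engine's five highest-confidence tags for one image, as
    presented to evaluators; expects dicts with "tag" and "confidence"."""
    ranked = sorted(engine_labels, key=lambda l: l["confidence"], reverse=True)
    return [l["tag"] for l in ranked[:5]]
```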
For the data, let's start with the average score by platform, in aggregate:
As you can see, the 'human hand-tagged' images score far higher than any of the engines. This is to be expected, as there is a clear difference between a tag being accurate, and a tag being what a human would use in describing an image.
The gap between the engines and humans here is quite large: human hand-tagged results were picked far more often than results from any of the engines. That noted, the clear winner among the engines was Google Vision.
In summary, humans can still see and explain what they are seeing to other humans better than machine APIs can. This is because of several factors, including language specificity and a greater contextual knowledge base. The engines often focus on attributes that are not of great significance to humans, so while those tags may be accurate, humans are more likely to describe what they feel makes the image unique.
Our next chart breaks the scores out by category of image type:
The breakout by categories is interesting. Once again, the human tags are by far the most on target in each category. Google Vision wins three of the four categories, with Amazon barely edging it out in the products category.