For decades, researchers have used benchmarks to measure progress in different areas of artificial intelligence (AI), such as vision and language. However, while benchmarks can help compare the performance of AI systems on specific problems, their results are often taken out of context, sometimes with harmful consequences.
In a recent paper, scientists at the University of California, Berkeley; the University of Washington; and Google outline the limits of popular AI benchmarks. According to the paper, "Progress on benchmarks is often used to make claims of progress toward general areas of intelligence, which is far beyond the tasks these benchmarks are designed for."