At first Octane provided useful focus for the engine developers, highlighting areas that needed improvement. But just as with SunSpider before it, Google has found that engine developers have built optimizations that boost Octane scores even when they hurt performance in other scenarios. Once again, the desire to get the highest possible score has come at the expense of developing a better scripting engine. With all browsers now fast at Octane (Edge is a little ahead of Chrome, which is a little ahead of Firefox), Google has chosen to retire the benchmark.
This habit of gaming benchmarks into uselessness is as old as benchmarking itself. Some benchmarks, such as the SPEC CPU integer and floating point benchmarks, have rules governing which compiler optimizations are permitted: an optimization must be applicable to “a class of problems […] larger than the SPEC benchmarks themselves,” in an attempt to prevent compiler vendors from including optimizations that are good for SPEC and nothing else. This has not stopped extremely specific optimizations from being used in the past. But browser benchmarks, which have no rules on which scores are and aren’t “official,” lack even this kind of control.
In spite of the problems, the desire to benchmark and have repeatable, objective measures of performance won’t go away. As new benchmarks are developed, we’d expect the cycle to repeat itself: first they provide a useful target for improved performance, but then they become the primary goal. Real-world testing of the kind Google performs acts as a useful backstop, discouraging the company from doing anything that’s detrimental to the Web at large, but the incentive to skew things will never disappear completely.