Mercor’s new benchmark measures how well AI agents perform professional tasks such as legal work and corporate analysis. The first results were disappointing: every major model scored below 25 percent, which suggested that lawyers were safe from being replaced by AI, at least for now.
But AI capabilities can change drastically in just a few weeks. Anthropic’s release of Opus 4.6 shook up the rankings: the new model scored 30 percent in single-attempt tests and averaged 45 percent when given a few more attempts at each problem. Notably, the release included a number of new agent-oriented features.
The score is a large jump over the earlier results and points to improvement in the underlying models. “The jump in just a few months is really crazy,” said Mercor CEO Brendan Foody, who was particularly impressed by the result.
Of course, 100 percent is still a long way off, so lawyers needn’t worry about being replaced by machines next week. But they have much less reason for confidence than they did a month ago.
RCO NEWS



