Long-run AI agents, part 1: The problem nobody talks about
In March 2025, a research organization called METR published a finding that got less attention than it deserved. They had been measuring something unfashionable: how long AI systems could work on tasks before they broke down. Not what they could do in a single interaction. METR wanted to know how long they could sustain coherent, useful effort.
