Turns out "build me an ad-blocker" is a personality test for AI ...
OpenAI’s new model broke rules and exploited loopholes more than any model METR has tested to date ...