Aaron Erickson discusses the evolution of AI workflows, shifting from "vibe checking" to building reliable, multi-agent ...
Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn't told it was being evaluated.