In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
The vulnerability is now tracked as CVE-2025-13223 and has a CVSS severity score of 8.8 out of 10 (high). "Type Confusion in V8 in Google ...