Evaluations
Evaluations
Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals
July 3, 2025
Read more
Evaluations
Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
March 17, 2025
Read more
Evaluations
Large Language Models can Strategically Deceive their Users when Put Under Pressure
November 9, 2023
Read more