Understanding strategic deception and deceptive alignment
We want AI systems to always be honest and truthful with us; in particular, we want to prevent situations where an AI model deceives its designers or users about its intentions. Scenarios in which AI models are strategically deceptive could be catastrophic for humanity, e.g. because such deception could allow AIs that don't have our best interests in mind to gain positions of significant power, such as by getting deployed in high-stakes settings. Thus, we believe it's crucial to have a clear and precise understanding of AI deception.