Blog

Marius Hobbhahn

We need a Science of Evals

In this post, we argue that if AI model evaluations (evals) are to have meaningful real-world impact, we need a “Science of Evals”, i.e. the field needs rigorous scientific processes that provide more confidence in evals methodology and results.

Chris Akin

A starter guide for evals

This is a starter guide for model evaluations (evals). Our goal is to provide a general overview of what evals are, which skills are helpful for evaluators, potential career trajectories, and ways to get started in the field.

Chris Akin

Theories of Change for AI Auditing

In this post, we present a theory of change for how AI auditing could improve the safety of advanced AI systems. We describe what AI auditing organizations would do, explain why we expect auditing to be an important pathway to reducing catastrophic risk, and explore the limitations and potential failure modes of such auditing approaches.

Chris Akin

Recommendations for the next stages of the Frontier AI Taskforce

We recently composed and shared two policy recommendations with the Frontier AI Taskforce (“Taskforce”), in light of its mission to “ensure the UK’s capability in this rapidly developing area is built with safety and reliability at its core”. Our recommendations address the Taskforce’s potential future role as (1) a regulator for AI safety, and encourage it to (2) focus on national security and systemic risk.

Chris Akin

The UK AI Safety Summit - our recommendations

Apollo Research is an independent third-party frontier AI model evaluations organisation. We focus on deceptive alignment, where models appear aligned but are not. We believe it is one of the most important components of many extreme AI risk scenarios. In our work, we aim to understand and detect the ability of advanced AI models to evade standard safety evaluations, exhibit strategic deception, and pursue misaligned goals.

Chris Akin

Understanding strategic deception and deceptive alignment

We want AI to always be honest and truthful with us, i.e. we want to prevent situations where an AI model is deceptive about its intentions towards its designers or users. Scenarios in which AI models are strategically deceptive could be catastrophic for humanity, e.g. because strategic deception could allow AIs that do not have our best interests in mind to reach positions of significant power, such as by being deployed in high-stakes settings. Thus, we believe it is crucial to have a clear and comprehensive understanding of AI deception.
