Our Research

Governance
Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime
15/04/2025

Interpretability
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition
11/02/2025

Interpretability
Detecting Strategic Deception Using Linear Probes
06/02/2025

Governance
Precursory Capabilities: A Refinement to Pre-deployment Information Sharing and Tripwire Capabilities
06/02/2025

Evaluations
Frontier Models are Capable of In-Context Scheming
05/12/2024

Evaluations
Towards Safety Cases For AI Scheming
31/10/2024

Interpretability
Identifying functionally important features with end-to-end sparse dictionary learning
30/05/2024

Interpretability
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
30/05/2024

Evaluations
A Causal Framework for AI Regulation and Auditing
08/11/2023

Evaluations
Our research on strategic deception presented at the UK’s AI Safety Summit
05/11/2023