A Causal Framework for AI Regulation and Auditing

Read the full paper here.

Artificial intelligence (AI) systems are poised to become deeply integrated into society. If developed responsibly, AI has the potential to benefit humanity immensely. However, it also poses a range of risks, including risks of catastrophic accidents. It is crucial that we develop oversight mechanisms that prevent harm. This article outlines a framework for evaluating and auditing AI to provide assurance of responsible development and deployment, focusing on catastrophic risks. We argue that responsible AI development requires comprehensive auditing that is proportional to AI systems’ capabilities and available affordances. This framework offers recommendations toward that goal and may be useful in the design of AI auditing and governance regimes.

Our main contributions are:

  • A causal framework: Our framework works backwards through the causal chain that leads to AI systems’ effects on the world, discussing ways auditors may work toward assurances at each step in the chain.

  • Conceptual clarity: We develop several distinctions that are useful in describing the chain of causality. Conceptual clarity should lead to better governance.

  • Highlighting the importance of AI systems’ available affordances: We identify a key node in the causal chain - the affordances available to AI systems - which may be useful in designing regulation. The affordances available to an AI system are the environmental resources and opportunities for affecting the world that are available to it, e.g. whether it has access to the internet. They determine which capabilities the system can currently exercise, and they can be constrained through guardrails, staged deployment, prompt filtering, safety requirements for open sourcing, and effective security. One of our key policy recommendations is that proposals to change the affordances available to an AI system should undergo auditing (a minimal illustrative sketch follows this list).
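To make the idea of constraining affordances concrete, the sketch below (ours for illustration, not from the paper) shows a simple guardrail that gates which tools an AI system can use at a given deployment stage; the tool names and stages are hypothetical examples.

```python
# Illustrative sketch (not from the paper): a guardrail that constrains which
# affordances (tools and resources) an AI system can exercise at a given
# deployment stage. The tool names and stages are hypothetical examples.

ALLOWED_AFFORDANCES = {
    "internal_testing": {"text_output"},
    "limited_release": {"text_output", "code_execution"},
    "general_release": {"text_output", "code_execution", "internet_access"},
}

def affordance_permitted(tool: str, deployment_stage: str) -> bool:
    """Return True only if the tool is available at the current deployment stage.

    In the framework's terms, changing this policy changes the system's
    available affordances, and such a change should itself be audited.
    """
    return tool in ALLOWED_AFFORDANCES.get(deployment_stage, set())

# Internet access is blocked during internal testing but allowed at general release.
assert not affordance_permitted("internet_access", "internal_testing")
assert affordance_permitted("internet_access", "general_release")
```

Under this kind of policy, expanding the allowlist (for example, enabling internet access) is exactly the sort of affordance change that, on our recommendation, would trigger an audit.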

We outline below the causal chain leading to AI systems’ effects on the world, working backwards from real-world effects to inputs and determinants (a compact sketch of the chain follows the list):

  • AI system behaviors - The set of actions or outputs that a system actually produces and the context in which they occur (for example, the type of prompt that elicits the behavior).

  • Available affordances - The environmental resources and opportunities for affecting the world that are available to an AI system.

  • Absolute capabilities and propensities - The full set of potential behaviors that an AI system can exhibit and its tendency to exhibit them.

  • Mechanistic structure of the AI system during and after training - The structure of the function that the AI system implements, comprising architecture, parameters, and inputs.

  • Learning - The processes by which AI systems develop mechanistic structures that are able to exhibit intelligent-seeming behavior.

  • Effective compute and training data content - The amount of compute used to train an AI system and the effectiveness of the algorithms used in training; and the content of the data used to train an AI system.

  • Security - Adequate information security, physical security, and response protocols.

  • Deployment design - The design decisions that determine how an AI system will be deployed, including who has access to what functions of the AI system and when they have access.

  • Training-experiment design - The design decisions that determine the procedure by which an AI system is trained.

  • Governance and institutions - The governance landscape in which AI training-experiment and security decisions are made, including institutions, regulations, standards, and norms.
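For readers who prefer a compact view, the chain can be restated as an ordered structure. The sketch below simply encodes the node names listed above, ordered from real-world effects back to their upstream determinants, and adds nothing beyond them.

```python
# The causal chain from this article, restated as an ordered list working
# backwards from real-world effects to their upstream determinants.
CAUSAL_CHAIN = [
    "AI system behaviors",
    "Available affordances",
    "Absolute capabilities and propensities",
    "Mechanistic structure of the AI system during and after training",
    "Learning",
    "Effective compute and training data content",
    "Security",
    "Deployment design",
    "Training-experiment design",
    "Governance and institutions",
]

for step, node in enumerate(CAUSAL_CHAIN, start=1):
    print(f"{step}. {node}")
```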

We identify and discuss five audit categories, each aiming to provide assurances about different determinants (an illustrative mapping of audits to determinants follows the list):

  • AI system evaluations

  • Security audits

  • Deployment audits

  • Training design audits

  • Governance audits
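The correspondence between audit categories and the determinants they target can be sketched as a simple mapping. The pairing below is an illustrative reading inferred from the category names, not a verbatim reproduction of the paper's assignments.

```python
# Illustrative mapping (inferred from the category names, not verbatim from the
# paper) of each audit category to the causal-chain determinants it is best
# placed to provide assurances about.
AUDIT_COVERAGE = {
    "AI system evaluations": [
        "AI system behaviors",
        "Absolute capabilities and propensities",
    ],
    "Security audits": ["Security"],
    "Deployment audits": ["Deployment design", "Available affordances"],
    "Training design audits": [
        "Training-experiment design",
        "Effective compute and training data content",
    ],
    "Governance audits": ["Governance and institutions"],
}

for audit, determinants in AUDIT_COVERAGE.items():
    print(f"{audit}: {', '.join(determinants)}")
```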

We highlight key research directions that will be useful for designing an effective AI auditing regime. High-priority areas include interpretability; predictive models of capabilities and alignment; structured access; and potential barriers to AI labs being transparent with regulators.

