An Overview Of Our Current Governance Efforts
The governance team at Apollo Research conducts technical governance research, develops tailored policy recommendations, and communicates our organisation’s learnings to key stakeholders across industry, civil society and governments.
Our goal is to ensure that future governance mechanisms concerned with loss of control, scheming*, and dangerous capability evaluations are functional, scientifically validated, and effective. We therefore focus both on developing foundational research and on translating it into practical policy levers ready for implementation by industry and governments.
Our recent portfolio of workstreams included**:
1. Specifying and contextualising how dangerous capability evaluations can serve governance frameworks.
As part of this workstream, we researched, developed, and communicated on topics including best practices, access levels for dangerous capability evaluations, safe harbors, the limitations of evaluations, incentives for a healthy evaluation ecosystem, safe and secure AI procurement, and the connection of evaluation results to broader governance mechanisms.
This work has informed and continues to inform legislative frameworks on both sides of the Atlantic, as well as relevant governance endeavours across select agencies.
- ‘Pre-Deployment Information Sharing: A Zoning Taxonomy for Precursory Capabilities’. In this paper, presented at UK AISI’s Conference on Frontier AI Safety Frameworks, we built on the Frontier AI Safety Commitments and explained how a zoning taxonomy of precursory capabilities (i.e., smaller preliminary components of high-impact capabilities) could provide select government actors—such as U.K. AISI and U.S. CAISI—with situational awareness of frontier AI capabilities while preserving information security.
- ‘Capturing and Countering Threats to National Security: a Blueprint for an Agile AI Incident Regime’. In this paper, we proposed a no-regrets blueprint for an AI incident regime that would allow nation-states to track and swiftly counter national security threats posed by AI systems, with each component of our proposal reflecting best practices in existing incident regimes in nuclear power, aviation, and biosafety.
- ‘Towards Frontier Safety Policies Plus’. In this paper, presented at the inaugural International Association for Safe and Ethical AI Conference, we built on our earlier work and explained how precursory capabilities could also serve as more granular tripwires in Frontier Safety Policies (FSPs).
2. Bespoke science communication, ‘demo’ development, and policy recommendations on AI scheming.
Our second workstream has informed decision makers about our organisation’s core technical and governance research and its results. As part of this workstream, we have developed multiple privately held demonstrations and specialized briefing materials, and have provided verbal and written advice and briefings upon request on topics such as evaluation-based policy frameworks, scheming and loss of control, internal deployment, and incident monitoring frameworks.
Over the last few months, we have met and engaged with an array of international stakeholders representing multiple governments, government-adjacent bodies and multilateral organizations. This includes, for example, members of the E.U. AI Office; the U.S. Center for AI Standards and Innovation; the U.K. AI Security Institute; the Korean AI Safety Institute; Singapore’s Infocomm Media Development Authority; the French AI Safety Institute; officers from multiple agencies within the U.S. intelligence community; staffers at the Senate Subcommittee on Emerging Threats and Spending Oversight, the House Select Committee on the Chinese Communist Party, and the Senate Commerce Committee; the offices of Members of the U.S. Congress; the U.S. Department of State; and U.K. Parliamentarians and members of the House of Lords.
Our most recent public-facing memorandum, which focuses on strengthening AI procurement and adherence to the U.S. AI Action Plan, can be read below.
- ‘Guidelines to Implement the AI Action Plan and Strengthen the Testing & Evaluation of AI Model Reliability and Governability’. In this paper, we put forward a practical roadmap to strengthen the principles of AI model reliability and AI model governability as the Department of War (DoW), the Office of the Director of National Intelligence (ODNI), the National Institute of Standards and Technology (NIST), and the Center for AI Standards and Innovation (CAISI) refine AI assurance frameworks under the AI Action Plan. Our focus is the open scientific problem of misalignment and its implications for AI model behavior. Specifically, scheming capabilities stemming from misalignment can be understood as a red flag indicating an AI model’s insufficient reliability and governability. To address the national security threats arising from misalignment, we recommend that DoW and the Intelligence Community (IC) strategically leverage existing testing and evaluation (T&E) pipelines and their Other Transaction (OT) authority to future-proof the principles of AI model reliability and AI model governability through a suite of scheming and control evaluations.
As part of our public-facing engagements, we presented our work at events including the French AI Action Summit and the United Nations ITU AI for Good Summit, participated in the inaugural AI Safety Institute network meeting, and attended the Singapore Consensus on Global AI Safety Research.
3. Framing the need for a ‘governance of internal deployment’ from a governance, risk and legal perspective.
Our third workstream focused on conducting research and raising awareness on a timely, neglected and important area: the internal deployment of highly advanced AI systems. As part of this workstream, we published a landmark report on the governance, or lack thereof, of internal deployment and usage of highly advanced AI systems.
- In this report, ‘AI Behind Closed Doors: A Primer on The Governance of Internal Deployment’, we conceptualised and reflected on internal deployment, the relevant threat models, learnings from internal deployment in other safety-critical industries, the legal landscape, and effective solutions.
- We engaged in public science communication on the topic, distilling it into accessible mitigations with a particular focus on transparency measures, in an op-ed in TIME with Turing Award recipient Prof. Yoshua Bengio titled ‘When it Comes to AI, What We Don’t Know Can Hurt Us’.
We are in the process of building on this research and sharing results with a range of governmental stakeholders, civil society and industry.
4. Raising the bar on understanding AI ‘loss of control’ and potential mitigations.
Our fourth workstream focuses on conceptualizing, contextualizing and explaining “loss of control” (LoC), alongside putting forward actionable policy measures to mitigate its threat. This workstream acts in tandem with our technical work at Apollo Research and takes a bird’s-eye perspective on a threat model arising from, for example, scheming and deception. Despite increasing policy and research attention to LoC, decision- and policymakers are still operating without a common understanding of LoC, comparable metrics, and straightforward interventions that can be implemented despite scientific uncertainty.
- Our research report, ‘The Loss of Control Playbook: Degrees, Dynamics, and Preparedness’, addresses existing uncertainties surrounding LoC. The report aims to make LoC conceptually tractable and operationally useful, so that governments and organizations can start adequately preparing today for national security and societal threats from advanced AI systems. In doing so, it puts forward: (1) a novel taxonomy that allows targeted interventions; (2) a practical governance framework that focuses on mitigations which can be actioned today; and (3) an analysis of the long-term dynamics that could lead society into a precarious “state of vulnerability” to LoC, and the resulting consequences for societal resilience and national security.
Through the end of the year, we plan to go into more depth on some of the aforementioned workstreams and research areas, including through collaboration with other organisations in the field. This will support our future engagements and, where appropriate, result in bespoke policy advice, reports and other publications.
Please contact us if you are interested in learning more.
* We use the term ‘scheming’ to describe an AI system secretly and systematically pursuing objectives that are not shared by its user or developer. You can learn more about our work on scheming here and here.
** You can find a previous update on our policy positions here.