An Overview Of Our Current Governance Efforts
The governance team at Apollo Research conducts technical governance research, develops tailored policy recommendations, and communicates our findings to key stakeholders across industry, civil society, and governments.
Our goal is to ensure that future governance mechanisms concerned with loss of control, scheming*, and dangerous capability evaluations are functional, scientifically validated, and effective. We therefore focus on both the development of foundational research and the translation thereof into practical policy levers ready for implementation by industry and governments.
Our recent portfolio of workstreams included**:
1. Framing the need for a ‘governance of internal deployment’ from a governance, risk and legal perspective.
Our first workstream focuses on conducting research and raising awareness of a timely, neglected, and important area: the internal deployment of highly advanced AI systems. As part of this workstream, we published a research report on the governance, or lack thereof, of the internal deployment and usage of highly advanced AI systems.
- In our landmark report, ‘AI Behind Closed Doors: A Primer on The Governance of Internal Deployment’, we conceptualised and reflected on internal deployment, the relevant threat models, learnings from internal deployment in other safety-critical industries, the legal landscape, and effective solutions.
- In our policy memorandum, ‘Internal Deployment in the EU AI Act’, we zoomed in on the European Union (EU) Artificial Intelligence (AI) Act, and discussed arguments supporting the inclusion of internal deployment within the scope of the Act, as well as potential exceptions and objections.
- We engaged in public science communication on the topic, distilling it into accessible mitigations with a particular focus on transparency measures, in an op-ed in TIME with Turing Award recipient Prof. Yoshua Bengio titled ‘When it Comes to AI, What We Don’t Know Can Hurt Us’.
We are in the process of building on this research and sharing results with a range of governmental stakeholders, civil society and industry.
2. Raising the bar on understanding AI ‘loss of control’ and potential mitigations.
Our second workstream focuses on conceptualizing, contextualizing, and explaining “loss of control” (LoC), alongside putting forward actionable policy measures to mitigate the relevant threats. This workstream acts in tandem with our technical work at Apollo Research and takes a bird’s eye perspective on a threat model arising from, among other things, scheming and deception. Despite increasing policy and research attention to LoC, decision- and policymakers are still operating without a common understanding of LoC, comparable metrics, or straightforward interventions that can be implemented despite scientific uncertainty.
- Our research report, ‘The Loss of Control Playbook: Degrees, Dynamics, and Preparedness’, addresses existing uncertainties surrounding LoC. The report aims to make LoC conceptually tractable and operationally useful, so that governments and organizations can start adequately preparing for national security and societal threats from advanced AI systems today. In doing so, it puts forward: (1) a novel taxonomy which allows targeted interventions; (2) a practical governance framework that focuses on mitigations that can be actioned today; (3) an analysis of the long-term dynamics that could lead society into a precarious “state of vulnerability” to LoC and the resulting consequences for societal resilience and national security.
3. Specifying and contextualising how dangerous capability evaluations can serve governance frameworks.
As part of our third workstream, we research, develop, and communicate best practices for dangerous capability evaluations, access levels, safe harbors, the limitations of evaluations, and incentives for a healthy evaluation ecosystem, and we work on connecting evaluation results to broader governance mechanisms.
This work has informed and continues to inform legislative frameworks on both sides of the Atlantic, as well as relevant governance endeavours across select agencies.
- ‘Pre-Deployment Information Sharing: A Zoning Taxonomy for Precursory Capabilities’. In this paper, presented at UK AISI’s Conference on Frontier AI Safety Frameworks, we built on the Frontier AI Safety Commitments and explained how a zoning taxonomy of precursory capabilities (i.e., smaller preliminary components of high-impact capabilities) could provide select government actors—such as U.K. AISI and U.S. CAISI—with situational awareness on frontier AI capabilities, while preserving information security.
- ‘Capturing and Countering Threats to National Security: A Blueprint for an Agile AI Incident Regime’. In this paper, we proposed a no-regrets blueprint for an AI incident regime that would allow nation-states to track and swiftly counter national security threats posed by AI systems, with each component of our proposal reflecting best practices in existing incident regimes in nuclear power, aviation, and biosafety.
- ‘Towards Frontier Safety Policies Plus’. In this paper, presented at the inaugural International Association for Safe and Ethical AI Conference, we built on our earlier work and explained how precursory capabilities could also serve as more granular tripwires in Frontier Safety Policies (FSPs).
4. Safe and secure government procurement of frontier AI.
Our fourth workstream concentrates on government procurement of frontier AI systems for high-stakes applications, and aims to strengthen the testing and evaluation of frontier AI systems in line with the U.S. AI Action Plan and the National Defense Authorization Act.
- ‘Guidelines to Implement the AI Action Plan and Strengthen the Testing & Evaluation of AI Model Reliability and Governability’. In this paper, we put forward a practical roadmap to strengthen the principles of AI model reliability and AI model governability as the Department of War (DoW), the Office of the Director of National Intelligence (ODNI), the National Institute of Standards and Technology (NIST), and the Center for AI Standards and Innovation (CAISI) refine AI assurance frameworks under the AI Action Plan. Our focus concerns the open scientific problem of misalignment and its implications for AI model behavior. Specifically, scheming capabilities stemming from misalignment can be understood as a red flag indicating an AI model’s insufficient reliability and governability. To address the national security threats arising from misalignment, we recommend that DoW and the Intelligence Community (IC) strategically leverage existing testing and evaluation (T&E) pipelines and their Other Transaction (OT) authority to future-proof the principles of AI model reliability and AI model governability through a suite of scheming and control evaluations.
- ‘Keep AI Testing Defense-Worthy’. In this op-ed for Lawfare, we explained the difference between AI capability and AI reliability, discussed how misalignment can become a national security threat, and suggested how existing testing and evaluation pipelines within the DoW and the IC could adapt and accelerate to mitigate the relevant threats without additional red tape.
Relatedly, in our blog post ‘Seven Provisions in the National Defense Authorization Act with High Potential to Accelerate AI Security’, we reviewed and discussed select provisions within the NDAA that squarely address AI safety and security hazards which, if they occur in high-stakes environments such as defense or intelligence agencies, could escalate into national security threats and LoC.
5. Bespoke science communication, ‘demo’ development, and policy recommendations on AI scheming.
Our fifth workstream has informed decision makers about our organization’s core technical and governance research and its results. As part of this workstream, we have developed multiple privately held demonstrations and specialized briefing materials, and provided verbal and written advice and briefings upon request, on topics such as evaluation-based policy frameworks, scheming and loss of control, internal deployment, government procurement, and incident monitoring frameworks.
Over the last few months, we have met and engaged with an array of international stakeholders representing multiple governments, government-adjacent bodies and multilateral organizations. This includes, for example, members of the EU AI Office; the U.S. Center for AI Standards and Innovation; the U.K. AI Security Institute; the Korean AI Safety Institute; Singapore’s Infocomm Media Development Authority; the French AI Safety Institute; officers from multiple agencies within the U.S. intelligence community; staffers at the Senate Subcommittee on Emerging Threats and Spending Oversight, the House Select Committee on the Chinese Communist Party, and the Senate Commerce Committee; the offices of Members of the U.S. Congress; the U.S. Department of State; and U.K. Parliamentarians and members of the House of Lords.
As part of our public facing engagements, we presented our work at events including the French AI Action Summit and the United Nations ITU AI for Good Summit, participated in the inaugural AI Safety Institute network meeting, and attended the Singapore Consensus on Global AI Safety Research.
Through the end of the year, we plan to explore some of the aforementioned workstreams and research areas in more depth, including through collaboration with other organisations in the field. This will support our future engagements and, where appropriate, result in bespoke policy advice, reports and other publications.
Please contact us if you are interested in learning more.
*We use ‘scheming’ to describe an AI system secretly and systematically pursuing objectives that are not shared by its user or developer. You can learn more about our work on scheming here and here.
**You can find a previous update on our policy positions here.