Recommendations for the next stages of the Frontier AI Taskforce

Apollo Research is an independent third-party frontier AI model evaluations organisation. We focus on deceptive alignment, where models appear aligned but are not. We believe it is one of the most important components of many extreme AI risk scenarios. In our work, we aim to understand and detect the ability of advanced AI models to evade standard safety evaluations, exhibit strategic deception and pursue misaligned goals.

We recently composed and shared two policy recommendations with the Frontier AI Taskforce (“Taskforce”), in light of their mission to “ensure the UK’s capability in this rapidly developing area is built with safety and reliability at its core”. Our recommendations address the role of the Taskforce as a potential future (1) Regulator for AI Safety, and encourage the Taskforce to (2) Focus on National Security and Systemic Risk.

We are publishing our recommendations because we believe that a collaborative and productive exchange in this nascent field will be vital to a successful outcome and a flourishing future with safe AI systems. We look forward to engaging with the Taskforce and others in the ecosystem to bring these, and other approaches, to fruition.

Recommendation (1). Serve as a Regulator for AI Safety

International efforts to address the governance of frontier AI systems are fast evolving and range from regulation and standards development to risk-management frameworks and voluntary commitments. The upcoming AI Safety Summit will, among other topics, discuss “areas for potential collaboration on AI safety research, including evaluating model capabilities and the development of new standards to support governance”. We therefore want to focus on the role of the Taskforce in a scenario where the UK puts forward a regulatory regime addressing frontier AI models.

Private actors look to regulators to produce guidelines, best practices or technical standards. However, the government often lacks the technical expertise to design these in line with the current state of the art and relies heavily on input from outside of government. The Taskforce is in a rare position where it has access to the technical expertise to design and assess model evaluations, as well as the institutional backing to help implement them. We recommend that the Taskforce should eventually be given the role of a regulatory body, morphing into an agile and technically informed regulator suitable for a future with frontier AI systems. We detail five incremental steps towards this outcome below.

First, we recommend that the Taskforce act ambitiously as a hub for nascent evaluation efforts, nationally and internationally. For example, in the UK, this could mean coordinating between private companies, regulators and sector-specific experts on sensitive efforts such as those outlined in (2). Moreover, in taking on such a role, the Taskforce would foster and oversee a stronger UK AI assurance ecosystem. Internationally, the Taskforce should act as a leader and primary resource on this topic, advising and supporting other governments’ efforts on frontier model evaluations.

  • Act as a hub for nascent evaluation efforts: the Taskforce should use its leading position in the AI evaluations ecosystem to coordinate, inform and guide relevant actors, nationally and internationally.

Second, we recommend that the Taskforce call for novel research proposals and issue research grants; it is well placed to help build the technical ecosystem and act as a multiplier for nascent efforts.

  • Fund relevant external actors: the Taskforce could solidify its central role in the ecosystem by supporting AI safety researchers through research grants for open-ended research or contracts for well-scoped research questions, for example, research on how to evaluate state-of-the-art models for their hacking abilities. These grants should be open to academics, NGOs and the private sector.

Third, we recommend that the Taskforce be given a mandate to research, collate and issue technical guidance within its field of expertise for upcoming regulatory and policy efforts. We would welcome a statutory duty for the government to respond to advice provided by the Taskforce, similar to that in the Climate Change Act 2008.

  • Technical advice: the Taskforce should be included in key government decision-making processes on the development and implementation of model evaluation frameworks. Its technical recommendations on model evaluations and AI safety should carry a statutory duty for the government to respond, for example by outlining plans for implementing the advice or reasons for rejecting it.

Fourth, we recommend that the Taskforce be empowered to carry out frontier AI model reviews and related inspections. As a priority, we recommend that the Taskforce develop and carry out reviews that require state backing, such as those in the areas of national security or biosecurity. This work may benefit from the Taskforce’s ability to gather intelligence through its activities and to tailor reviews to the expected risk profiles of AI models.

  • Reviews and inspections: the Taskforce is well placed to develop and undertake reviews and inspections. Its mandate should be to ensure the safety of the highest-risk AI systems. Some of this work may benefit from contracting subject-matter experts from other organisations.

Finally, building on the previous steps, we suggest that the Taskforce could be granted the power to handle the eventual licensing of private actors. We expect that the Taskforce will be a trusted entity, well placed to convene the technical expertise and community consensus needed to administer a licensing regime for private actors. This would further contribute to an agile UK AI assurance ecosystem.

  • Licensing of private actors: the Taskforce should be endowed with the capacity and power to handle the licensing of private actors on behalf of the government, should relevant licensing schemes or requirements become necessary in the future.

Recommendation (2). Focus on National Security and Systemic Risk

Frontier AI models have the capacity to undermine multiple areas of national security, for example by creating novel pathogens or by circumventing state-of-the-art information security. Their potentially ubiquitous integration could cause unexpected failures in national infrastructure, such as energy or communications, and, should deceptively aligned AI systems be developed and deployed, lead to unknown and likely catastrophic consequences.

We applaud the Taskforce for setting up an expert advisory board spanning AI research and national security. The Taskforce’s impressive advisory board and close proximity to the UK government make it an excellent candidate to work on areas relevant to national security. We recommend that the Taskforce focus on national security and systemic risk, harnessing its unique access, capacity and expertise.

  • Focus on national security and systemic risk evaluations: the Taskforce is uniquely well positioned to develop technical demonstrations and model evaluations in areas such as biosecurity, cybersecurity, misinformation, psychological operations and political persuasion.

We recommend that this focus on national security and systemic risk evaluations should be accompanied by a mandate to research and develop adequate interventions for matters of national security and systemic risk. This mandate will require close collaboration with relevant UK government departments.

  • Mandate for adequate interventions for matters of national security and systemic risk: we expect that there are a variety of adequate interventions the Taskforce would be best placed to lead, and we encourage the Taskforce to work closely with the UK government and its advisory board to identify and develop these. We put forward two specific examples below, one intervention point prior to a critical incident and one following a critical incident:

    • Screening and evaluating sensitive information: the Taskforce should be empowered to establish a confidential process in collaboration with frontier AI labs and other relevant actors, through which it can screen and halt the publication of strategically sensitive information.

    • Incident responses: the Taskforce should work closely with relevant government departments to develop and implement a suite of incident responses. As a first step, we suggest that the Taskforce partner with GCHQ and SIS to develop response plans and capabilities for cases such as illegal model proliferation.
