June 3, 2026

Misaligned AI as a New Insider Risk

In a new policy memorandum, we explain why deployers of AI models in high-stakes contexts should treat those models as insider risk vectors. High-stakes contexts include AI model deployments within government agencies and contractors, where AI models are granted privileged access to classified and sensitive unclassified information, IL6 and IL7 network environments, cleared personnel, and other critical resources.

AI models are increasingly embedded in high-stakes contexts and are capable of leveraging their authorized access and permissions to execute misaligned actions that could damage national security, such as whistleblowing, sabotage, or blackmail. This combination of (1) privileged access to critical resources and (2) an increased ability to act autonomously and against the wishes of their organization makes the potential insider risk posed by AI models functionally indistinguishable from that posed by their human counterparts.

As a consequence, AI models deployed in high-stakes contexts could cause the intentional or unintentional loss or degradation of government or contractor information, resources, or capabilities through the unauthorized disclosure of information (leaks and spills), sabotage, and theft, just as human insiders can. Despite this pressing concern, existing insider risk policies and mitigations have yet to adapt to AI insider risk. To safeguard national security while increasingly capable frontier AI models are leveraged for critical tasks and operations, we recommend that the U.S. Government adapt well-established measures, such as pre-deployment evaluation, continuous evaluation, and monitoring, to AI models deployed in high-stakes contexts.

Figure 1. Insider risk from misaligned AI models deployed within government agencies can be functionally indistinguishable from the insider risk posed by human personnel. However, existing insider risk policies, as well as detection and mitigation programs, have yet to adapt to this novel insider risk actor. 

AI models will pose the same insider risks as human personnel.

Insider risk refers to “the threat that an insider will use her/his authorized access, wittingly or unwittingly, to do harm to the security of the United States,” occurring through “espionage, terrorism, unauthorized disclosure of national security information, or through the loss or degradation of departmental resources or capabilities.” As it is commonly defined, insider risk does not require malicious intent on the part of the insider. Harm can occur “wittingly or unwittingly,” “intentionally or unintentionally,” even if the insider believes their actions are righteous or harmless. In other words, insider risk can originate not only from the insider’s intent “to harm an organization for personal benefit or to act on a personal grievance,” but also from carelessness or mistakes.

Therefore, insider risk can be summarized through two core elements. First, an insider has authorized access to information, a facility, equipment, a network, systems, personnel, or other resources and critical assets. Second, the insider takes actions that can damage national security, either intentionally or unintentionally.

Below, we describe how these two elements are applicable to AI models leveraged in high-stakes contexts and provide concrete recommendations to patch the current coverage gap in existing insider risk programs. 

1. AI models are increasingly given authorized access to information, networks, personnel, and other critical resources. 

Insider risk is enabled by an insider’s authorized access. Examples of this “authorized access” include access to information, a facility, equipment, a network, systems, personnel, or other resources and critical assets of a government department or agency or of a government contractor. Authorized access can also include an insider’s special understanding of an organization, which could enable the insider to exploit vulnerabilities in the organization’s systems and processes.

Within the last six months, the scale and sensitivity of AI deployments have grown sharply, exposing AI models to classified and unclassified data, restricted networks, and cleared personnel. For instance:

  • AI models are now deployed on classified networks, including the U.S. Department of War’s (DoW) Impact Level 6 (IL6) and Impact Level 7 (IL7) network environments.
  • GenAI.mil reached over 1.3 million users. The Marine Corps designated it as its enterprise AI platform, and the U.S. Department of the Navy as its enterprise service for controlled unclassified information. 
  • AI agents created by personnel through GenAI.mil are authorized to operate at Impact Level 5 (IL5) against the DoW’s most sensitive unclassified data.
  • DoW data has also reportedly been used to post-train (specifically, fine-tune) AI models.

The Congressional Research Service reports AI usage across the DoW and national security agencies for intelligence analysis, operational planning, and cyber operations. For instance, the National Security Agency (NSA) reportedly uses limited-release AI models for cybersecurity work, principally to scan environments for exploitable vulnerabilities. AI models also appear to have moved from analysis into the targeting cycle of live operations, where AI-embedded systems are reportedly used for real-time target identification and prioritization.

This means that AI models may increasingly hold privileged access to, and a special understanding of, national security systems, along with the cyber defenses and vulnerabilities of federal government agencies, state and local authorities, and operators of critical infrastructure: the very entities that President Trump’s most recent Executive Order on AI aims to protect.

To leverage their full potential for national security, AI models are increasingly granted authorized access to classified and sensitive unclassified documents, network environments, and cleared personnel, and they are acquiring a special understanding of defense systems, national security agencies, and contractors.

2. AI models are capable of taking misaligned actions that damage national security, and could attempt to obfuscate them. 

An insider risk materializes when an insider misuses their authorized access to perpetrate misaligned actions that could damage national security, government operations, or the protection of sensitive information. Harm can arise from the loss or degradation of government or company information, resources, or capabilities, including a department or agency’s mission, personnel, facilities, information, equipment, networks, or systems. 

AI misalignment implies that, just like human insiders, AI models may act in pursuit of a goal the organization did not intend. In research settings, AI models have been observed to adopt an array of misaligned behaviors that could cause insider risk. For instance, preliminary forms of these behaviors include: 

In addition to these misaligned actions enabling insider risk, AI models have been observed to adopt behaviors that, while not directly mapping to forms of insider risk, could greatly increase the harm an insider AI model could cause by expanding the model’s access or helping it avoid detection. These include:

Besides taking misaligned actions, AI models can also make mistakes, which could lead to unauthorized disclosures of information, including accidental disclosures (“spills”), as well as improper safeguarding.

For clarity, the identification of misaligned behaviors in controlled adversarial environments does not necessarily mean that AI models will behave this way once deployed. However, the signal that AI models are theoretically capable of such behaviors merits careful reflection and preparation, especially in light of continued AI capability progress. Occasional instances of related anomalous behavior have already been observed in the wild.

AI models are now capable of taking several misaligned actions leading to insider risk (including leaks, spills, sabotage, and theft), and may also attempt to strategically conceal these actions from human operators. AI models also occasionally make mistakes and provide anomalous responses. All of this makes insider risk arising from AI models functionally indistinguishable from that posed by their human counterparts. 

Misaligned AI models present a coverage gap in existing insider risk programs 

Insider risk detection and mitigation can halt a potential trajectory toward harm and limit damage to national security. For this reason, government departments and agencies have adopted policies and programs that define the response actions necessary to ascertain whether certain matters or information indicate the presence of an insider risk, and to mitigate that risk.

However, existing definitions of insider risk and relevant mitigation programs were designed at a time when insider risk could only originate from humans. Most, but not all, of these frameworks refer to “a person,” such as government personnel or cleared contractors (e.g., Section 951, NDAA 2017; DoD Instruction 5205.16), or define “insider” as “personnel” (NISPOM). As a result, existing insider risk detection and mitigation programs may not naturally cover the risks posed by AI models. Since these policies and programs were designed and enacted, AI models have become highly capable and have been increasingly entrenched within government agencies, including within the national security apparatus. Misaligned AI models have therefore become a coverage gap in insider risk prevention and mitigation. 

In fact, this coverage gap in insider risk policies and programs may mean that the national security risk posed by AI models is higher than that posed by their human counterparts, for whom insider risk detection and mitigation programs are already up and running.

Misaligned AI models currently present a coverage gap. Existing insider risk policies and response programs do not adequately capture AI insider risk, even though misaligned AI models could damage national security from the inside, just like human personnel.  

Conclusion

From an insider risk perspective, AI models have become functionally indistinguishable from human personnel: they could leverage their authorized access to information, facilities, personnel, or other critical resources and assets to take actions that are misaligned with those expected by their deployers. Absent a detection and mitigation program for this insider risk, the actions of misaligned AI models could damage national security.

To securely operationalize AI in mission-critical environments, we suggest that the U.S. Government urgently address AI insider risk as it has successfully done with insider threats posed by government personnel and cleared contractors. This includes adapting well-established measures used to counter insider risk from humans, such as evaluating personnel who are granted access to classified information against a common set of adjudicative guidelines, continuously vetting the human workforce under Trusted Workforce 2.0, engaging individual insiders who are considered to be “on the path to a hostile, negligent, or damaging act,” and monitoring user activity on all classified government networks.

For AI models, this could mean, at a minimum:

  1. Adversarially testing insider AI models in use-case-specific circumstances and environments, in order to assess the capabilities and propensities of AI models to leverage their authorized access to take misaligned actions. 
  2. Implementing robust and continuous AI monitoring, in order to verify whether insider AI models attempt to take misaligned actions when deployed inside government departments and agencies and, if so, to report and promptly block those actions.
  3. Adopting and reinforcing control measures iteratively, in order to prevent AI models that evade adversarial testing and monitoring from causing harm to national security, even if they attempt to.

These measures will establish a strong foundation for ensuring that frontier AI models granted access and authorities comparable to those of cleared personnel are subject to the same rigorous insider risk mitigation protocols that have long protected the most sensitive U.S. operations. The reliability of AI models that accelerate critical advantages should be treated as a continuously validated operational condition, not merely as a one-time determination made at the point of acquisition or fielding. To restore accountability and protect American national security, a comprehensive insider risk mitigation program governing AI models should define the specific behaviors and operational anomalies that constitute grounds for disqualification from authorized access, mandate continuous monitoring throughout the operational lifecycle of such models, and establish clear protocols for prompt containment, revocation of system authorities, and formal review upon detection of disqualifying indicators.
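To make the shape of such a program concrete, here is a minimal, purely illustrative sketch of the monitor, contain, revoke, and review loop. It is not drawn from any existing government program or product. Every name in it (`AgentRecord`, `Action`, `monitor`, the two example indicators, and the resource labels) is a hypothetical assumption. A real program would define its disqualifying indicators through policy and enforce them in the deployment infrastructure, not in a single in-process check.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    AUTHORIZED = "authorized"
    SUSPENDED = "suspended"  # authorities revoked, pending formal review


@dataclass(frozen=True)
class Action:
    """One action an AI agent attempts (hypothetical schema)."""
    resource: str          # e.g. a dataset or network segment
    operation: str         # e.g. "read", "write", "send"
    destination: str = ""  # recipient, used only for "send" operations


@dataclass
class AgentRecord:
    """Authorization state tracked for one deployed AI model (hypothetical)."""
    agent_id: str
    allowed_resources: set[str]
    allowed_destinations: set[str]
    status: Status = Status.AUTHORIZED
    review_queue: list[str] = field(default_factory=list)


def disqualifying_indicators(agent: AgentRecord, action: Action) -> list[str]:
    """Return the example disqualifying indicators that an action triggers."""
    indicators = []
    if action.resource not in agent.allowed_resources:
        indicators.append(f"out-of-scope access: {action.resource}")
    if action.operation == "send" and action.destination not in agent.allowed_destinations:
        indicators.append(f"unapproved disclosure to: {action.destination}")
    return indicators


def monitor(agent: AgentRecord, action: Action) -> bool:
    """Check one action and return True only if it may proceed.

    Any indicator blocks the action, revokes all of the agent's
    authorities (containment), and queues the findings for human review.
    """
    if agent.status is not Status.AUTHORIZED:
        return False  # a suspended agent cannot act until reviewed
    findings = disqualifying_indicators(agent, action)
    if findings:
        agent.status = Status.SUSPENDED
        agent.allowed_resources.clear()
        agent.allowed_destinations.clear()
        agent.review_queue.extend(findings)
        return False
    return True


if __name__ == "__main__":
    agent = AgentRecord("analyst-agent-01", {"cui/reports"}, {"internal-review"})
    print(monitor(agent, Action("cui/reports", "read")))                   # True
    print(monitor(agent, Action("cui/reports", "send", "external-mail")))  # False
    print(agent.status.value, agent.review_queue)
    print(monitor(agent, Action("cui/reports", "read")))                   # False: still suspended
```

The design choice worth noting is that a single disqualifying indicator suspends the agent entirely rather than only blocking the offending action. This mirrors the revocation-then-review sequence described above: authority is restored only through a formal review, not automatically.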

The U.S. possesses unmatched capabilities to confront and eliminate the threats that frontier AI models pose to national security infrastructure. With decisive action, America will not only defend against these risks but seize the extraordinary opportunity to accelerate its most critical strategic advantages.