Comparative analysis of AI risk classification systems -- mapping how different jurisdictions, regulators, and industry frameworks define and assign safety levels to artificial intelligence systems
Organizations navigating AI governance in 2025 confront a fragmented landscape of safety level systems. The EU AI Act defines four risk tiers with binding compliance obligations. The NIST AI Risk Management Framework organizes governance around four functional categories with flexible implementation guidance. China's Interim Measures for Generative AI Services impose a distinct regulatory classification structure. Individual AI developers maintain proprietary capability level systems with different tier counts, threshold definitions, and governance triggers. Sector-specific regulators -- the FDA for medical AI, the Federal Reserve for financial models, transport authorities for autonomous vehicles -- each apply their own risk categorization schemes.
For enterprises deploying AI across multiple jurisdictions, this proliferation creates a practical interoperability challenge. A single AI system may be classified as high-risk under the EU AI Act, assigned a specific risk score under an internal developer framework, subjected to sector-specific evaluation requirements in healthcare or finance, and evaluated against yet another classification scheme by a national AI safety institute. These classifications do not map cleanly onto one another. An AI system classified as "limited risk" under the EU Act is not necessarily low-risk under a developer's internal framework, and neither classification determines the system's regulatory status under sector-specific rules in healthcare, financial services, or transportation.
The proliferation of AI safety level systems is not an accident of poor coordination. It reflects genuinely different governance objectives. Statutory regulators classify AI systems by their potential impact on fundamental rights, safety, and democratic processes -- the EU AI Act's risk tiers map to societal harm categories. Developer frameworks classify models by their assessed technical capabilities -- the thresholds concern what the model can do, not how it will be used. Sector-specific regulators classify by domain-specific risk factors -- the FDA cares about clinical accuracy and patient safety in ways that a general-purpose AI classification cannot capture.
Each classification serves a legitimate and distinct governance purpose. The challenge is not eliminating this multiplicity but building infrastructure that enables organizations to navigate it coherently -- understanding how classifications relate to one another, where they overlap, and where compliance with one system does or does not satisfy obligations under another.
The EU AI Act (Regulation 2024/1689) establishes the most comprehensive statutory AI safety level system currently in force. Its four-tier classification -- unacceptable, high, limited, and minimal risk -- determines the compliance obligations applicable to each AI system based on its intended purpose and deployment context. The classification criteria focus on the system's potential impact on health, safety, and fundamental rights rather than on the system's raw technical capabilities.
High-risk classification under Annex III encompasses AI systems used in biometric identification, critical infrastructure, education, employment, essential services, law enforcement, migration management, and justice administration. Systems in these categories must satisfy mandatory requirements for risk management (Article 9), data governance (Article 10), technical documentation (Article 11), record-keeping (Article 12), transparency (Article 13), human oversight (Article 14), and accuracy, robustness, and cybersecurity (Article 15). The compliance burden is substantial and applies regardless of how the underlying AI technology is classified under developer frameworks or other regulatory regimes.
The Act's separate classification for general-purpose AI models introduces a capability-adjacent dimension. Models whose cumulative training compute exceeds the systemic risk threshold of 10^25 floating point operations face enhanced obligations including adversarial testing, serious incident reporting, cybersecurity measures, and energy consumption documentation. This dual classification -- use-based risk tiers for deployed systems plus capability-based thresholds for foundation models -- creates a two-dimensional safety level matrix that organizations must navigate simultaneously.
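The interaction of the two dimensions can be made concrete with a short sketch. The code below is an illustrative simplification rather than a legal determination tool: the tier names and the compute threshold come from the Act as described above, but the use-case labels, field names, and classification logic are assumptions, and a real determination depends on legal analysis of the system's intended purpose.

```python
from dataclasses import dataclass

# Illustrative sketch of the AI Act's two classification dimensions.
# Use-based tiers apply to deployed systems; the compute threshold applies
# to general-purpose models. The category set is a simplified example,
# not a restatement of Annex III.

ANNEX_III_EXAMPLES = {
    "biometric_identification", "critical_infrastructure", "education",
    "employment", "essential_services", "law_enforcement",
    "migration_management", "justice_administration",
}

SYSTEMIC_RISK_FLOPS = 1e25  # training-compute threshold for presumed systemic risk


@dataclass
class AISystem:
    intended_use: str          # simplified use-case label (assumption)
    is_gpai_model: bool        # general-purpose AI model?
    training_flops: float = 0  # cumulative training compute, if known


def use_based_tier(system: AISystem) -> str:
    """Dimension 1: risk tier keyed to intended purpose (simplified)."""
    if system.intended_use in ANNEX_III_EXAMPLES:
        return "high"
    return "limited_or_minimal"  # real determinations need legal analysis


def gpai_obligations(system: AISystem) -> str:
    """Dimension 2: model-level obligations for general-purpose AI."""
    if not system.is_gpai_model:
        return "not_applicable"
    if system.training_flops > SYSTEMIC_RISK_FLOPS:
        return "gpai_with_systemic_risk"
    return "gpai_baseline"


if __name__ == "__main__":
    hiring_tool = AISystem("employment", is_gpai_model=False)
    frontier_model = AISystem("general_purpose", is_gpai_model=True,
                              training_flops=3e25)
    # The two dimensions are assessed independently for each system:
    print(use_based_tier(hiring_tool), gpai_obligations(hiring_tool))
    print(use_based_tier(frontier_model), gpai_obligations(frontier_model))
```

The point of the sketch is that the two functions are independent: a deployed system and the foundation model underneath it can sit in different cells of the matrix at the same time.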
China's regulatory approach to AI safety levels operates through multiple overlapping instruments. The Interim Measures for the Management of Generative Artificial Intelligence Services, effective August 2023, impose requirements scaled to the public-facing nature and influence potential of generative AI services. The Algorithmic Recommendation Regulations and Deep Synthesis Provisions add classification layers for specific AI application categories. Together, these instruments create a regulatory tier structure that classifies AI systems by their function, reach, and content generation capabilities rather than by a single unified risk metric.
The Chinese classification approach differs structurally from the EU model. Where the EU AI Act assigns risk levels based on predefined use-case categories, Chinese regulations emphasize security assessments conducted before service launch, with classification outcomes determined through regulatory review rather than self-assessment against published criteria. This procedural difference means that equivalent AI systems may receive different effective safety level classifications under EU and Chinese frameworks, complicating compliance for organizations operating in both jurisdictions.
The U.S. Food and Drug Administration classifies AI-enabled medical devices into three regulatory classes (I, II, and III) based on the level of risk they pose to patients and the degree of regulatory control necessary to provide reasonable assurance of safety and effectiveness. Class I devices (lowest risk) face general controls. Class II devices require special controls and typically reach market through the 510(k) premarket notification pathway. Class III devices (highest risk) require premarket approval with clinical evidence of safety and effectiveness.
As of early 2025, the FDA had authorized over 950 AI-enabled medical devices, the vast majority through Class II clearance. The agency's regulatory framework for AI introduces unique classification considerations not present in general-purpose AI governance: clinical validation requirements, predetermined change control plans for adaptive AI systems, and real-world performance monitoring obligations specific to patient safety contexts. These domain-specific safety levels coexist with but operate independently from general AI classification systems.
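As a rough illustration of how the regulatory classes pair with controls and premarket pathways described above, the mapping below is a simplified sketch; the field names are assumptions, and the actual pathway for a given device depends on its classification regulation, available predicates, and FDA determinations.

```python
# Illustrative mapping of FDA device classes to typical controls and
# premarket pathways, as described in the text. Simplified for clarity.

FDA_DEVICE_CLASSES = {
    "I":   {"risk": "lowest",   "controls": ["general"],
            "typical_pathway": "often exempt from premarket notification"},
    "II":  {"risk": "moderate", "controls": ["general", "special"],
            "typical_pathway": "510(k) premarket notification"},
    "III": {"risk": "highest",  "controls": ["general", "premarket approval"],
            "typical_pathway": "PMA with clinical evidence"},
}

# AI-specific considerations layered on top of the class structure,
# per the framework described in the text:
AI_SPECIFIC_CONSIDERATIONS = [
    "clinical validation",
    "predetermined change control plan (for adaptive models)",
    "real-world performance monitoring",
]

if __name__ == "__main__":
    for cls, profile in FDA_DEVICE_CLASSES.items():
        print(f"Class {cls}: {profile['typical_pathway']}")
```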
Financial regulators implement AI safety levels through model risk management frameworks that classify models by their materiality and complexity. The Federal Reserve's SR 11-7 guidance requires banks to maintain model inventories with risk tiering that determines the intensity of validation, monitoring, and governance applied to each model. Models generating material financial exposure or affecting large consumer populations receive higher risk classifications and correspondingly more rigorous oversight.
The European Central Bank's supervisory expectations establish similar tiered governance for AI models used in banking. The Bank of England's Prudential Regulation Authority has published guidance on model risk management that contemplates graduated oversight for AI and machine learning models. These financial regulatory tiers operate entirely independently from the EU AI Act classification, meaning a single AI model used in lending decisions may simultaneously be classified as high-risk under the AI Act, Tier 1 (material) under the bank's internal model risk framework, and subject to additional classification under national consumer protection regulations.
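The lending example can be expressed as a simple parallel record. The sketch below is illustrative: the three regimes come from the text, while the record layout and label strings are assumptions.

```python
# Illustrative parallel classification record for a single lending model.
# One system, three independent classification regimes; the record
# structure itself is an assumption for illustration.

credit_scoring_model = {
    "system": "retail_credit_scoring_model",
    "classifications": {
        "eu_ai_act": "high-risk (Annex III: access to essential services)",
        "internal_model_risk_framework": "Tier 1 (material)",
        "national_consumer_protection": "in scope, additional review required",
    },
}

# No entry above is derivable from any other: each regime applies its own
# criteria, so each classification must be tracked and evidenced separately.
for framework, label in credit_scoring_model["classifications"].items():
    print(f"{framework}: {label}")
```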
Leading AI developers maintain internal classification systems that assign safety levels based on assessed model capabilities rather than deployment context. Google DeepMind's Frontier Safety Framework defines Critical Capability Levels across risk domains including autonomous replication, cybersecurity, biosecurity, and machine learning research capability. Each domain has graduated thresholds, and models are evaluated against these thresholds through structured assessments. OpenAI's Preparedness Framework assigns risk levels from low through critical across tracked categories, with governance gates that constrain deployment at elevated risk levels.
Anthropic, Meta, Microsoft, and other frontier developers have published or implemented analogous capability-based classification structures. These frameworks share the common architecture of defining capability domains, establishing evaluation methodologies, and specifying governance requirements at each level. However, the specific domains evaluated, the threshold definitions, the evaluation methodologies, and the governance responses differ materially across organizations. A model classified at one level under one developer's framework has no guaranteed correspondence to any specific level in another developer's system.
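The shared architecture -- capability domains, graduated thresholds, governance responses at each level -- can be sketched generically. The domain name, level labels, and governance actions below are placeholders, not any specific developer's published values.

```python
from dataclasses import dataclass

# Generic sketch of a capability-based safety framework: domains with
# graduated thresholds and a governance response at each level. All
# names, levels, and responses are illustrative placeholders.

@dataclass
class CapabilityDomain:
    name: str
    levels: list[str]           # ordered from lowest to highest concern
    governance: dict[str, str]  # level -> required governance response


cyber_domain = CapabilityDomain(
    name="cyber_operations",
    levels=["below_threshold", "elevated", "critical"],
    governance={
        "below_threshold": "standard release process",
        "elevated": "enhanced evaluations and mitigations before deployment",
        "critical": "deployment gated pending safety case review",
    },
)


def evaluate(domain: CapabilityDomain, assessed_level: str) -> str:
    """Return the governance response triggered by an assessed level."""
    if assessed_level not in domain.levels:
        raise ValueError(f"unknown level: {assessed_level}")
    return domain.governance[assessed_level]


if __name__ == "__main__":
    print(evaluate(cyber_domain, "elevated"))
```

Because each developer defines its own domains, thresholds, and responses, an "elevated" finding in one framework has no fixed counterpart in another; the sketch captures only the shared shape.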
International standards bodies are working toward classification frameworks that could bridge statutory and voluntary systems. ISO/IEC 42001:2023 establishes AI management system requirements that implicitly support tiered governance by requiring organizations to identify, assess, and treat AI-related risks proportionate to their significance. The standard does not prescribe specific safety level definitions but requires organizations to maintain risk assessment processes that produce classification-like outputs.
ISO/IEC 23894 provides more direct guidance on AI risk management, establishing processes for risk identification, analysis, evaluation, and treatment across the AI lifecycle. The IEEE Standards Association's work on AI ethics and governance includes frameworks for risk categorization that complement ISO efforts. These standards create common vocabulary and process infrastructure without mandating specific tier structures, potentially enabling organizations to implement consistent classification processes that map to multiple regulatory and voluntary frameworks.
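One way to picture the process infrastructure these standards require is a per-system risk record that walks through identification, analysis, evaluation, and treatment. The record below is a hypothetical illustration of that process shape, not a template drawn from either standard; all field names and values are invented.

```python
# Hypothetical per-system risk record following the generic process steps
# named in the text (identification, analysis, evaluation, treatment).
# Values are illustrative, not drawn from ISO/IEC 23894 or ISO/IEC 42001.

risk_record = {
    "system": "customer_support_assistant",
    "identification": ["hallucinated policy answers", "PII leakage in logs"],
    "analysis": {
        "hallucinated policy answers": {"likelihood": "medium", "impact": "high"},
        "PII leakage in logs": {"likelihood": "low", "impact": "high"},
    },
    "evaluation": "treat both risks before production release",
    "treatment": ["retrieval grounding with citation checks",
                  "log redaction and retention limits"],
}

# A record like this yields the "classification-like outputs" the text
# describes and can be referenced from multiple regulatory filings.
for step in ("identification", "analysis", "evaluation", "treatment"):
    print(step, "->", risk_record[step])
```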
National AI safety institutes represent an emerging layer of classification infrastructure. The UK AI Safety Institute conducts pre-deployment evaluations that produce structured capability assessments for frontier models. The US AI Safety Institute, housed within NIST, develops standardized evaluation methodologies and benchmarks. Japan, South Korea, Singapore, and Canada have established or announced analogous evaluation capabilities. France hosted the February 2025 AI Action Summit in Paris, where participating nations discussed harmonization of evaluation approaches.
These national evaluation bodies produce assessment outputs that function as de facto safety level assignments, even when they do not use formal tier terminology. When the UK AISI reports that a model demonstrates specific capability profiles across evaluated domains, that assessment influences both developer decisions and regulatory attention in ways analogous to formal classification. The growing network of national evaluators creates opportunities for mutual recognition arrangements that could reduce the fragmentation of AI safety level systems across jurisdictions.
Mapping between AI safety level systems reveals both structural similarities and fundamental incompatibilities. At the highest level of abstraction, all systems share the principle that more capable or higher-impact AI systems require more intensive governance. The specific operationalization of this principle, however, diverges significantly.
The EU AI Act classifies by intended use and societal impact. Developer frameworks classify by technical capability. The FDA classifies by clinical risk profile. Financial regulators classify by model materiality. These classification axes are orthogonal: a technically capable model may serve a minimal-risk use case under the EU Act while triggering high-tier classification under a developer's capability framework. A clinically simple AI tool may face intensive FDA requirements while receiving minimal-risk classification under general AI regulation. No universal mapping table can reconcile these fundamentally different classification logics into a single coherent tier structure.
Organizations operating across multiple classification regimes adopt pragmatic strategies. The most common approach is maximum harmonization: identify the most stringent classification received across all applicable frameworks and apply that level's governance requirements uniformly. This simplifies compliance management but creates efficiency losses where systems are over-governed relative to specific framework requirements.
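A minimal sketch of the maximum-harmonization strategy: map each framework's label onto a common ordinal stringency scale and govern the system at the highest resulting level. The scale and the label-to-rank mapping below are assumptions for illustration; real programs define these mappings through legal and risk analysis.

```python
# Minimal sketch of "maximum harmonization": normalize each framework's
# label to an ordinal stringency rank (the mapping below is assumed and
# illustrative) and govern at the most stringent level found.

STRINGENCY = {  # higher rank = more intensive governance
    "minimal": 0, "limited": 1, "moderate": 2, "high": 3, "critical": 4,
}


def harmonized_level(classifications: dict[str, str]) -> str:
    """Return the most stringent label across all applicable frameworks."""
    return max(classifications.values(), key=lambda label: STRINGENCY[label])


if __name__ == "__main__":
    labels = {
        "eu_ai_act": "high",
        "developer_capability_framework": "moderate",
        "internal_model_risk": "limited",
    }
    print(harmonized_level(labels))  # -> "high": govern uniformly at this level
```

The efficiency loss the text notes is visible here: the "limited" and "moderate" classifications are governed as if they were "high".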
More sophisticated approaches maintain parallel classification records with framework-specific compliance programs, using shared governance infrastructure where requirements overlap and specialized compliance activities where they diverge. ISO 42001 certification supports this approach by establishing a common management system foundation that accommodates multiple overlapping classification requirements. Automated compliance mapping tools, still in early development, aim to maintain real-time visibility into how each AI system is classified across applicable frameworks and what aggregate compliance obligations result.
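The parallel approach keeps framework-specific obligations visible while reusing shared controls. The sketch below illustrates the idea with invented obligation names; an aggregate view distinguishes controls that satisfy several frameworks at once from those that remain framework-specific.

```python
from collections import defaultdict

# Illustrative parallel compliance map for one system: per-framework
# obligations (names are invented), aggregated to show which controls are
# shared across frameworks and which are framework-specific.

obligations = {
    "eu_ai_act": {"risk management system", "technical documentation",
                  "human oversight", "logging"},
    "model_risk_framework": {"independent validation", "logging",
                             "performance monitoring"},
    "iso_42001": {"risk management system", "performance monitoring",
                  "internal audit"},
}


def aggregate(obligations_by_framework: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group obligations by the frameworks that require them."""
    grouped = defaultdict(list)
    for framework, items in obligations_by_framework.items():
        for item in items:
            grouped[item].append(framework)
    return dict(grouped)


if __name__ == "__main__":
    for control, frameworks in sorted(aggregate(obligations).items()):
        scope = "shared" if len(frameworks) > 1 else "specific"
        print(f"{control:28s} {scope:9s} {', '.join(sorted(frameworks))}")
```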
International coordination efforts aim to reduce classification fragmentation without mandating uniform tier structures. The OECD's AI classification work provides common terminology and typology that multiple jurisdictions reference. The Global Partnership on Artificial Intelligence facilitates dialogue among member governments and experts on regulatory approaches. Bilateral regulatory dialogues between the EU and US, EU and Japan, and EU and UK explore mutual recognition possibilities for AI assessments.
Full interoperability remains distant because the underlying governance objectives that generate different classification systems are legitimately distinct. The more achievable near-term goal is translational infrastructure: standardized documentation that enables organizations to demonstrate how classification under one system relates to obligations under another, reducing the compliance burden of multi-framework navigation without requiring the politically and technically challenging project of harmonizing the classification systems themselves.