Google DeepMind strengthens its Frontier Safety Framework

We are expanding our risk domains and improving the risk assessment process.

Breakthroughs in artificial intelligence are changing our everyday lives, from advances in mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we are committed to developing our technologies responsibly and to taking an evidence-based approach to staying ahead of emerging risks.

Today we are publishing the third iteration of our Frontier Safety Framework (FSF): our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models.

This update builds on our ongoing collaboration with experts across industry, academia and government. We have also incorporated lessons learned from implementing previous versions and from evolving best practices in frontier AI safety.

Key Framework Updates

Counteracting the risk of harmful manipulation

In this update, we are introducing a Critical Capability Level (CCL) focused on harmful manipulation: specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high-stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at significant scale.

This addition builds on and operationalizes the research we have conducted to identify and evaluate the mechanisms that drive manipulation by generative AI. Going forward, we will continue to invest in this domain to better understand and measure the risks associated with harmful manipulation.

Adapting our approach to the risk of misalignment

We have also expanded our Framework to address potential future scenarios in which misaligned AI models might interfere with operators' ability to direct, modify, or shut down their operations.

While the previous version of the Framework included an exploratory approach focused on instrumental reasoning CCLs (i.e., warning levels specific to when an AI model begins to reason deceptively), with this update we now provide further protocols for our machine learning R&D CCLs, which focus on models that could accelerate AI research and development to potentially destabilizing levels.

In addition to the misuse risks arising from these capabilities, there are also misalignment risks stemming from a model's potential for undirected action at these capability levels, and from the likely integration of such models into AI development and deployment processes.

To address the risks posed by CCLs, we conduct safety case reviews before external launches once the relevant CCLs are reached. This involves performing detailed analyses demonstrating how risks have been reduced to an acceptable level. For advanced machine learning R&D CCLs, large-scale internal deployments can also pose risk, so we are now expanding this approach to cover such deployments.
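To make this gating logic concrete, here is a minimal, hypothetical Python sketch. The class names, domain strings, and the gating rule itself are our own illustration, not part of the Framework: it simply encodes the idea that once a CCL is reached, an external launch requires an approved safety case review, and for machine learning R&D CCLs the same gate applies to large-scale internal deployments.

```python
# Hypothetical illustration only: these names and rules are not drawn
# from the Framework itself.
from dataclasses import dataclass
from enum import Enum, auto

class DeploymentKind(Enum):
    EXTERNAL_LAUNCH = auto()
    LARGE_SCALE_INTERNAL = auto()

@dataclass
class CapabilityAssessment:
    domain: str                 # e.g. "harmful manipulation", "ML R&D"
    ccl_reached: bool           # did evaluations show the CCL was reached?
    safety_case_approved: bool  # outcome of a safety case review, if held

def may_deploy(assessment: CapabilityAssessment, kind: DeploymentKind) -> bool:
    """Toy gating rule: once a CCL is reached, an external launch requires
    an approved safety case review; for ML R&D CCLs, large-scale internal
    deployments are gated the same way."""
    if not assessment.ccl_reached:
        return True
    if kind is DeploymentKind.EXTERNAL_LAUNCH:
        return assessment.safety_case_approved
    if kind is DeploymentKind.LARGE_SCALE_INTERNAL and assessment.domain == "ML R&D":
        return assessment.safety_case_approved
    return True

# Example: an ML R&D CCL has been reached but the safety case review is
# still pending, so a large-scale internal deployment is blocked.
pending = CapabilityAssessment("ML R&D", ccl_reached=True, safety_case_approved=False)
assert not may_deploy(pending, DeploymentKind.LARGE_SCALE_INTERNAL)
```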

Improving our risk assessment process

Our Framework is designed to address risks in proportion to their severity. We have sharpened our CCL definitions to identify the critical threats that warrant the most rigorous governance and mitigation strategies. We continue to apply safety and security mitigations before specific CCL thresholds are reached, as part of our standard approach to model development.

Finally, this update describes our risk assessment process in greater detail. Building on our core early-warning evaluations, we describe how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities, and explicit determinations of risk acceptability.
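As a rough illustration of how those three stages compose, here is a hypothetical Python sketch. The stage functions, the risk domains, and the acceptability rule are stand-ins we invented for clarity; they do not describe the Framework's actual process.

```python
# Hypothetical sketch only: stage names follow the description above;
# all data and helper functions are stubbed for illustration.
from dataclasses import dataclass, field

@dataclass
class RiskAssessment:
    identified_risks: list[str] = field(default_factory=list)
    flagged: dict[str, bool] = field(default_factory=dict)
    risk_acceptable: bool = False

def identify_risks(model_id: str) -> list[str]:
    # Stage 1: systematic risk identification (stubbed with fixed domains).
    return ["harmful manipulation", "ML R&D acceleration", "misalignment"]

def analyse_capabilities(model_id: str, risks: list[str]) -> dict[str, bool]:
    # Stage 2: comprehensive capability analyses; True would mean an
    # early-warning evaluation flagged the domain (stubbed as all clear).
    return {risk: False for risk in risks}

def determine_acceptability(flagged: dict[str, bool]) -> bool:
    # Stage 3: an explicit acceptability determination; in this toy rule,
    # risk is acceptable only if no domain was flagged.
    return not any(flagged.values())

def run_assessment(model_id: str) -> RiskAssessment:
    risks = identify_risks(model_id)
    flagged = analyse_capabilities(model_id, risks)
    return RiskAssessment(risks, flagged, determine_acceptability(flagged))

print(run_assessment("demo-model").risk_acceptable)  # True in this stub
```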

Advancing our commitment to frontier safety

This latest update to our Frontier Safety Framework reflects our continued commitment to taking a scientific and evidence-based approach to tracking and staying ahead of AI risks as capabilities advance toward AGI. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity while minimizing potential harms.

Our Framework will continue to evolve based on new research, stakeholder input, and implementation lessons. We remain committed to collaboration between industry, academia and government.

The path to beneficial AGI requires not only technical breakthroughs, but also robust frameworks for mitigating risks along the way. We hope that our updated Frontier Safety Framework makes a meaningful contribution to this collective effort.
