Updating the Frontier Safety Framework

Our next version of the FSF outlines stronger safety protocols on the path to AGI

Artificial intelligence is a powerful tool that is helping to unlock new breakthroughs and make significant progress on some of the biggest challenges of our time, from climate change to drug discovery. But as its development progresses, advanced capabilities may present new risks.

That's why last year we launched the first version of our Frontier Safety Framework, a set of protocols to help us stay ahead of possible severe risks from powerful frontier AI models. Since then, we have worked with experts across industry, academia and government to deepen our understanding of these risks, the empirical evaluations used to test for them, and the mitigations we can apply. We have also implemented the Framework in our safety and governance processes to evaluate frontier models such as Gemini 2.0. As a result of this work, today we are publishing an updated version of the Frontier Safety Framework.

Key updates to the Framework include:

  • Security Level recommendations for our Critical Capability Levels (CCLs), helping to identify where the strongest efforts to curb exfiltration risk are needed
  • Implementing a more consistent procedure for how we apply deployment mitigations
  • Outlining an industry-leading approach to deceptive alignment risk

Recommendations for heightened security

Security mitigations help prevent unauthorized actors from exfiltrating model weights. This is especially important because access to model weights allows most other safeguards to be removed. Given the stakes involved as we look ahead to increasingly powerful AI, getting this wrong could have serious implications for safety and security. Our initial Framework recognized the need for a tiered approach to security, allowing mitigations of varying strength to be tailored to the risk. This proportionate approach also ensures we strike the right balance between mitigating risks and fostering access and innovation.

Since then, we have drawn on broader research to develop these security mitigation levels, and to recommend a level for each of our CCLs.* These recommendations reflect our assessment of the minimum appropriate level of security the frontier AI field should apply to models at a given CCL. This mapping exercise helps us identify where the strongest mitigations are needed to curb the greatest risk. In practice, some aspects of our security practices may exceed the baseline levels recommended here due to our strong overall security posture.

The second version of the Framework recommends particularly heightened security levels for CCLs within the domain of machine learning research and development (R&D). We believe it will be important for frontier AI developers to have strong security in place for future scenarios where their models could significantly accelerate and/or automate AI development itself. This is because the uncontrolled proliferation of such capabilities could significantly undermine society's ability to carefully manage and adapt to the rapid pace of AI development.

Ensuring the continued security of cutting-edge AI systems is a shared global challenge, and a shared responsibility of all leading developers. Importantly, getting this right is a collective-action problem: the social value of any single actor's security mitigations will be significantly reduced if they are not broadly applied across the field. Building the security capabilities we believe may be needed will take time, so it is vital that all frontier AI developers work collectively towards heightened security measures and accelerate efforts towards common industry standards.

Deployment mitigations procedure

The Framework also outlines deployment mitigations, which focus on preventing the misuse of critical capabilities in the systems we deploy. We have updated our deployment mitigation approach to apply a more rigorous safety mitigation process to models reaching a CCL in a misuse risk domain.

The updated approach involves the following steps: first, we prepare a set of mitigations by iterating on a set of safeguards. As we do so, we also develop a safety case, an assessable argument showing how severe risks associated with a model's CCLs have been minimized to an acceptable level. The appropriate corporate governance body then reviews the safety case, with general availability deployment taking place only once it is approved. Finally, we continue to review and update the safeguards and safety case after deployment. We made this change because we believe that all critical capabilities warrant this thorough mitigation process.
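
As a rough illustration of how this sequence of steps fits together, the sketch below models the process as a simple state machine in Python. It is a minimal, hypothetical sketch: the class and function names, the review loop, and the stand-in approval logic are assumptions made for illustration, not a description of the actual governance tooling.

    # Minimal, hypothetical sketch of the deployment mitigation process above,
    # modeled as a simple state machine. Names and review logic are illustrative
    # placeholders, not the actual governance tooling.
    from dataclasses import dataclass, field
    from enum import Enum, auto
    from typing import Callable, List


    class Stage(Enum):
        PREPARING_SAFEGUARDS = auto()
        GOVERNANCE_REVIEW = auto()
        DEPLOYED = auto()
        POST_DEPLOYMENT_REVIEW = auto()


    @dataclass
    class SafetyCase:
        """Assessable argument that a model's CCL-related severe risks
        have been minimized to an acceptable level (hypothetical structure)."""
        model: str
        ccls_reached: List[str]
        safeguards: List[str] = field(default_factory=list)


    def deployment_process(
        case: SafetyCase,
        iterate_safeguards: Callable[[SafetyCase], None],
        governance_approves: Callable[[SafetyCase], bool],
        max_review_rounds: int = 10,
    ) -> Stage:
        """Walk through the updated process: iterate on safeguards and the safety
        case, seek governance approval, deploy only if approved, then keep the
        safeguards and safety case under review after deployment."""
        stage = Stage.PREPARING_SAFEGUARDS
        for _ in range(max_review_rounds):
            iterate_safeguards(case)              # step 1: prepare/refine mitigations
            stage = Stage.GOVERNANCE_REVIEW       # step 2: submit the safety case
            if governance_approves(case):         # step 3: deploy only on approval
                stage = Stage.DEPLOYED
                # step 4: safeguards and safety case remain under continued review
                return Stage.POST_DEPLOYMENT_REVIEW
        return stage                              # approval not granted: keep iterating


    # Example usage with trivial stand-ins for the real review steps:
    case = SafetyCase(model="frontier-model", ccls_reached=["example-ccl"])
    outcome = deployment_process(
        case,
        iterate_safeguards=lambda c: c.safeguards.append("example safeguard"),
        governance_approves=lambda c: len(c.safeguards) > 0,
    )
    print(outcome)  # Stage.POST_DEPLOYMENT_REVIEW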

Approach to deceptive alignment risk

The first iteration of the Framework primarily addressed misuse risk (i.e., the risk of threat actors using critical capabilities of deployed or exfiltrated models to cause harm). Building on this, we have taken an industry-leading approach to proactively addressing the risk of deceptive alignment, i.e., the risk of an autonomous system deliberately undermining human control.

Our initial approach to this question focuses on detecting when models might develop a baseline instrumental reasoning ability that would let them undermine human control unless safeguards are in place. To mitigate this, we are exploring automated monitoring to detect the illegitimate use of instrumental reasoning capabilities.
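
To make the idea of automated monitoring more concrete, here is a minimal sketch of what such a monitor could look like. The classifier, threshold, escalation policy, and all function names are hypothetical assumptions for illustration, not a description of the actual system.

    # Hypothetical sketch of automated monitoring for illegitimate use of
    # instrumental reasoning. The classifier, threshold and escalation policy
    # are illustrative assumptions, not the production system.
    from dataclasses import dataclass
    from typing import Callable, Tuple


    @dataclass
    class MonitorVerdict:
        suspicious: bool
        score: float
        reason: str


    def monitor_reasoning_trace(
        trace: str,
        classifier: Callable[[str], float],   # returns a suspicion score in [0, 1]
        threshold: float = 0.8,
    ) -> MonitorVerdict:
        """Score a model's reasoning trace with a separate classifier and flag it
        when the score for illegitimate instrumental reasoning crosses a threshold."""
        score = classifier(trace)
        if score >= threshold:
            return MonitorVerdict(True, score, "possible attempt to undermine oversight")
        return MonitorVerdict(False, score, "no concerning instrumental reasoning detected")


    def guarded_respond(
        generate: Callable[[str], Tuple[str, str]],  # returns (response, reasoning trace)
        classifier: Callable[[str], float],
        prompt: str,
    ) -> str:
        """Wrap generation: withhold the response and escalate for human review
        when the monitor flags the accompanying reasoning trace."""
        response, trace = generate(prompt)
        verdict = monitor_reasoning_trace(trace, classifier)
        if verdict.suspicious:
            return f"[response withheld for human review: {verdict.reason}]"
        return response


    # Example with trivial stand-ins for the model and the classifier:
    demo_output = guarded_respond(
        generate=lambda p: ("Here is the answer.", "benign step-by-step reasoning"),
        classifier=lambda t: 0.9 if "disable oversight" in t else 0.1,
        prompt="example prompt",
    )
    print(demo_output)  # Here is the answer.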

We do not expect automated monitoring to remain sufficient in the long term if models reach even stronger levels of instrumental reasoning, so we are actively undertaking, and strongly encouraging, further research into mitigation approaches for these scenarios. While we do not yet know how likely such capabilities are to arise, we think it is important for the field to prepare for the possibility.

Conclusion

We will continue to review and develop the Framework over time, guided by our AI Principles, which further emphasize our commitment to responsible development.

As part of our efforts, we will continue to work with partners across society. For example, if we assess that a model has reached a CCL that poses an unmitigated and material risk to overall public safety, we aim to share information with appropriate government authorities where it will facilitate the development of safe AI. Additionally, the latest Framework outlines a number of potential areas for further research, areas where we look forward to collaborating with the research community, other companies, and government.

We believe that an open, iterative and collaborative approach will help to establish common standards and best practices for evaluating the safety of future AI models while securing their benefits for humanity. The Seoul Frontier AI Safety Commitments marked an important step towards this collective effort, and we hope our updated Frontier Safety Framework contributes further to that progress. As we look ahead to AGI, getting this right will mean tackling very consequential questions, such as the right capability thresholds and mitigations, which will require the input of wider society, including governments.
