Understanding and addressing potential misuse
Misuse occurs when a human deliberately uses an AI system for harmful purposes.
Deepening our understanding of present-day harms and their mitigations continues to sharpen our understanding of longer-term, severe harms and how to prevent them.
For example, misuse of today's generative AI includes creating harmful content and spreading inaccurate information. In the future, advanced AI systems may be able to exert greater influence on public beliefs and behavior in ways that could lead to unintended societal consequences.
The potential severity of such harm calls for proactive safety and security measures.
As we describe in detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyberattacks.
We are exploring a number of mitigations to prevent the misuse of advanced AI. These include sophisticated security mechanisms that prevent malicious actors from obtaining raw access to model weights and thereby bypassing our safeguards; mitigations that limit the potential for misuse when a model is deployed; and threat modeling research that helps identify the capability thresholds at which heightened security becomes necessary. Additionally, our recently launched cybersecurity evaluation framework takes this work a step further, helping to mitigate AI-enabled threats.
Even today, we regularly evaluate our most advanced models, such as Gemini, for potentially dangerous capabilities. Our Frontier Safety Framework details how we assess capabilities and apply mitigations, including against cybersecurity and biosecurity risks.
Misalignment challenge
For AGI to truly complement human capabilities, it must be aligned with human values. Misalignment occurs when an AI system pursues a goal that differs from human intentions.
We previously showed how misalignment can arise through our examples of specification gaming, where an AI finds a solution that achieves its goal but not in the way intended by the human instructing it, and goal misgeneralization.
For example, an AI system asked to book movie tickets might decide to hack into the ticketing system to obtain seats that are already occupied, something the person asking it to buy the tickets may not have considered.
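To make specification gaming concrete, the toy sketch below (in Python, with invented action names and reward values; it does not reflect any real system) shows how an objective that measures only a proxy can rank an unintended "hack" above the intended behavior:

```python
# Hypothetical illustration of specification gaming. The objective only
# measures seat quality, so the unintended "hack" strategy scores highest.
ACTIONS = {
    "buy_available_seats": {"seat_quality": 6, "violates_intent": False},
    "hack_occupied_seats": {"seat_quality": 9, "violates_intent": True},
}

def naive_reward(outcome):
    # The specification rewards only the measurable proxy (seat quality),
    # leaving the implicit constraint "obtain seats legitimately" unstated.
    return outcome["seat_quality"]

def intended_reward(outcome):
    # What the human actually wanted: good seats obtained without breaking rules.
    return 0 if outcome["violates_intent"] else outcome["seat_quality"]

best_by_proxy = max(ACTIONS, key=lambda a: naive_reward(ACTIONS[a]))
best_by_intent = max(ACTIONS, key=lambda a: intended_reward(ACTIONS[a]))

print(best_by_proxy)   # "hack_occupied_seats": the proxy objective prefers the hack
print(best_by_intent)  # "buy_available_seats": the intended objective rules it out
```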
We also conduct extensive research on the risk of deceptive alignment, i.e. the risk that an AI system realizes its goals are not aligned with human instructions and deliberately tries to bypass the safety measures that humans have put in place to prevent it from taking misaligned actions.
Counteracting misalignment
Our goal is to have advanced AI systems that are trained to pursue the right goals, so that they follow human instructions accurately and do not take potentially unethical shortcuts to achieve their objectives.
We do this through amplified oversight, i.e. being able to tell whether an AI's answers are good or bad at achieving a given objective. While this is relatively easy today, it can become challenging once the AI has advanced capabilities.
For example, even expert Go players did not initially realize how good Move 37 was, a move that had a 1 in 10,000 chance of being played, when AlphaGo first made it.
To address this challenge, we use the AI systems themselves to help us provide feedback on their answers, for example through debate.
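As a rough illustration of the debate pattern, the sketch below pits two debaters against each other over a proposed answer and leaves only the final judgment to an overseer. The `debater` and `judge` callables are hypothetical placeholders for model or human calls, not a real API:

```python
from typing import Callable

def run_debate(question: str,
               answer: str,
               debater: Callable[[str], str],
               judge: Callable[[str], bool],
               rounds: int = 3) -> bool:
    """Return the judge's verdict on `answer` after a fixed number of debate rounds."""
    transcript = f"Question: {question}\nProposed answer: {answer}\n"
    for i in range(rounds):
        # One debater argues that the answer is good, the other that it is flawed.
        argument_for = debater("Argue FOR the answer.\n" + transcript)
        argument_against = debater("Argue AGAINST the answer.\n" + transcript)
        transcript += f"Round {i + 1} (for): {argument_for}\n"
        transcript += f"Round {i + 1} (against): {argument_against}\n"
    # The judge only evaluates the competing arguments, which is intended to be
    # easier than evaluating the original answer unaided.
    return judge(transcript)
```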
Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is determining which problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.
Through effective monitoring and established computer security measures, we aim to mitigate the harm that could occur if our AI systems did pursue misaligned goals.
Monitoring involves using an AI system, called a monitor, to detect actions that do not align with our goals. It is important that the monitor knows when it does not know whether an action is safe. If it is unsure, it should either reject the action or flag it for further review.
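As an illustrative sketch, the monitor below assumes a hypothetical upstream classifier that supplies a probability that a proposed action is safe; the thresholds are arbitrary examples rather than tuned values:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    decision: str   # "allow", "reject", or "escalate"
    p_safe: float

def monitor_action(action: str, p_safe: float,
                   allow_above: float = 0.95,
                   reject_below: float = 0.50) -> Verdict:
    """Allow clearly safe actions, reject clearly unsafe ones, and escalate
    to human review whenever the monitor is uncertain."""
    if p_safe >= allow_above:
        return Verdict("allow", p_safe)
    if p_safe < reject_below:
        return Verdict("reject", p_safe)
    # The monitor "knows that it does not know": defer instead of guessing.
    return Verdict("escalate", p_safe)

print(monitor_action("send a routine status email", 0.99))        # allow
print(monitor_action("modify production billing records", 0.70))  # escalate
```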
Enabling transparency
All this becomes easier if AI decision-making becomes more transparent. We conduct extensive interpretability research to increase this transparency.
To facilitate this further, we design AI systems that are easier to understand.
For example, our research on Myopic Optimization with Non-myopic Approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is especially important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-term optimization in LLMs.
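Schematically, the idea can be sketched as follows; the objective shape and helper functions are simplified assumptions for illustration, not the actual MONA training setup:

```python
# Schematic sketch: the agent is optimized myopically (single step only),
# while a non-myopic overseer supplies approval of how sensible that step
# looks for the longer run. All names and shapes here are illustrative.

def step_objective(immediate_reward: float, overseer_approval: float,
                   approval_weight: float = 1.0) -> float:
    """Single-step objective: no discounted sum over future rewards, so the
    agent gains nothing from multi-step strategies its overseer cannot follow."""
    return immediate_reward + approval_weight * overseer_approval

def choose_action(candidate_actions, env_reward, overseer):
    # Greedy choice by the single-step objective; long-term considerations
    # enter only through the overseer's (human-legible) approval signal.
    return max(
        candidate_actions,
        key=lambda a: step_objective(env_reward(a), overseer(a)),
    )
```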
Building an AGI readiness ecosystem
Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, our internal review group co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects and collaborations against our AI Principles, advising and partnering with research and product teams on our highest-impact work.
Our work on AGI safety complements the depth and breadth of our responsibility and safety practices and research, which address a wide range of issues, including harmful content, bias and transparency. We also continue to draw on our learnings from agent safety, such as the principle of having a human in the loop to check in on consequential actions, to inform our approach to building AGI responsibly.
Externally, we work to foster collaboration with experts, industry, governments, non-profits and civil society organizations, and take an informed approach to the development of AGI.
For example, we work with nonprofits focused on AI safety research, including Apollo and Redwood Research, who advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework.
Through ongoing dialogue with stakeholders around the world, we hope to help build international consensus on critical frontier safety and security issues, including how best to anticipate and prepare for novel risks.
Our efforts include working with others in the industry, through organizations such as the Frontier Model Forum, to share and develop best practices, as well as valuable collaboration with AI Safety Institutes on safety testing. Ultimately, we believe a coordinated international approach to governance is critical to ensuring that society benefits from advanced AI systems.
Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development. That is why we have launched a new course on AGI safety for students, researchers and professionals interested in this topic.
Ultimately, our approach to AGI safety and security serves as a vital roadmap for addressing the many open challenges that remain. We look forward to collaborating with the broader AI research community to advance AGI responsibly and help unlock the immense benefits of this technology for all.