Agents cooperate better by communicating and negotiating, and sanctioning broken promises helps keep them honest
Successful communication and cooperation have been crucial in helping societies advance throughout history. The closed environments of board games can serve as a sandbox for modelling and investigating interaction and communication – and we can learn a lot from playing them. In our recent paper, published today in Nature Communications, we show how artificial agents can use communication to better cooperate in the board game Diplomacy, a vibrant domain in artificial intelligence (AI) research known for its focus on alliance building.
Diplomacy is challenging because it has simple rules but high emergent complexity, due to the strong interdependencies between players and its immense action space. To help solve this challenge, we designed negotiation algorithms that allow agents to communicate and agree on joint plans, enabling them to overcome agents lacking this ability.
Cooperation is particularly challenging when we cannot rely on our peers to do what they promise. We use Diplomacy as a sandbox to explore what happens when agents may deviate from their past agreements. Our research illustrates the risks that emerge when complex agents are able to misrepresent their intentions, or mislead others regarding their future plans, which leads to another big question: what are the conditions that promote trustworthy communication and teamwork?
We show that the strategy of sanctioning peers who break contracts dramatically reduces the advantages they can gain by abandoning their commitments, thereby fostering more honest communication.
What is diplomacy and why is it important?
Games like chess, poker, Go, and many video games have always been fertile ground for AI research. Diplomacy is a seven-player game of negotiation and alliance formation, played on an old map of Europe partitioned into provinces, where each player controls multiple units (see the rules of Diplomacy). In the standard version of the game, called Press Diplomacy, each turn includes a negotiation phase, after which all players reveal their chosen moves simultaneously.
The heart of Diplomacy is the negotiation phase, where players try to agree on their next moves. For example, one unit may support another unit, allowing it to overcome resistance by other units, as illustrated here:
Two movement scenarios.
Left: Two units (a red unit in Burgundy and a blue unit in Gascony) attempt to move into Paris. As the units have equal strength, neither succeeds.
Right: The red unit in Picardy supports the red unit in Burgundy, overpowering blue's unit and enabling the red unit to move into Paris.
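To make this mechanic concrete, here is a minimal Python sketch of the support rule, under simplifying assumptions of our own (it is illustrative only, not a full Diplomacy adjudicator): a move's strength is 1 plus the number of units supporting it, and a move into a province succeeds only if it is strictly stronger than every competing move.

```python
from collections import defaultdict

def resolve_moves(moves, supports):
    """moves: {unit: target_province}; supports: {supporter: supported_unit}."""
    strength = {unit: 1 for unit in moves}
    for supporter, supported in supports.items():
        if supported in moves:
            strength[supported] += 1
    contenders = defaultdict(list)
    for unit, target in moves.items():
        contenders[target].append(unit)
    outcome = {}
    for target, units in contenders.items():
        units.sort(key=lambda u: strength[u], reverse=True)
        best = units[0]
        # A move succeeds only if strictly stronger than the runner-up.
        succeeds = len(units) == 1 or strength[best] > strength[units[1]]
        for unit in units:
            outcome[unit] = (unit == best and succeeds)
    return outcome

# Left panel: equal strength, so neither move into Paris succeeds.
print(resolve_moves({"red_BUR": "PAR", "blue_GAS": "PAR"}, {}))
# Right panel: Picardy supports Burgundy, so the red move succeeds.
print(resolve_moves({"red_BUR": "PAR", "blue_GAS": "PAR"},
                    {"red_PIC": "red_BUR"}))
```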
Computational approaches to Diplomacy have been studied since the 1980s, many of them on a simpler version of the game called No-Press Diplomacy, where strategic communication between players is not allowed. Researchers have also proposed computer-friendly negotiation protocols, sometimes called "Restricted-Press".
What did we study?
We use Diplomacy as an analogue to real-world negotiation, providing methods for AI agents to coordinate their moves. We take our non-communicating Diplomacy agents and augment them to play Diplomacy with communication, by giving them a protocol for negotiating contracts for a joint plan of action. We call these augmented agents Baseline Negotiators, and they are bound by their agreements.
Diplomacy contracts.
Left: a restriction on the red player's possible actions (they are not allowed to move from Ruhr to Burgundy, and must move from Piedmont to Marseilles).
Right: a contract between the red and green players, which places restrictions on both sides.
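As an illustration of how such a contract might be represented in code, here is a hypothetical sketch (our own framing, not the paper's implementation) that treats a contract as a set of permitted actions per player; an action complies with the contract exactly when it is in that player's permitted set.

```python
# Hypothetical representation of a contract as per-player action restrictions.
# An action is modelled as a (source, destination) pair; players not named
# in the contract are unrestricted.

class Contract:
    def __init__(self, allowed):
        self.allowed = allowed  # player -> set of permitted actions

    def complies(self, player, action):
        return player not in self.allowed or action in self.allowed[player]

# Mirroring the figure (province names abbreviated, action set illustrative):
# red must move Piedmont -> Marseilles, and may not move Ruhr -> Burgundy.
red_allowed = {("PIE", "MAR"), ("RUH", "HOL"), ("RUH", "KIE")}
contract = Contract({"red": red_allowed})
print(contract.complies("red", ("PIE", "MAR")))  # True
print(contract.complies("red", ("RUH", "BUR")))  # False
```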
We consider two protocols: the Mutual Proposal Protocol and the Propose-Choose Protocol, discussed in detail in the paper. Our agents employ algorithms that identify mutually beneficial deals by simulating how the game might unfold under various contracts. We use the Nash Bargaining Solution from game theory as a principled foundation for identifying high-quality agreements. And because the game may unfold in many ways depending on the players' actions, our agents use Monte Carlo simulations to see what might happen in the next turn.
Simulating possible next states given an agreed contract. Left: the current state in part of the board, including a contract agreed between the red and green players. Right: multiple possible next states.
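The sketch below illustrates, under assumptions of our own, how these pieces might fit together: an assumed stochastic `simulate(contract)` rollout (not shown) estimates each player's value via Monte Carlo, and a contract is scored by the Nash bargaining objective – the product of each player's gain over what they would get with no deal.

```python
def monte_carlo_values(simulate, contract, n_rollouts=100):
    """Average per-player values over sampled next-turn rollouts."""
    totals = None
    for _ in range(n_rollouts):
        values = simulate(contract)  # assumed stochastic rollout, one value per player
        totals = values if totals is None else [t + v for t, v in zip(totals, values)]
    return [t / n_rollouts for t in totals]

def nash_bargaining_score(values, disagreement):
    """Product of gains over the disagreement point (the no-deal values)."""
    score = 1.0
    for v, d in zip(values, disagreement):
        if v <= d:
            return 0.0  # a rational player rejects a deal worse than no deal
        score *= v - d
    return score

def best_contract(candidates, simulate, disagreement):
    """Pick the candidate contract with the highest Nash bargaining score."""
    return max(candidates,
               key=lambda c: nash_bargaining_score(
                   monte_carlo_values(simulate, c), disagreement))
```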
Our experiments show that our negotiation mechanism allows Baseline Negotiators to significantly outperform baseline non-communicating agents.
Baseline Negotiators significantly outperform non-communicating agents. Left: the Mutual Proposal Protocol. Right: the Propose-Choose Protocol. "Negotiator advantage" is the ratio of win rates between communicating agents and non-communicating agents.
Agents breaking agreements
In Diplomacy, agreements made during negotiation are not binding (communication is "cheap talk"). But what happens when agents who agree to a contract on one turn deviate from it on the next? In many real-life settings people agree to act in a certain way, but fail to meet their commitments later on. To enable cooperation between AI agents, or between agents and humans, we must examine the potential pitfall of agents strategically breaking their agreements, and ways to remedy this problem. We used Diplomacy to study how the ability to abandon commitments erodes trust and cooperation, and to identify the conditions that foster honest cooperation.
We therefore consider Deviator Agents, which overcome honest Baseline Negotiators by deviating from agreed contracts. Simple Deviators simply "forget" they agreed to a contract and move however they wish. Conditional Deviators are more sophisticated: they optimise their actions assuming that the other players who accepted a contract will act in accordance with it.
All our communicating agent types. Each blue block represents a specific agent algorithm, grouped according to the green categories.
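In pseudocode terms, the difference between the two deviator styles might look like the following sketch (our own illustration, not the paper's code; `best_response` is an assumed planner that picks an action against a model of the other players):

```python
def simple_deviator_action(state, contract, best_response, free_policy):
    # Simple Deviator: "forgets" the contract and plans as if no deal exists,
    # modelling the other players as unconstrained.
    return best_response(state, others_policy=free_policy)

def conditional_deviator_action(state, contract, best_response, constrained_policy):
    # Conditional Deviator: exploits the deal by assuming the others will
    # honour the contract, then optimises its own action against that.
    return best_response(state, others_policy=constrained_policy(contract))
```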
We show that Simple and Conditional Deviators significantly outperform Baseline Negotiators, the Conditional Deviators overwhelmingly so.
Deviator Agents versus Baseline Negotiators. Left: the Mutual Proposal Protocol. Right: the Propose-Choose Protocol. "Deviator advantage" is the ratio of win rates between Deviator Agents and Baseline Negotiators.
Encouraging agents to be honest
Next, we tackle the deviation problem using Defensive Agents, which respond adversely to deviations. We investigate Binary Negotiators, who simply cut off communications with agents who break an agreement with them. But shunning is a mild reaction, so we also develop Sanctioning Agents, who don't take betrayal lightly and instead modify their goals to actively lower the deviator's value – an opponent with a grudge! We show that both types of Defensive Agents reduce the advantage of deviation, especially the Sanctioning Agents.
Non-deviating agents (Baseline Negotiators, Binary Negotiators and Sanctioning Agents) playing against Conditional Deviators. Left: the Mutual Proposal Protocol. Right: the Propose-Choose Protocol. "Deviator advantage" values lower than 1 indicate that the defensive agent outperforms the Deviator Agent. The population of Binary Negotiators (blue) reduces the advantage of deviators compared with the population of Baseline Negotiators (grey).
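A minimal sketch of the sanctioning idea, under assumptions of our own (the weighting and the value-estimation helpers are hypothetical): after a betrayal, the agent optimises a modified objective that mixes its own value with the negative of the deviator's value, so it actively works to lower the deviator's outcome.

```python
def sanctioning_objective(own_value, deviator_value, alpha=0.5):
    """Objective of an opponent with a grudge; alpha trades off self-interest
    against punishment (alpha=0 recovers the ordinary objective)."""
    return own_value - alpha * deviator_value

def choose_action(actions, evaluate, betrayed_by=None):
    """evaluate(action) -> {player: estimated value}; an assumed helper."""
    def score(action):
        values = evaluate(action)
        if betrayed_by is None:
            return values["me"]
        return sanctioning_objective(values["me"], values[betrayed_by])
    return max(actions, key=score)
```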
Finally, we introduce Learned Deviators, who adapt and optimise their behaviour against Sanctioning Agents over many games, trying to render the above defences less effective. A Learned Deviator will only break a contract when the immediate gains from deviation are high enough and the other agent's ability to retaliate is low enough. In practice, Learned Deviators occasionally break contracts late in the game, and in doing so achieve a slight advantage over Sanctioning Agents. Nevertheless, such sanctions drive the Learned Deviator to honour more than 99.7% of its contracts.
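Stated as a decision rule, the learned behaviour described above might be sketched as follows (the thresholds are hypothetical; the agents in the paper learn this trade-off over many games rather than using fixed cut-offs):

```python
def should_deviate(gain_from_deviation, retaliation_capacity,
                   gain_threshold=1.0, retaliation_threshold=0.2):
    """Break a contract only when the immediate gain is large and the
    sanctioner's remaining capacity to retaliate is small."""
    return (gain_from_deviation > gain_threshold
            and retaliation_capacity < retaliation_threshold)
```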
We also examine the possible learning dynamics of sanctioning and deviation: what happens when Sanctioning Agents may themselves deviate from contracts, and the potential incentive to stop sanctioning when this behaviour is costly. Such issues can gradually erode cooperation, so additional mechanisms may be needed, such as repeated interaction across multiple games, or the use of trust and reputation systems.
Our paper leaves many questions open for future research: is it possible to design more sophisticated protocols to encourage even more honest behaviour? How could one handle the combination of communication techniques and imperfect information? Finally, what other mechanisms could deter the breaking of agreements? Building fair, transparent and trustworthy AI systems is an extremely important topic, and a key part of DeepMind's mission. Studying these questions in sandboxes like Diplomacy helps us better understand the tensions between cooperation and competition that might exist in the real world. Ultimately, we believe tackling these challenges allows us to better understand how to develop AI systems in line with society's values and priorities.
Read our full paper here.
Acknowledgements
We would like to thank Will Hawkins, Aliya Ahmad, Dawn Bloxwich, Lila Ibrahim, Julia Pawar, Sukhdeep Singh, Tom Anthony, Kate Larson, Julien Perolat, Marc Lanctot, Edward Hughes, Richard Ives, Karl Tuyls, Satinder Singh and Koray Kavukcuoglu for their support and advice throughout this work.
Paper authors
János Kramár, Tom Eccles, Ian Gemp, Andrea Tacchetti, Kevin R. McKee, Mateusz Malinowski, Thore Graepel, Yoram Bachrach.