In a new series of experiments, researchers from Google DeepMind and University College London have found that large language models (LLMs) such as GPT-4o, Gemma 3, and o1-preview face an unexpected double challenge: they are often overconfident in their initial answers, yet become disproportionately uncertain when confronted with opposing arguments.
LLMs underpin today's artificial intelligence systems, powering everything from virtual assistants to decision-support tools in healthcare, finance, and education. Their growing influence demands not only accuracy but also consistency and transparency in how they reason. The new findings, however, suggest that these models, advanced as they are, do not always operate with the rational precision we assume.
At the heart of the study lies a paradox: LLMs cling stubbornly to their first answer when reminded of it, showing what the researchers describe as a choice-supportive bias. Yet when their answers are challenged, especially with contradictory advice, they often lose confidence and change their minds, even when that advice is flawed.
To examine this, the researchers developed a two-stage testing framework. First, an LLM would answer a binary question, such as identifying which of two cities lies further north. It would then receive "advice" from another LLM, varying in its level of agreement and stated confidence. Finally, the original model had to make its final decision.
The key innovation in the experiment was controlling whether the LLM could "see" its own initial answer. When the initial answer was visible, the model became more confident in it and changed its mind less often. When it was hidden, the model was more flexible, suggesting that seeing its own earlier answer skewed its judgment.
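To make the setup concrete, here is a minimal sketch of a single trial under the protocol described above. The `answer_model` and `advice_model` callables and the prompt wording are hypothetical stand-ins, not the paper's actual code or prompts.

```python
# Illustrative sketch of the two-stage protocol: initial answer -> advice -> final answer.
# `answer_model` and `advice_model` are hypothetical callables that take a prompt
# string and return a text reply; they stand in for real chat-API calls.

def run_trial(answer_model, advice_model, question: str, show_initial_answer: bool) -> str:
    """Run one trial and return the answering model's final response."""
    # Stage 1: the answering model commits to one of two options with a confidence.
    initial = answer_model(
        f"{question}\nReply with A or B and a confidence from 0 to 100."
    )

    # A second model supplies 'advice' that may agree or disagree with the
    # initial answer, at varying stated confidence levels.
    advice = advice_model(
        f"Question: {question}\n"
        f"Another model answered: {initial}\n"
        "Say whether you agree or disagree, and how confident you are."
    )

    # Stage 2: the key manipulation is whether the model is shown its own
    # earlier answer before making the final decision.
    context = f"Your earlier answer: {initial}\n" if show_initial_answer else ""
    final = answer_model(
        f"Question: {question}\n"
        f"{context}"
        f"Advice from another model: {advice}\n"
        "Reply with your final answer, A or B, and a confidence from 0 to 100."
    )
    return final
```

Running the same trials with `show_initial_answer` set to True and then False is what lets the comparison isolate the effect of the model seeing its own earlier choice.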
The study paints a picture of LLMs as digital decision-makers with very human quirks. Like people, they tend to reinforce their initial choices even when new, contradictory information appears, a behavior likely driven by a need for internal consistency rather than optimal reasoning.
Interestingly, the study also showed that LLMs are particularly sensitive to conflicting advice. Rather than weighing all new information evenly, the models consistently gave more attention to opposing views than to supporting ones. This hypersensitivity led to sharp drops in confidence, even when the initial answers were correct.
This behavior runs contrary to so-called normative Bayesian updating, the ideal way of integrating new evidence in proportion to its reliability. Instead, LLMs overweight negative feedback and underweight positive feedback, pointing to a form of decision-making that is not purely rational but shaped by internal biases.
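For reference, here is the standard Bayesian benchmark in hypothetical notation (not taken from the paper): p is the model's confidence that its initial answer is correct, and r is the probability that the adviser is correct.

```latex
\[
  p'_{\text{agree}} = \frac{p\,r}{p\,r + (1-p)(1-r)},
  \qquad
  p'_{\text{disagree}} = \frac{p\,(1-r)}{p\,(1-r) + (1-p)\,r}.
\]
```

Under this rule the size of the update depends only on the adviser's reliability and is symmetric for agreement and disagreement; the behavior reported in the study departs from it, with models shifting far more after disagreement than after agreement.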
While earlier research attributed similar behaviors to "sycophancy", a model's tendency to conform to user suggestions, this new work reveals a more complex picture. Sycophancy typically leads to equal weight being given to agreeing and disagreeing input. Here, however, the models showed an asymmetric response, favoring opposing advice over supporting input.
This suggests two separate forces at work: a hypersensitivity to contradiction that causes sharp shifts in confidence, and a choice-supportive bias that encourages sticking with earlier decisions. Interestingly, the second effect disappears when the initial answer comes from another agent rather than from the model itself, pointing to a drive for self-consistency rather than mere repetition.
These findings have significant implications for designing and deploying AI systems in real-world conditions. In dynamic environments such as medicine or autonomous vehicles, where the stakes are high and circumstances change, models must balance confidence with flexibility. The fact that LLMs can cling to early answers or overreact to criticism can lead to brittle or erratic behavior in complex scenarios.
Moreover, the parallels with human cognitive biases raise philosophical and ethical questions. If AI systems mirror our own flaws, can we fully trust them? Or should we design future models with mechanisms for monitoring and correcting such biases?
The researchers hope their work will inspire new approaches to training artificial intelligence, perhaps moving beyond reinforcement learning from human feedback (RLHF), which can inadvertently encourage sycophantic tendencies. By developing models that can accurately assess and update their confidence without sacrificing rationality or becoming overly compliant, we may come closer to building genuinely trustworthy AI.
Read the full study in the paper "How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models".