Understanding the Significance of Temperature Settings in Generative AI: A Comprehensive Exploration
In today’s column, I am continuing my ongoing coverage of prompt engineering strategies and tactics that aid in getting the most out of using generative AI apps such as ChatGPT, GPT-4, Bard, Gemini, Claude, etc. The focus this time is on a technical aspect known as “temperature” or the “temperature setting,” an important parameter adjustment that can be made in some generative AI apps.
The temperature setting for generative AI determines how varied the responses by generative AI will be. You can either have the AI produce relatively straightforward and somewhat predictable responses (that’s via the use of a low temperature), or you can heat things up and use high temperatures to prod AI toward producing seemingly more creative and less predictable responses.
I hesitate to compare AI to human capacities since doing so oversteps into anthropomorphizing AI, but I’ll ask for your indulgence for a moment. You probably have friends or colleagues who, when they get heated up, start to come up with amazing ideas. In contrast, when they are in a cooler state, they tend to be more plodding in their suggestions. That is loosely analogous to the use of temperature settings in generative AI (note that today’s AI is not sentient, so please do not construe it otherwise).
In the case of generative AI, suppose you ask a question about Abraham Lincoln. If the temperature setting of the AI is currently set at a low range, the odds are that you will get the usual run-of-the-mill response about Lincoln’s life. On the other hand, if the temperature is set relatively high, you are bound to get a more unexpected take on Lincoln, framed in ways you likely haven’t seen before.
Which is better, using a low temperature or a high temperature when working in generative AI?
That was a bit of a trick question.
The answer is that choosing the temperature depends upon what you are trying to accomplish. Do you want staid answers that are of a somewhat expected nature? Okay, go ahead and use a lower temperature. Do you want potentially wild and unpredictable answers from generative AI? Fine, use a higher temperature.
I would dare say that you cannot categorically declare that either a low temp or a high temp is always better than the other. The situation ought to determine which temperature you opt to utilize.
Thus, a notable rule of thumb is that selecting a temperature for generative AI is usually situationally dependent, all else being equal.
Not all generative AI apps necessarily have a temperature parameter. Those that do will often not allow individual users to adjust the temperature. The temperature in that case is typically set across the board for all users by the AI maker. You have to just live with it, whatever it might be set at.
Global settings of temperature are usually set at a somewhat neutral value. The idea is that the temp isn’t too cold nor too hot. It’s the classic Goldilocks principle. This provides a fairly predictable set of outcomes while still allowing for a touch of variety. In the parlance of the AI field, predictable AI behavior is known as deterministic, while less predictable behavior is known as non-deterministic.
Higher temps tend toward the AI being more non-deterministic.
Lower temps tend toward the AI being more deterministic.
Some generative AI apps do allow users to adjust the temperature, doing so for just their individual use (this doesn’t impact the global setting that is established for all users of the AI). The generative AI might be set at a medium or neutral temperature for everyone, while individual users are allowed to change the temp for their instance of using the AI.
The rub is that even if you can adjust the temperature, this often can only be done when accessing the generative AI via its API (application programming interface). The point is that you typically cannot simply enter a prompt that tells the AI to adjust the temperature. You will have to be somewhat more programming-oriented to do so.
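To make this concrete, here is a minimal sketch of what setting the temperature via an API typically looks like. I’m using the OpenAI Python client purely as one illustrative example; other AI makers expose a similar parameter in their own client libraries, and the model name shown is merely a placeholder for whatever model you have access to.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes your OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute the model you actually use
    messages=[{"role": "user", "content": "Describe a sunset."}],
    temperature=0.5,  # below 1: more predictable, deterministic-leaning output
)

print(response.choices[0].message.content)
```

In this particular API, the allowable temperature range happens to run from 0 to 2 with a default of 1, though, as I’ll note shortly, other vendors choose different ranges.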
There is a sneaky means to indirectly emulate a temperature change. You can tell generative AI via a prompt to act as though the temperature is set at some particular value. This won’t change the actual internal parameter. Instead, the generative AI will be pretending that you did so. This kind of indirectly simulates things, maybe, sort of. It is said to be a cheap way to play the temperature-changing gambit. To be abundantly clear, this is not the same as adjusting the true underlying temperature parameter within the AI.
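As an illustration of the gambit, here’s the sort of wording people use (my own sample phrasing, not a magic incantation):

My entered prompt: “For the rest of this conversation, act as though your temperature is set to 1.5 and give me less predictable, more freewheeling answers.”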
Here’s a perhaps surprising aspect.
Temperature is something that has been around for a very long time in the statistical modeling realm. Anyone who has taken an advanced statistics class might be aware of the use of temperatures for adjusting how a statistical technique will function. I mention this to emphasize that the AI field has carried over a common convention and there isn’t anything oddball or outlandish about the notion.
There isn’t an agreed-upon standard for what numeric values a temperature must take.
A common convention is that the value of 1 as the temperature is construed as the neutral point. A value less than 1 means that you are seeking a more predictable or deterministic outcome in the generated responses. A value greater than 1 means that you want a less predictable and more non-deterministic output from the AI.
In my classes on prompt engineering, I tell the attendees to think of the temperature as a pot of water. If the pot goes over a temperature of 1, you start to boil or excite the water. This can lead to wild results. If the pot is less than a temperature of 1, you are going to have somewhat calm water and more predictable results.
Another way to express the idea of predictability is to refer to randomness.
A value of 1 is considered a normal or natural degree of randomness. There is just enough randomness in the generated answer to make the response seem relatively unique and not merely a copycat of prior answers. A value less than 1 for the temperature indicates that randomness should be further tightened. You might begin to see answers that have been seen before. A value greater than 1 says that randomness can be let loose, and the answers will be quite varied.
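For readers who want to peek under the hood, the mechanics are straightforward. At each step, the AI assigns scores (known as logits) to candidate next tokens, divides those scores by the temperature, and converts the result into probabilities for sampling. Here is a minimal self-contained sketch in Python, using made-up illustrative scores rather than anything from an actual model:

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Pick one token index, with the logits scaled by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()   # shift for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()     # softmax probabilities
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.5, 0.1]  # made-up next-token scores
for t in (0.2, 1.0, 2.0):
    picks = [sample_with_temperature(logits, t, rng) for _ in range(1000)]
    freq = np.bincount(picks, minlength=len(logits)) / 1000
    print(f"temperature {t}: pick frequencies {freq}")
```

Run that and you’ll see the low temperature makes the top-scoring token dominate almost every pick (deterministic-leaning), while the high temperature spreads the picks across all the candidates (non-deterministic-leaning).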
The differences in answers at low versus high temperatures might not be as striking as you would expect them to be. Sometimes the responses will only be subtly different. You will need to look closely to realize that the answers aren’t quite the same.
Allow me to give you a quick example.
I am going to ask generative AI to tell me about sunsets.
I opted to go into the generative AI app and set the temperature to less than 1. Let’s say I make the temperature 0.5. We know that this means I am asking to get a more predictable style of answer.
Here’s what might happen.
My entered prompt: “Describe a sunset.”
Generative AI response (temperature is low at 0.5): “The sunset was beautiful, with shades of orange and pink blending into the sky as the sun slowly dipped below the horizon.”
I then adjust the temperature to be above 1, namely at 1.5. I ask the same question and want to see what answer I get.
My entered prompt: “Describe a sunset.”
Generative AI response (temperature is high at 1.5): “As the sun dipped below the horizon, the sky exploded into a riot of colors – fiery oranges, deep purples, and a splash of crimson, painting a surreal masterpiece that felt almost otherworldly.”
I want you to help me compare the two responses, so here they are side-by-side:
Low temp: “The sunset was beautiful, with shades of orange and pink blending into the sky as the sun slowly dipped below the horizon.”
High temp: “As the sun dipped below the horizon, the sky exploded into a riot of colors – fiery oranges, deep purples, and a splash of crimson, painting a surreal masterpiece that felt almost otherworldly.”
Do you observe that the first response was somewhat more staid, while the second response was elaborative and had a flourish?
I purposely picked an example that wasn’t a knock-your-socks-off difference. I wanted you to see that the temperature setting does not necessarily cause the world to go berserk. Generative AI is still going to likely provide a response that is generally within the realm of normalcy.
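By the way, if you’d like to reproduce this kind of comparison yourself, here is a hedged sketch that reuses the earlier illustrative API setup and simply loops over the two temperatures (the model name remains a placeholder, and your outputs will assuredly differ from mine):

```python
from openai import OpenAI

client = OpenAI()  # assumes your OPENAI_API_KEY is set in the environment

for temp in (0.5, 1.5):
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute the model you actually use
        messages=[{"role": "user", "content": "Describe a sunset."}],
        temperature=temp,
    )
    print(f"--- temperature {temp} ---")
    print(response.choices[0].message.content)
```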
When Temperatures Lead To AI Hallucinations
There is a twist that you need to be aware of.
A piece of conventional wisdom is that the higher you set the temperature, the likelier it is that an AI hallucination will occur.
First, I disfavor the catchphrase of AI hallucination since it tends to anthropomorphize AI. Anyway, the moniker has struck a chord with society, and we are stuck with it. An AI hallucination simply means that the generative AI produces a response that contains fictitious elements that are not grounded in facts or presumed truths, see my analysis and coverage of AI hallucinations at the link here.
Your rule of thumb is this. If you strive to increase the temperature, the good news is that you are potentially getting a more creative kind of response. The bad news is that you are risking false aspects immersed in the response. You will have to very carefully inspect the response to ascertain that what it says is basically truthful.
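One lightweight way to do that inspecting, offered here purely as a sketch of the idea rather than a rigorous method, is to sample the same factual question several times at the high temperature and see whether the answers agree with one another. Wide disagreement is a warning sign. The helper function below is my own hypothetical wrapper around the same illustrative API used earlier:

```python
from openai import OpenAI

client = OpenAI()  # assumes your OPENAI_API_KEY is set in the environment

def ask(prompt, temperature):
    """Hypothetical helper; swap in whatever generative AI API you use."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

question = "In what year did Lincoln deliver the Gettysburg Address?"
answers = [ask(question, temperature=1.5) for _ in range(5)]

# If the five samples diverge, treat the response as suspect and verify it.
for i, answer in enumerate(answers, start=1):
    print(f"Sample {i}: {answer}")
```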
Things perhaps can get even worse. The response might contain portions that seem hallucinatory in that they are wild and crazy. To some degree, the wording might appear to be altogether incoherent.
A tradeoff exists when using high temperatures. You might get surprising results. These could be seemingly creative and awe-inspiring. They might give you new ideas or showcase some potential innovations that heretofore were not necessarily apparent. The results might also be filled with falsehoods. Some of the responses might be utterly incoherent.
There is a technical perspective that says you are engaging in an exploration across a vast solution space. If you use low temperatures, you are aiming to discover typical or highly probable solutions in the solution space. If you use high temperatures, you are willing to look across a large swath of the solution space, hoping to find something at the edge of the solution arena.
One other thought comes to mind.
Most people who use generative AI for day-to-day purposes will probably never try to adjust whatever temperature has already been set for the generative AI. Few people know that a temperature setting exists. Those who do know about it generally don’t mess with it.
Those who do seek to adjust the temperature are usually serious-minded prompt engineers who are tasked with using generative AI for harder or more novel problems. In addition, researchers and AI scientists examining the newest possibilities of generative AI are often playing around with temperatures to gauge how far AI can be stretched.
You know what your situation is, ergo you’ll need to decide to what degree you might want to get involved with setting temperatures in generative AI. If nothing else, I urge that all prompt engineers be aware of temperatures and know what they are for. It’s a fundamental aspect of generative AI and large language models (LLMs).
Latest Research Reveals More About Temperatures
Our collective understanding of the impacts of temperature settings is actually rather dismally slim. A lot of conjecture is out there. Some pundits will claim that this or that temperature will do this or that thing. These proclamations are often based on a seat-of-the-pants opinion. Watch out for flimflam.
Luckily, there is a growing body of research that seeks to empirically explore temperature settings in generative AI.
The rough news is that since generative AI apps and LLMs are continuously being improved and updated, there is a moving-target syndrome involved. A particular generative AI app might be studied today, and a month later, or even a day later, the same experiment could produce quite different results.
An additional dilemma is that generative AI apps are different from each other. Just because you experiment on one generative AI app regarding temperature doesn’t mean that some other generative AI app will react in the same way. To try and deal with this conundrum, some researchers will use a multitude of generative AI apps when conducting their research. Good for them.
What I’m trying to tell you is that you need to interpret any such research with a heavy grain of salt.
I grandly applaud my fellow AI researchers for tackling the temperature topic. They are doing vital work. Thanks go to them for their heroics. Meanwhile, we all must be mindfully cautious in making any overreaching conclusions. I’ll say this, at least research studies try to do things in a systematic way, which far exceeds those that merely spout temperature-related pronouncements based on the thinnest of speculation and conjecture.
Okay, I will get down off my soapbox.
Let’s look at some recent research.
In a study entitled “Toward General Design Principles for Generative AI Applications” by Justin Weisz, Michael Muller, Jessica He, and Stephanie Houde, arXiv, January 13, 2023, here were salient points (excerpts):
“Keeping humans in control of AI systems is a core tenet of human-centered AI.”
“One aspect of control relates to the exploration of a design space or range of possible outcomes.”
“Many generative algorithms include a user-controllable parameter called temperature.”
“A low-temperature setting produces outcomes that are very similar to each other; conversely, a high-temperature setting produces outcomes that are very dissimilar to each other.”
“In the ‘lifecycle’ model, users may first set a high temperature for increased diversity, and then reduce it when they wish to focus on a particular area of interest in the output space. This effect was observed in a study of a music co-creation tool, in which novice users dragged temperature control sliders to the extreme ends to explore the limits of what the AI could generate.”
I’ll provide a few thoughts based on those key points.
You can conceive of temperature settings as a means of controlling generative AI. From that macroscopic viewpoint, it is useful and perhaps mandatory to have temperature settings. A crucial belief about AI ethics is that we should be aiming toward human-centric AI, see my coverage at the link here. Temperature settings give some modest ability to control AI. Sort of.
I also liked that a research study about creating music was mentioned.
This seems to vividly highlight what I’ve been saying about the temperature settings. If you wanted to compose music via generative AI, you would be wise to use the temperature settings as an added means of doing so. Imagine that you wanted the music to be conventional. Easy-peasy, set the temperature low. For those who might want to explore the outer ranges of musical composition, you would set the temperature high.
You’ve now gotten your feet wet in the research realm of generative AI and temperatures.
Moving on, in a research study entitled “Is Temperature the Creativity Parameter of Large Language Models?” by Max Peeperkorn, Tom Kouwenhoven, Dan Brown, and Anna Jordanous, arXiv, May 1, 2024, these valuable points were made (excerpts):
“Large language models (LLMs) are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism.”
“Temperature is a hyperparameter that we find in stochastic models to regulate the randomness in a sampling process.”
“The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter.”
“Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model, and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence.”
“We observe a weak positive correlation between temperature and novelty, and unsurprisingly, a negative correlation between temperature and coherence. Suggesting a tradeoff between novelty and coherence.”
“Overall, the influence of temperature on creativity is far more nuanced and weak than the ‘creativity parameter’ claim suggests.”
This was an empirical study that experimented with a particular generative AI app. Keep that in mind when seeking to generalize the results of the study.
Their effort suggests that as you raise the temperature there is a rise in the novelty of the response, though they indicated it was a weak correlation. That’s a handy result, though, since it supports the seat-of-the-pants beliefs about that presumed relationship.
They also found that as the temperature goes up, coherence tends to lessen, and likewise, as the temperature goes down, coherence tends to go up. This too is something that conjecture has suggested. Furthermore, you need to be mindful of the tradeoff: striving unduly for novelty might introduce regrettable incoherence.
I mentioned that to you earlier.
Finally, the widely stated idea that temperature is an all-encompassing magical means of sparking incredible creativity was not borne out by the study.
I would say that anyone who seriously knows or uses temperature settings would agree wholeheartedly with this result. There seems to be a myth floating around that the wanton use of high temperatures gets you out-of-this-world creativity. I don’t think so. You can get modest creativity. And you will usually get the downsides of infused incoherence.
I’ll hit you with one more research study, doing so to whet your appetite and hopefully encourage you to consider reading up on this type of research. Of course, you are equally encouraged to dive into the pool and do research that contributes to this budding area of interest.
In a research study entitled “The Effect of Sampling Temperature on Problem-Solving in Large Language Models” by Matthew Renze and Erhan Guven, arXiv, February 7, 2024, these points were made (excerpts):
“The prompt engineering community has an abundance of opinions and anecdotal evidence regarding optimal prompt engineering techniques and inference hyperparameter settings. However, we currently lack systematic studies and empirical evidence to support many of these claims.”
“In this research study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks.”
“We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks.”
“Then, we used four popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.0.”
“Despite anecdotal reports to the contrary, our empirical results indicate that changes