
He didn't say "temperature, whatever that means", he said "temperature 0.7, whatever that means". Do you know what it means? The API reference only says the value you can specify for temperature is between 0 and 2, with higher values producing more random output.
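
(For reference, this is where the parameter plugs in; a minimal sketch using the pre-1.0 openai Python client — the key, model name, and prompt are placeholders, and newer client versions change the call shape:)

    import openai

    openai.api_key = "sk-..."  # placeholder key

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain temperature."}],
        temperature=0.7,  # documented range 0..2; higher = more random
    )
    print(response["choices"][0]["message"]["content"])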


Temperature is a measure of capriciousness: how likely the model is to choose a token that is not “the most likely” next token.
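
For intuition, here is a minimal sketch of how a sampler might apply it (the names and logits are illustrative, not OpenAI's actual implementation):

    import math, random

    def sample_with_temperature(logits, temperature=0.7):
        # Scale logits by 1/T: as T -> 0 this approaches argmax,
        # while large T pushes the distribution toward uniform.
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scaled]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Draw one token index from the resulting distribution.
        return random.choices(range(len(logits)), weights=probs, k=1)[0]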

It’s not a big ask to look this up. But even if you don’t, making a point to show that you don’t know it seems bad.


> Temperature is a measure of capriciousness.

Yes, that’s what “temperature” means, what does a temperature of 0.7 mean?

> It’s not a big ask to look this up. But even if you don’t, making a point to show that you don’t know it seems bad.

Well, no, making a point of highlighting the points of your ignorance when discussing something is good. Especially when you are a notable expert in the broad field being discussed.


https://lukesalamone.github.io/posts/what-is-temperature/

> Well, no, making a point of highlighting the points of your ignorance when discussing something is good. Especially when you are a notable expert in the broad field being discussed.

I disagree. Stating “whatever that means” indicates dismissiveness, not a transparent lack of expertise. Also, you should know what it means if you’re an expert.

This quote implies to me that he is actually a beginner when it comes to this technology but is expecting to be treated like an expert whose experience generalizes


Absolutely disagree. I don't think anyone, except someone with access to the source code, knows exactly what temperature 0.7 means.

Knuth is a world expert in randomized algorithms. Do you think he doesn't have a good intuition for what could be happening? But he's a stickler for detail, and temperature is an obfuscation.


I’m getting pretty tilted at the number of people who are ignoring everything I’m posting and claiming temperature is some unknowable thing because Knuth does not know what it is. Look at my link. This is not a concept specific to OpenAI. It’s a single term in the softmax selection.

There is no reason to assume that OpenAI has changed the definition of this term.


They could literally have asked ChatGPT and gotten a great explanation.


I don't know what prompt you used, but this is what it tells me (just to be clear, I don't think it explains anything beyond higher temperature = higher randomness, range of API values 0 to 2):

> In the OpenAI GPT API, the temperature parameter controls the randomness of the model's output. A temperature value of 0.7 in the GPT API means that the model's responses will have a moderate level of randomness.

> When generating responses, a higher temperature value, such as 1.0, makes the output more random and creative. This can result in more diverse and unpredictable responses. On the other hand, a lower temperature value, such as 0.2, makes the output more focused and deterministic, with the model tending to choose more probable and conservative responses.

> By setting the temperature to 0.7, you can expect the model's responses to strike a balance between randomness and coherence, offering a mix of expected and unexpected answers.


Seems like a good answer. If you want a deeper answer, you should ask a follow-up question. Here is the answer to my follow-up, “can you explain it at a deeper level”:

> At a deeper level, the temperature parameter in generative AI models like GPT-3 affects the softmax function, which is used to convert the model's output into a probability distribution over the possible words or tokens.

> The softmax function takes the logits, which are the unnormalized output values of the model, and applies the exponential function to each value, normalizing them to produce probabilities. The formula for the softmax function is as follows:

> P(y_i) = exp(logit(y_i)) / sum(exp(logit(y_j)))

> Where:

> P(y_i) is the probability of the i-th token in the output sequence.

> logit(y_i) is the logit value (unnormalized score) of the i-th token.

> The sum is taken over all possible tokens.

> The temperature parameter comes into play when computing the exponential function. The temperature value, denoted as T, is applied to the logits before performing the softmax. Mathematically, the modified softmax function with temperature is:

> P(y_i) = exp(logit(y_i) / T) / sum(exp(logit(y_j) / T))

> The temperature parameter T controls the steepness of the softmax function. A higher temperature value (e.g., 1.0) increases the exponentiation effect, making the probabilities more spread out and the distribution more uniform. This leads to a higher likelihood of selecting less probable tokens, resulting in more diverse and random output.

> Conversely, a lower temperature value (e.g., 0.5) reduces the exponentiation effect, making the probabilities sharper and more focused. The most probable tokens are emphasized, leading to more deterministic and conservative output.

> By adjusting the temperature, you can manipulate the balance between exploration and exploitation during text generation. Higher temperatures encourage the model to explore more possibilities, while lower temperatures bias it towards exploiting the most likely choices based on the training data.

> It's important to note that the choice of temperature is a design decision, and there is no fixed rule for determining the optimal value. It often requires experimentation and understanding the specific behavior of the model to achieve the desired output characteristics


Not saying it's bad as a qualitative answer, but it doesn't say anything quantitative about the effect of the temperature in the ChatGPT API. Temperature is a well known and well documented concept, but if you don't know what y_i is, and for all I know that's just a number coming out of a black box with billions of parameters, you don't know what temperature 0.7 is, beyond the fact that a token i whose logit(y_i) is 0.7 higher than that of another token is e times as likely to be produced. What does that tell me? Nothing.
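
(A quick numeric check of that e-times claim; delta is the logit gap between two tokens, and the values are made up:)

    import math

    delta = 0.7  # logit gap between two tokens (made-up value)
    for T in (0.2, 0.7, 1.0, 2.0):
        # odds ratio of the higher-logit token: exp(delta / T)
        print(T, round(math.exp(delta / T), 3))
    # At T = 0.7 the ratio is exp(1) = e ~ 2.718; as T -> 0 it blows up
    # (deterministic argmax), and as T grows it approaches 1 (uniform).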


My dude, it’s not my fault if you don’t understand the concept of asking follow-up questions for clarification. This isn’t like a Google search. The way you retrieve knowledge is different.


> It’s not a big ask to look this up.

For the guy who doesn't even use email? https://www-cs-faculty.stanford.edu/~knuth/email.html


Maybe you misread my comment ;) I'm sure Knuth knows qualitatively what is meant by temperature; it's been used as a measure of randomness for half a century in simulated annealing and other algorithms.


I don’t really care if he knows it or not. Best case he’s virtue signaling ignorance.


I think you're still misreading my comment (and dragonwriter's and Knuth's): we all know or can look up what temperature is in randomized algorithms. However, what temperature 0.7 means is a mystery to me. I know that at temperature 0 the result is deterministic, and at higher temperatures the randomness increases (possibly they are the Boltzmann factors associated with some energy function, but I don't know; and even if they are, I have no idea how it is scaled, i.e. what the value of the Boltzmann constant is). I know that the API accepts values from 0 to 2. I don't know more. Do you?


Yes. I have posted both a very nice link and a complete explanation from ChatGPT 3.5 itself. It’s honestly not that complicated, especially for someone who is supposed to have any sort of authoritative view in the field.

I do not feel it is appropriate for you to say you have looked it up if you don’t know what it is besides an API input that affects randomness.



