Mixture-of-Experts models (as the GPTs reportedly are) can produce different results for an input sequence if that sequence is retried together with a different set of sequences in its inference batch, because the expert routing depends on the whole batch, not on the single sequence: https://152334h.github.io/blog/non-determinism-in-gpt-4/
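A toy sketch of why batch composition matters (NumPy, with a made-up router and capacity policy, not any production system): under capacity-limited top-1 routing, whether a token reaches its preferred expert depends on which other tokens in the batch are competing for it, so the same token can come out of a different expert.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 8
W_router = rng.standard_normal((d, n_experts))   # toy router weights
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

def route(batch, capacity=1):
    """Top-1 routing with a per-expert capacity limit; overflow tokens fall
    back to their next-preferred expert (one of several real-world policies)."""
    scores = batch @ W_router                    # (tokens, experts)
    prefs = np.argsort(-scores, axis=1)          # expert preference order per token
    load = np.zeros(n_experts, dtype=int)
    out = np.empty_like(batch)
    for i, order in enumerate(prefs):
        for e in order:                          # take the first expert with spare capacity
            if load[e] < capacity:
                load[e] += 1
                out[i] = batch[i] @ experts[e]
                break
    return out

token = rng.standard_normal(d)
alone = route(token[None, :])[0]                            # token in a batch of one
crowded = route(np.vstack([rng.standard_normal((3, d)), token]))[-1]
print(np.allclose(alone, crowded))  # may print False: a batchmate claimed the expert first
```

Even one changed activation like this can flip which token has the highest logit at the next step.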
And in general, binary floating-point arithmetic is not associative, i.e. `(a + b) + c` might not equal `a + (b + c)`. In rare cases that can lead to the model picking a different token (with the auto-regressive consequence that the entire remainder of the generated sequence may then differ): https://www.ingonyama.com/blog/solving-reproducibility-chall...
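A quick Python illustration of the non-associativity itself:

```python
# Float addition is not associative: the small value survives or vanishes
# depending on the order the rounding happens in.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed by -1e16 before a cancels it
```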
Edit: Of course, my answer assumes you are asking about the case where the model lets you set its token-generation temperature (stochasticity) to exactly zero. With default sampling settings, every LLM I know of picks randomly among the most likely tokens.
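For illustration, a hedged sketch of the two modes (names and shapes here are made up, not any particular library's API): temperature 0 is usually implemented as a greedy argmax over the logits, while any temperature above zero samples from the softmax distribution.

```python
import numpy as np

def pick_token(logits, temperature, rng):
    if temperature == 0.0:
        return int(np.argmax(logits))             # deterministic -- given identical logits
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))       # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))  # stochastic: can differ run to run

rng = np.random.default_rng()
logits = np.array([2.0, 1.9, -1.0])
print(pick_token(logits, 0.0, rng))  # always 0
print(pick_token(logits, 1.0, rng))  # usually 0 or 1, occasionally 2
```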