
I've heard it's because GPT's input is whole words/tokens, not characters, so it has little insight into their spelling or letter positions unless that's explicitly covered in the training set (e.g. rhymes are easy, counting letters of rare words is hard).

Though this is an especially bad example; surely it had bbbbbb somewhere in its training set associated with "6".
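
You can see the split directly with OpenAI's open-source tiktoken tokenizer (a quick sketch, assuming pip install tiktoken; the exact pieces depend on which encoding you pick):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4

    for text in ["hello", "bbbbbbb"]:
        tokens = enc.encode(text)
        pieces = [enc.decode([t]) for t in tokens]
        print(f"{text!r} -> {len(tokens)} token(s): {pieces}")

A common word typically comes back as a single token ID, while a rare string like "bbbbbbb" gets split into a few multi-letter chunks. The model only ever sees the IDs, never the individual letters.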



It's even more interesting, I think. The tokens are a byte pair encoding [1] of the input string. So a short, frequent word might be represented as one token, but an infrequent word (such as "bbbbbbb") might be represented by several tokens, each of which might or might not correspond to a letter.
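
Here's a toy illustration of the merge step (not the actual GPT tokenizer, which learns its merge table from a huge corpus and operates on bytes, but the core idea is the same: repeatedly fuse the most frequent adjacent pair):

    from collections import Counter

    def bpe_merges(word, num_merges):
        # Toy BPE on a single string: repeatedly merge the most
        # frequent adjacent pair of symbols.
        symbols = list(word)
        for _ in range(num_merges):
            pairs = Counter(zip(symbols, symbols[1:]))
            if not pairs:
                break
            (a, b), _ = pairs.most_common(1)[0]
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            symbols = merged
        return symbols

    print(bpe_merges("bbbbbbb", 2))  # ['bbbb', 'bb', 'b']

After a couple of merges the seven b's end up as three tokens of different lengths, and nothing about the token boundaries tells the model "this is seven letters".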

This might also explain the weird "off-by-one" errors with the ROT13 task.

[1] https://en.wikipedia.org/wiki/Byte_pair_encoding
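
For context, ROT13 is a strictly per-character substitution (each letter shifts 13 places), which is exactly the granularity a token-level model never sees directly. Python's standard library ships a codec for it:

    import codecs

    # ROT13 shifts each letter 13 places; everything else passes through.
    print(codecs.encode("Hello, World!", "rot_13"))  # Uryyb, Jbeyq!

If the rotated text falls on different token boundaries than the plaintext, the model has to learn the mapping chunk-by-chunk rather than letter-by-letter, which could plausibly surface as off-by-one-looking mistakes.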




