
I've heard it's because GPT's input is whole words/tokens, not characters, so it has little insight into their spelling or letter positions unless that's explicitly covered in the training set (e.g. rhymes are easy, counting letters of rare words is hard).

Though this is an especially bad example; surely it had bbbbbb somewhere in its training set associated with "6".
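
You can see the split directly with OpenAI's open-source tiktoken tokenizer (a quick sketch, assuming pip install tiktoken; the exact pieces depend on which encoding you pick):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4

    for text in ["hello", "bbbbbbb"]:
        tokens = enc.encode(text)
        pieces = [enc.decode([t]) for t in tokens]
        print(f"{text!r} -> {len(tokens)} token(s): {pieces}")

A common word typically comes back as a single token ID, while a rare string like "bbbbbbb" gets split into a few multi-letter chunks. The model only ever sees the IDs, never the individual letters.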



It's even more interesting, I think. The tokens are a byte pair encoding [1] of the input string. So a short, frequent word might be represented as one token, but an infrequent word (such as "bbbbbbb") might be represented by several tokens, each of which might or might not correspond to a letter.
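
Here's a toy illustration of the merge step (not the actual GPT tokenizer, which learns its merge table from a huge corpus and operates on bytes, but the core idea is the same: repeatedly fuse the most frequent adjacent pair):

    from collections import Counter

    def bpe_merges(word, num_merges):
        # Toy BPE on a single string: repeatedly merge the most
        # frequent adjacent pair of symbols.
        symbols = list(word)
        for _ in range(num_merges):
            pairs = Counter(zip(symbols, symbols[1:]))
            if not pairs:
                break
            (a, b), _ = pairs.most_common(1)[0]
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                    merged.append(a + b)
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            symbols = merged
        return symbols

    print(bpe_merges("bbbbbbb", 2))  # ['bbbb', 'bb', 'b']

After a couple of merges the seven b's end up as three tokens of different lengths, and nothing about the token boundaries tells the model "this is seven letters".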

This might also explain the weird "off-by-one" errors with the ROT13 task.

[1] https://en.wikipedia.org/wiki/Byte_pair_encoding
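
For context, ROT13 is a strictly per-character substitution (each letter shifts 13 places), which is exactly the granularity a token-level model never sees directly. Python's standard library ships a codec for it:

    import codecs

    # ROT13 shifts each letter 13 places; everything else passes through.
    print(codecs.encode("Hello, World!", "rot_13"))  # Uryyb, Jbeyq!

If the rotated text falls on different token boundaries than the plaintext, the model has to learn the mapping chunk-by-chunk rather than letter-by-letter, which could plausibly surface as off-by-one-looking mistakes.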




