Dealing with words at the level of their constituent letters is a known weakness of OpenAI’s current GPT models, because of the kind of input and output encoding they use. The encoding also makes working with numbers represented as strings of digits less straightforward than it might otherwise be.

Just as GPT-4 handles these tasks better than GPT-3.5 does, future GPT models will likely be better still, even if only through the sheer brute force of larger neural networks, more compute, and additional training data.

(To see an example of the encoding, you can enter some text at https://platform.openai.com/tokenizer. The input is presented to GPT as a series of integers, one for each colored block.)
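To make the idea a bit more concrete, here is a toy tokenizer sketch. It is not OpenAI's tokenizer: OpenAI's models use byte-pair encoding with tens of thousands of learned vocabulary entries, whereas this uses simple greedy longest-match against a tiny, hypothetical vocabulary (`TOY_VOCAB`, `encode`, and `decode` are all made up for illustration). It only shows the general effect: a word like "strawberry" reaches the model as a couple of integers rather than ten letters, and similar-looking numbers can be split into differently shaped chunks.

```python
import string

# HYPOTHETICAL toy vocabulary, for illustration only. A few multi-character
# pieces, plus single lowercase letters, digits, and space as a fallback.
MULTI_CHAR_PIECES = ["straw", "berry", "123", "45"]
TOY_VOCAB = {
    piece: i
    for i, piece in enumerate(
        MULTI_CHAR_PIECES + list(string.ascii_lowercase + string.digits + " ")
    )
}
ID_TO_PIECE = {i: piece for piece, i in TOY_VOCAB.items()}
MAX_PIECE_LEN = max(len(piece) for piece in TOY_VOCAB)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization against TOY_VOCAB (not real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for length in range(min(MAX_PIECE_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos += length
                break
        else:
            # No entry matched, e.g. an uppercase letter or punctuation.
            raise ValueError(f"no token for {text[pos]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Map integer IDs back to text by concatenating their pieces."""
    return "".join(ID_TO_PIECE[i] for i in ids)


print(encode("strawberry"))  # [0, 1]: two integers, no individual letters
print(encode("12345"))       # [2, 3]: split as "123" + "45"
print(encode("12346"))       # [2, 34, 36]: split as "123" + "4" + "6"
```

A model that only ever sees `[0, 1]` has to have learned separately how many "r"s are hidden inside those two IDs, which is roughly why letter-level questions are harder for it than they look.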