Very cool idea. That'd be a good way to see how well it generalizes, actually. I suspect you'd quickly find its limits with anything that isn't a toy example from an intro machine-learning course.
GPT-3 has been trolling simple examples too. I think it comes from the data it crawled: you'll find very few conversations like that containing just correct answers; what gets popular, and how jokes work, is when there's a twist.
It's true that there are still a lot of limitations, but I think you are underestimating how powerful it can get with the right input prompt. For example, for your task I wrote:
>You are a professional puzzle solver. Only answer with the element in the list that does not fit. Do not include an explanation. Do not write anything except the element that does not fit. Dog, 1, 2, 3
Dog
>Computer, Phone, Tree, Microwave
Tree
>red, blue, green, dog
dog
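The few-shot prompt above can be assembled programmatically, which makes it easy to swap in new lists. Here's a minimal sketch; the helper name `build_prompt` and the structure are my own, and the actual completion API call is omitted:

```python
# Instruction prefix, copied from the prompt above.
INSTRUCTION = (
    "You are a professional puzzle solver. "
    "Only answer with the element in the list that does not fit. "
    "Do not include an explanation. "
    "Do not write anything except the element that does not fit."
)

# Solved (items, answer) pairs used as in-context examples.
EXAMPLES = [
    (["Dog", "1", "2", "3"], "Dog"),
    (["Computer", "Phone", "Tree", "Microwave"], "Tree"),
    (["red", "blue", "green", "dog"], "dog"),
]

def build_prompt(query_items):
    """Concatenate the instruction, the solved examples, and the new query."""
    lines = [INSTRUCTION]
    for items, answer in EXAMPLES:
        lines.append(", ".join(items))
        lines.append(answer)
    lines.append(", ".join(query_items))  # model completes the answer line
    return "\n".join(lines)

print(build_prompt(["apple", "banana", "7", "cherry"]))
```

You'd then send the resulting string to the model and read off the single-line completion.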
In my short time testing, it seems to perform quite a lot better than before, which seems extraordinary to me. That's not to say you can't find plenty of examples where it fails, or that it's even close to human level on this particular task. But this still seems like a huge technological advance to me, and I did not expect AI systems to reach this level quite so soon.
> The word "dog" does not fit with the other words in the list because "1", "2", and "3" are numbers, while "dog" is a type of animal. The other words in the list are all examples of numbers, whereas "dog" is not a number.