A child or unintelligent person may make errors when attempting to engage in reasoning -- applying the wrong deductive rules, applying the right ones incorrectly, or applying them to false premises -- but LLMs are not even attempting to engage in reasoning in the first place. They apply no deductive rules and have no semantic awareness of their inputs, so they cannot even determine whether they are correct or incorrect.
LLMs are simply making probabilistic inferences about what "words" (tokenized particles of language, not necessarily individual words from our perspective) are most likely to appear in relation to other words, based on the training data fed into them.
Their output often resembles reasoning simply because the training data contains large amounts of explicit reasoning in it. But there is no actual reasoning process going on in response to your prompt, just probabilistic inference about what words are most closely correlated with the words in your prompt.
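To make that concrete, here is a deliberately crude illustration: a toy bigram model over a hypothetical ten-word corpus. Real LLMs are transformers conditioning on long contexts, not bigram counters, so this is only a sketch of the underlying idea that "prediction" here means counting co-occurrences, not understanding anything.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for training data.
corpus = "the cat sat on the mat because the cat was tired".split()

# Count which word follows each word -- the simplest possible version of
# "probabilistic inference about what words appear in relation to others".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Estimate P(next | word) purely from co-occurrence counts."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))
# → {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
```

The model "prefers" cat after the not because it knows anything about cats, but because of raw frequency in its training text -- the same kind of correlation-following, at vastly greater scale, that produces reasoning-shaped output from an LLM.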
If you're getting what appear to be reasoned responses to "novel" prompts, then one of two things is likely happening: either (a) your prompt isn't as novel or unique as you thought it was, and the model's probabilistic inference was sufficient to generate a valid response without reasoning, or (b) the response isn't as well-reasoned as it appears, and you are failing to notice its errors.
If you want to genuinely test an LLM's ability to engage in reasoning, try throwing a complex math problem at it, or a logic puzzle that trips most people up.