Hacker News

It just hasn't been prompted or fine-tuned to have the neutral, self-effacing personality of ChatGPT.

It's doing the pure "try to guess the most likely next token" task on which both models were trained (https://heartbeat.comet.ml/causal-language-modeling-with-gpt...).
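That next-token objective can be sketched in a few lines. This is purely illustrative: a real GPT uses a transformer over subword tokens, but a bigram count table stands in here just to show the "guess the most likely continuation" task; all names are made up for the example.

```python
# Toy sketch of causal language modeling: predict the most likely
# next token given the one before it. A bigram count table stands
# in for the transformer; the objective is the same in spirit.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, token):
    """The pure next-token task: pick the highest-count continuation."""
    return counts[token].most_common(1)[0][0]

corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" most often here
```

The base model just repeats this prediction step over and over; everything ChatGPT-like is layered on top of it.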

ChatGPT is further trained with reinforcement learning from human feedback to make it more tool-like (https://arxiv.org/abs/2204.05862 & https://openai.com/blog/chatgpt & https://arxiv.org/abs/2203.02155),

with a bit of randomness added for variety's sake (https://huggingface.co/blog/how-to-generate).
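That "bit of randomness" is usually temperature sampling: instead of always taking the single most likely token, draw from the model's distribution, with a temperature knob controlling how much variety you get. A minimal sketch in plain Python, with made-up logits (libraries like Hugging Face transformers expose the same idea via `generate(do_sample=True, temperature=...)`):

```python
# Temperature sampling: softmax over temperature-scaled logits,
# then draw one token. Low temperature -> nearly greedy;
# high temperature -> closer to uniform, so more variety.
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    scaled = [l / temperature for l in logits.values()]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(list(logits), weights=probs, k=1)[0]

logits = {"cat": 3.0, "mat": 1.0, "fish": 0.5}  # illustrative values
rng = random.Random(0)
cold = [sample_next(logits, temperature=0.1, rng=rng) for _ in range(5)]
hot = [sample_next(logits, temperature=5.0, rng=rng) for _ in range(5)]
print(cold, hot)
```

At temperature 0.1 the gap between logits is amplified, so the samples are effectively greedy; at 5.0 the distribution flattens and the less likely tokens start showing up.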



The bit about "fine-tuned on the Alpaca dataset" is precisely that. But yeah, no RLHF so far, although some people are already working on it.
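Concretely, fine-tuning on Alpaca means training on instruction/response pairs with the same next-token objective, using a fixed prompt template. A rough sketch, with the template paraphrased from the Alpaca project (the exact wording and any helper names here are illustrative):

```python
# Sketch of Alpaca-style instruction fine-tuning data preparation:
# each record pairs an instruction with a desired response, and the
# model is trained on the concatenated prompt + response text.
def format_example(instruction, response):
    # Template paraphrased from the Alpaca repo; wording approximate.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )
    return prompt, prompt + response

prompt, training_text = format_example(
    "Name the capital of France.",
    "The capital of France is Paris.",
)
print(training_text)
```

At inference time you feed the model only the prompt part and let it continue from "### Response:"; no reward model or RLHF is involved, which is the distinction the parent comment is drawing.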


The randomness link doesn’t work. All the ChatGPT-style products need randomness, or they can’t be creative.




