
It looks like ChatGPT's average output length is 827, while LLaMA 2's is more than double that, at 1790.

Disclaimer from the site:

> Caution: GPT-4 may favor models with longer outputs and/or those that were fine-tuned on GPT-4 outputs.

> While AlpacaEval provides a useful comparison of model capabilities in following instructions, it is not a comprehensive or gold-standard evaluation of model abilities. For one, as detailed in the AlpacaFarm paper, the auto annotator winrates are correlated with length.



Also, Llama 2 is still a few percentage points below GPT-4.

Which is not close, because performance scales roughly logarithmically with training compute: each additional percentage point of performance requires exponentially more compute during pretraining. Llama 2 was pretrained on 2 trillion tokens -- a significant investment in compute, for sure, but still not enough to get close to GPT-4.
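
To make the "logarithmic" point concrete, here is a minimal sketch, assuming a toy scaling law of the form score = a + b * log10(compute). The parameters a and b are made up purely for illustration, not fitted to any real benchmark. Under such a law, each extra point of score costs a constant multiplicative factor of compute, i.e. exponentially more in absolute terms:

  import math

  # Hypothetical fit parameters, chosen only for illustration.
  a, b = 20.0, 10.0

  def score(compute):
      """Score predicted by the toy log scaling law."""
      return a + b * math.log10(compute)

  def compute_for(target_score):
      """Invert the toy law: compute needed to reach a target score."""
      return 10 ** ((target_score - a) / b)

  base = compute_for(85.0)
  for s in (85.0, 86.0, 87.0, 88.0):
      c = compute_for(s)
      print(f"score {s:.0f}: compute x{c / base:,.2f} vs. the 85-point baseline")
  # Each +1 point multiplies compute by 10**(1/b), about 1.26x with these made-up numbers.

With these assumed numbers, closing a gap of a few points means multiplying pretraining compute several times over, which is why a few percentage points on a leaderboard can represent a large real-world gap.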



