I don't get the conclusion the author is trying to draw. If you look at the data...

kcorbitt · on Oct 28, 2024

This is a fair point. I think "correlation" is a better metric than "predicts the exact correct score" because of how I'll be using this model in the next post.

Broadly, the main use case for this model (in the RL context) will be to take two different versions of the same post and predict which of the two is more likely to be upvoted. So what matters isn't that it gets the exact number of upvotes right, but that it correctly predicts the relative difference in likely upvote count between two variants.

Now, it still doesn't do a great job at that (the correlation is only 0.53, after all), but it does a good enough job to provide some useful signal.
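
As a rough illustration of why ordering can matter more than exact values, here is a minimal sketch. The numbers are invented for the example and are not from the post's data, and `pairwise_accuracy` and `mean_abs_error` are hypothetical helper names, not part of the actual evaluation code.

```python
from itertools import combinations

def pairwise_accuracy(predicted, actual):
    """Fraction of pairs whose predicted ordering matches the actual ordering."""
    agree = total = 0
    for i, j in combinations(range(len(actual)), 2):
        if actual[i] == actual[j]:
            continue  # ties carry no ordering information
        total += 1
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0:
            agree += 1
    return agree / total

def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical scores, invented for illustration only.
actual = [3, 10, 50, 200]
predicted = [1, 2, 5, 9]  # far off in absolute terms, but ordered correctly

print(pairwise_accuracy(predicted, actual))
print(mean_abs_error(predicted, actual))
```

In this toy case the predictions are badly wrong on the exact counts yet rank every pair correctly, which is the property a "which variant is better" comparison relies on.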

espadrine · on Oct 29, 2024

That makes me wonder, though, what the best loss function would be. I assume you used MSE on the log score. I wonder if a sigmoid on which of two articles has the higher score would yield better results for the downstream RLHF task.
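
To make the two options concrete, here is a minimal sketch of both losses for a single example. It assumes the regression target is log(1 + score) and that the pairwise variant is a standard logistic loss on the difference of the two predictions. Neither is confirmed as what the post used, and the function names are hypothetical.

```python
import math

def mse_log_loss(pred_log_score, actual_score):
    """Squared error between a predicted log score and log(1 + actual score)."""
    return (pred_log_score - math.log1p(actual_score)) ** 2

def pairwise_logistic_loss(pred_a, pred_b, a_scored_higher):
    """-log(sigmoid(margin)), where margin favors whichever article scored higher."""
    margin = pred_a - pred_b if a_scored_higher else pred_b - pred_a
    # log(1 + exp(-margin)), written so it stays stable for large |margin|
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical predictions for two articles, where A actually scored higher.
print(mse_log_loss(2.0, 10))                  # penalizes distance from the exact log score
print(pairwise_logistic_loss(2.0, 0.5, True))  # small: the pair is ordered correctly
print(pairwise_logistic_loss(0.5, 2.0, True))  # larger: the pair is ordered incorrectly
```

The pairwise loss only looks at the difference between the two predictions, so it ignores any error that shifts both predictions by the same amount, whereas the MSE penalizes it.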

manx · on Oct 29, 2024

Scores are not a good metric to compare. I did some data analysis and wrote about it here: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...

nl · on Oct 29, 2024

The score divergence is likely because, if a story makes the front page, it almost certainly gets comments, and each comment adds one to the score.

But the number of comments depends more on the time posted than on the story itself, and that information isn't in the model.