
I don't get the conclusion the author is trying to draw. If you look at the data presented, it seems that the model was actually pretty bad at guessing the real-world behavior of the posts listed. Out of the top ten it picked:

* 1 had a score that was reasonably close (8.4%) to what the model predicted

* 4 had scores wildly lower than the model predicted

* 2 had scores wildly higher than the model predicted

* the remaining 3 were not wildly off, but weren't really that close either (25%-42% off)

Then there's a list of 10 submissions that the model predicted would have scores ranging from 33 to 135, but they all only received a score of 1 in reality.

The graph shown paints a bit of a better picture, I guess, but it's still not all that compelling to me.



This is a fair point. I think "correlation" is a better metric than "predicts the exact correct score" because of how I'll be using this model in the next post.

Broadly, the main use case for this model (in the RL context) will be to take two different versions of the same post and predict which of the two is more likely to be upvoted. So what matters isn't that it gets the exact number of upvotes right, but that it correctly predicts the relative difference in likely upvote count between two variants.

Now, it doesn't do a great job at that (the correlation is only 0.53, after all), but it does a good enough job to provide some useful signal.
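
To make the "relative order matters more than the exact number" point concrete, here is a minimal sketch of a pairwise-agreement check. The function name `pairwise_accuracy` and the numbers are made up for illustration. They are not from the post or from the actual model.

```python
from itertools import combinations

def pairwise_accuracy(predicted, actual):
    """Fraction of post pairs whose predicted order matches the actual order.

    Pairs with tied actual scores are skipped, since neither post "won".
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(actual)), 2):
        if actual[i] == actual[j]:
            continue
        total += 1
        # Differences with the same sign mean the pair was ranked correctly.
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0:
            agree += 1
    return agree / total if total else float("nan")

# Hypothetical numbers: every prediction is far from the real score,
# yet every pair is ordered correctly.
predicted = [135.0, 60.0, 33.0, 10.0]
actual = [40, 12, 3, 1]
accuracy = pairwise_accuracy(predicted, actual)
print(accuracy)
```

A model can be badly off on absolute scores and still be useful for picking the better of two variants, which is all the comparison use case needs.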


That makes me wonder, though, what the best loss function was. I assume you used MSE on the log score. I wonder if a sigmoid on which of two articles has the higher score would yield better results for the downstream RLHF task.
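
For comparison, a rough sketch of the two objectives side by side. Neither is confirmed as what the author actually trained with. The function names are hypothetical, and the pairwise version is a standard logistic (Bradley-Terry style) loss on the difference of the two predictions.

```python
import math

def mse_log_score(pred_log, actual_score):
    """Squared error between a predicted log score and the log of the real score.

    Assumes actual_score >= 1, so the log is defined.
    """
    return (pred_log - math.log(actual_score)) ** 2

def pairwise_logistic_loss(pred_a, pred_b, a_scored_higher):
    """Logistic loss on sigmoid(pred_a - pred_b), i.e. on which post 'wins'."""
    z = pred_a - pred_b
    if not a_scored_higher:
        z = -z
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical pair: the model prefers post A by 2 units of log score.
loss_if_a_won = pairwise_logistic_loss(2.0, 0.0, a_scored_higher=True)
loss_if_b_won = pairwise_logistic_loss(2.0, 0.0, a_scored_higher=False)
print(loss_if_a_won, loss_if_b_won)
```

The pairwise loss only penalizes getting the order wrong (or being unsure about it), so it ignores how far off the absolute score is.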


Scores are not a good metric to compare. I did some data analysis and wrote about it here: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...


The score divergence is likely because if a story makes the front page, it almost certainly gets comments, and each comment adds one to the score.

But the number of comments depends more on the time posted than on the story itself, and that information isn't in the model.



