The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).
Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).
Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.