You might benefit from a different mental approach to prompting, and models in g...

You might benefit from a different mental approach to prompting, and models in general. Also, be careful what you wish for because the closer they get to humans the worse they’ll be. You can’t have “far beyond the realm of human capabilities” and “just like Gary” in the same box.

They can chain events together as a sequence, but they don’t have temporal coherence. For those that are born with dimensional privilege “Do X, discuss, then do Y” implies time passing between events, but to a model it’s all a singular event at t=0. The system pressed “3 +” on a calculator and your input presses a number and “=“. If you see the silliness in telling it “BRB” then you’ll see the silliness in foreshadowing ill-defined temporal steps. If it CAN happen in a single response then it very well might happen.

“

Agenda for today at 12pm:

1. Read junk.py

2. Talk about it for 20 minutes

3. Eat lunch for an hour

4. Decide on deleting junk.py

“

12:00 - I just read junk.py.

12:00-12:20 - Oh wow it looks like junk, that’s for sure.

12:20-1:20 - I’m eating lunch now. Yum.

1:20 - I’ve decided to delete it, as you instructed. {delete junk.py}

</response>

Because of course, right? What does “talk about it” mean beyond “put some tokens here too”?

If you want it to stop reliably you have to make it output tokens whose next most probable token is EOS (end). Meaning you need it to say what you want, then say something else where the next most probable token after it is <null>.

I’ve tested well over 1,000 prompts on Opus 4.0-4.5 for the exact issue you’re experiencing. The test criteria was having it read a Python file that desperately needs a hero, but without having it immediately volunteer as tribute and run off chasing a squirrel() into the woods.

With thinking enabled the temperature is 1.0, so randomness is maximized, and that makes it easy to find something that always sometimes works unless it doesn’t. “Read X and describe what you see.” - That worked very well with Opus 4.0. Not “tell me what you see”, “explain it”, “describe it”, “then stop”, “then end your response”, or any of hundreds of others. “Describe what you see” worked particularly well at aligning read file->word tokens->EOS… in 176/200 repetitions of the exact same prompt.

What worked 200/200 on all models and all generations? “Read X then halt for further instructions.” The reason that works has nothing to do with the model excitedly waiting for my next utterance, but rather that the typical response tokens for that step are “Awaiting instructions.” and the next most probable token after that is: nothing. EOS.