
Those were discovered by finding strings that OpenAI’s tokenizer doesn’t split up: each one maps to a single token. Because those tokens occur so rarely in the training data, the model’s representations of them are barely trained, and you get what are effectively random outputs when using them.
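
You can check this yourself with OpenAI's tiktoken library. A minimal sketch, assuming tiktoken is installed and that the glitch tokens live in the older GPT-2/GPT-3 vocabulary (r50k_base), where they were originally reported:

    import tiktoken

    # The glitch tokens were reported against the GPT-2/GPT-3 BPE vocab.
    enc = tiktoken.get_encoding("r50k_base")

    for s in [" petertodd", " SolidGoldMagikarp", " hello world"]:
        ids = enc.encode(s)
        pieces = [enc.decode([i]) for i in ids]
        # A glitch candidate encodes to a single token id instead of
        # being split into smaller pieces like ordinary rare strings.
        print(f"{s!r}: {len(ids)} token(s) -> {ids} {pieces}")

An ordinary rare string breaks into several subword pieces; the glitch strings come back as one token each.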

The author definitely turns the mysticism knob up to 11 though, and the post is so long that you can hardly finish it before this obvious critique shows up in the comments.



The ironic thing about LessWrong is that, in some fantastically oblivious ways, it’s quite the opposite of its name.


Yeah, it’s quite strange indeed. Clearly people with decent educations but zero background in applied research or peer review, more concerned with the sound of their own voices than with whether their findings are actually useful (or even true).

Perhaps they are all on stimulants!


Thank you for your opinion on the post that I linked! I'm still curious about the associated neurons though.


Fair enough. You would need to use an open model or work at OpenAI. I assume this work could be applied to the Llama models, although I’m not aware of anyone having found these glitchy phrases for those models yet.


> You would need to use an open model or work at OpenAI.

The point of the post we are commenting under is that OpenAI made this association public, at least in the neuron->token direction. I was thinking some hacker (like on Hacker News) might be able to build something that reverses it to the token->neuron direction using the public data, so we could see the neurons associated with petertodd. https://openaipublic.blob.core.windows.net/neuron-explainer/...
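
A rough sketch of what I mean, not a tested implementation: it assumes the per-neuron explanation files live under data/explanations/{layer}/{neuron}.jsonl in that blob store (as in OpenAI's automated-interpretability repo, so the real paths may differ), and that GPT-2 XL's 48 layers x 6400 MLP neurons is the right grid. It just substring-searches the raw JSONL rather than assuming a schema:

    import requests

    # Assumed layout of OpenAI's public neuron-explainer data; verify
    # against the automated-interpretability repo before relying on it.
    BASE = ("https://openaipublic.blob.core.windows.net/"
            "neuron-explainer/data/explanations")

    N_LAYERS, N_NEURONS = 48, 6400  # GPT-2 XL MLP dims (assumed)
    TARGET = "petertodd"

    hits = []
    for layer in range(N_LAYERS):
        # ~300k requests total: slow, so cache or parallelize for real use
        for neuron in range(N_NEURONS):
            url = f"{BASE}/{layer}/{neuron}.jsonl"
            r = requests.get(url)
            if r.ok and TARGET.lower() in r.text.lower():
                hits.append((layer, neuron))
                print(f"layer {layer}, neuron {neuron} mentions {TARGET!r}")

    print(f"{len(hits)} neuron(s) whose explanations mention {TARGET!r}")

Crawling the whole thing once and saving a token->neuron index would obviously beat re-fetching per query.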



