Restricting the distribution of potential outputs imposes a cost. "Alignment" here likely refers to aligning the model to the desired safety parameters.
I'm not in the LLM research business, but I would expect that the best and the worst/most dangerous outputs come from the tails of the distribution. I imagine tuning for safety often results in fewer really good and really bad answers by trimming these tails.
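A toy sketch of what I mean (not how any real safety tuning works, just the intuition, assuming answer "quality" could be scored on a single axis): if you filter out everything beyond some extremeness threshold, you lose both ends of the distribution at once.

```python
# Toy illustration: sample hypothetical answer-quality scores from a normal
# distribution, then reject anything beyond a "safety" threshold, mimicking
# the tail-trimming intuition above.
import numpy as np

rng = np.random.default_rng(0)
quality = rng.normal(loc=0.0, scale=1.0, size=100_000)  # hypothetical scores

threshold = 2.0  # anything this extreme gets filtered out
kept = quality[np.abs(quality) < threshold]

print(f"best kept:  {kept.max():.2f}  (untrimmed best:  {quality.max():.2f})")
print(f"worst kept: {kept.min():.2f}  (untrimmed worst: {quality.min():.2f})")
# The filtered set loses the most dangerous *and* the most brilliant outliers.
```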
I have found in practice that it can be annoying when ChatGPT starts lecturing me in response to a prompt that is not particularly controversial or edgy. I think this is a problem with one-size-fits-all models. As a rough analogy: imagine that every time you watched a film or show with cigarette smoking in it (most likely an older one), your smart TV popped up a dialog warning you about the dangers of smoking. If you're an educated adult who already knows those dangers, you'd probably just find it annoying and condescending, and not at all "aligned" with your preferences.
Edit: I asked GPT-4: https://chat.openai.com/share/a2c7d380-c6eb-4745-b91d-c3996a...