It is unsurprising that a system trained on human-generated content might end up encoding implicit bias, toxicity, and harmful goals. And the more powerful and general-purpose a system is, the more readily it can be turned to a wide range of harmful purposes.
Neither specializing the model nor filtering its output seems to have worked reliably in practice.