Yes, obviously there will be caveats when using a tool beyond its intended domain:
* you do need to know that the tool passes non-ASCII through unchanged
* the text should not contain Latin glyphs composed from a base letter plus combining marks
* you're on your own if you're trimming strings to byte lengths
I've added the second point to my comment.
It's not about sweeping bugs under the rug at all. It's about using non-Latin text on the command line and in code. Most command-line tools are ASCII-oriented but will pass non-ASCII bytes through unchanged. However, most require that you avoid Latin glyphs built from a base letter plus combining marks. Since Unicode includes single-codepoint versions of the accented Latin glyphs in common use, and these are what the default entry methods produce, this isn't usually a problem. But yes, it really constitutes a subset of UTF-8, and you must know about this limitation to avoid bugs.
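To make the second and third caveats concrete, here is a rough Python sketch. The helper names `has_combining_marks` and `trim_utf8` are my own, made up for illustration, and it assumes the input is valid UTF-8 text. It shows that the two spellings of an accented letter differ at the byte level, that NFC normalization folds one into the other, and that a naive byte trim can cut a character in half.

```python
import unicodedata

def has_combining_marks(text: str) -> bool:
    """Return True if any code point in text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)

def trim_utf8(text: str, max_bytes: int) -> str:
    """Trim to at most max_bytes of UTF-8 without leaving a partial character."""
    # errors="ignore" drops the incomplete sequence left by the byte slice.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

precomposed = "\u00e9"   # e-acute as a single code point
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# The two look identical on screen but are different byte sequences,
# so a byte-transparent tool treats them as different strings.
print(precomposed == combining)            # False
print(has_combining_marks(precomposed))    # False
print(has_combining_marks(combining))      # True

# NFC normalization folds the combining sequence into the single code point.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True

# Trimming "h\u00e9llo" to 2 bytes would split the 2-byte e-acute,
# so the safe trim keeps only "h".
print(trim_utf8("h\u00e9llo", 2))          # h
```

Running text through NFC before handing it to a byte-oriented tool is one way to stay inside the "safe subset", though it only helps where a precomposed form exists.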