
UCS-2 uses a fixed number of (16-bit) code units to represent a Unicode scalar value (code point). Of course, to represent a grapheme cluster, more than one code point may be needed, but that's true of Unicode in general.
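A small Python sketch can show the distinction between code units, code points, and grapheme clusters. It assumes "é" is written in decomposed form, as "e" followed by U+0301 COMBINING ACUTE ACCENT. That is one user-perceived character but two code points. Each of those code points sits in the BMP, so each takes exactly one 16-bit code unit. The helper name `utf16_code_units` is hypothetical and exists only for this illustration.

```python
# Decomposed "é": 'e' + U+0301 COMBINING ACUTE ACCENT.
# One grapheme cluster, but two code points.
s = "e\u0301"

def utf16_code_units(text: str) -> int:
    """Count 16-bit code units (utf-16-le emits no BOM, 2 bytes per unit)."""
    return len(text.encode("utf-16-le")) // 2

print(len(s))               # 2 code points
print(utf16_code_units(s))  # 2 code units: one per BMP code point, as in UCS-2
```

The code units map to code points at a fixed 1:1 ratio, but the string still takes two units to spell one visible character.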


> that's true of Unicode in general.

Yes, that was rather my point: if you're using a Unicode-based character encoding, you're going to have variable-width characters regardless, so you might as well use UTF-8.
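To illustrate the "variable width regardless" point, here is a minimal Python sketch comparing per-code-point sizes. The function name `widths` is hypothetical. UTF-8 uses 1 to 4 bytes per code point. UTF-16 uses one or two 16-bit units. A code point outside the BMP, such as U+1F600, needs a surrogate pair in UTF-16 and cannot be represented in UCS-2 at all.

```python
def widths(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")) // 2,
    )

# U+0041 'A', U+00E9 'é', U+20AC '€', U+1F600 (outside the BMP)
for sample in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(sample):04X}", widths(sample))
# U+0041 (1, 1, 1)
# U+00E9 (1, 2, 1)
# U+20AC (1, 3, 1)
# U+1F600 (1, 4, 2)
```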

> UCS-2 uses a fixed number of (16-bit) code units to represent a Unicode scalar value (code point).

Sure, but that's an implementation detail of the mapping from characters (at the application level) to bytes (at the physical(-ish) representation level).
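One way to see the encoding as "just" the byte-level mapping is to round-trip the same text through several codecs. This is a rough Python sketch, and the helper `round_trips` is hypothetical. The decoded text is identical every time. Only the byte count and layout change.

```python
def round_trips(text: str, codecs: list[str]) -> dict[str, int]:
    """Encode text with each codec, verify decoding restores it, return byte sizes."""
    sizes = {}
    for codec in codecs:
        data = text.encode(codec)
        # Different byte layout per codec, same text after decoding.
        assert data.decode(codec) == text
        sizes[codec] = len(data)
    return sizes

# "naïve café" with precomposed ï (U+00EF) and é (U+00E9): 10 code points.
print(round_trips("na\u00efve caf\u00e9", ["utf-8", "utf-16-le", "utf-32-le"]))
# {'utf-8': 12, 'utf-16-le': 20, 'utf-32-le': 40}
```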



