[SDL] SDL 1.3: UTF-8 vs UTF-16 vs UTF-32?

Simon Roby simon.roby at gmail.com
Fri Feb 3 15:22:13 PST 2006


On 2/3/06, Christophe Cavalaria <chris.cavalaria at free.fr> wrote:
> UTF-8/UTF-16/UTF-32 is not Unicode. It's a memory representation encoding :)
>
> When returning a single char, just return the associated codepoint. A
> codepoint is just a simple number after all. Encode it in a 32bit value
> since you need that to represent all the Unicode codepoints. After that,
> the user will have to convert the result to a string format ( UTF-8 char *,
> UTF-16LE wchar_t * for those using it etc ... ) himself as needed. That
> way, you avoid all endianness issues as long as the user doesn't just memory
> copy the result into a string ( which is bad anyway )

I strongly agree with Christophe. Unicode Transformation Formats are
meant for encoding strings, not single characters. The returned
character should be a plain 32-bit integer holding the code point,
not encoded in any way. If he needs to, the developer can easily
translate it himself (it's easy, really) to whatever UTF (or UCS)
encoding he requires (or, if he's lazy, he can simply drop the upper
24 bits and use it as if it were Latin-1, which only gives a correct
result for code points below U+0100). Everything else is too
high-level for SDL.
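
To show how little code the translation takes, here is a minimal
sketch in C of turning a 32-bit code point into UTF-8, plus the lazy
Latin-1 shortcut. The names codepoint_to_utf8 and codepoint_to_latin1
are hypothetical helpers made up for this example; they are not part
of SDL, and a real application may want different error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SDL function): encode one Unicode code
 * point as UTF-8. Writes up to 4 bytes to out, without a terminator,
 * and returns the number of bytes written. Returns 0 if cp is a
 * surrogate (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t codepoint_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not valid code points to encode */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}

/* Hypothetical helper: the "lazy" route. Latin-1 coincides with the
 * first 256 code points, so the low 8 bits are only meaningful when
 * cp < 0x100. Returns 1 on success, 0 if cp does not fit. */
int codepoint_to_latin1(uint32_t cp, unsigned char *out)
{
    if (cp > 0xFF)
        return 0;
    *out = (unsigned char)cp;
    return 1;
}
```

Since the caller only ever handles the integer code point and chooses
the byte layout at encoding time, no endianness question arises until
he picks UTF-16 or UTF-32 as the target.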
--
- SR

