[aprssig] APRS Message character sets ?
Heikki Hannikainen
hessu at hes.iki.fi
Fri Jul 11 20:05:02 EDT 2008
On Thu, 10 Jul 2008, Scott Miller wrote:
>> Even the UTF-16 characters are just pairs of such 8-bit bytes that are
>> considered as encoding for single character of Unicode codespaces.
>
> Someone else mentioned that UTF-8 avoids characters below 32, so the
> 0x0d 0x0a issue I brought up shouldn't be a problem.
The biggest benefit of using UTF-8 is probably its backwards
compatibility. If a user with a Unicode-capable application enables
UTF-8 messaging and then sends a message to a UI-View user using only
"normal" letters (those in the ASCII character set), the message will be
shown just fine on UI-View. UTF-8 is identical to ASCII in the 0 to 127
(decimal) range. No need to flip switches, just type in English and it'll
be backwards compatible.
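
To make the compatibility claim concrete, here is a minimal Python sketch
(the function names are made up for illustration). It checks that plain
ASCII text produces the same bytes in both encodings, and that every byte
UTF-8 uses for a non-ASCII character is 0x80 or above, so 0x0d and 0x0a
can never appear inside a multi-byte sequence.

```python
# Minimal sketch: UTF-8 and ASCII agree byte for byte in the 0..127 range.
# Function names are hypothetical, chosen only for this example.

def utf8_equals_ascii(text: str) -> bool:
    """True if pure-ASCII text encodes to the same bytes in both encodings."""
    return text.encode("utf-8") == text.encode("ascii")

def non_ascii_bytes_are_high(text: str) -> bool:
    """True if every byte used for a non-ASCII character is >= 0x80."""
    for ch in text:
        if ord(ch) > 127:
            if any(b < 0x80 for b in ch.encode("utf-8")):
                return False
    return True

print(utf8_equals_ascii("Hello, just plain English"))
print(non_ascii_bytes_are_high("Hyvää päivää, こんにちは"))
```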
If you type Japanese, or use Finnish special characters, they will
naturally show up as a couple of characters of gibberish on a non-Unicode
application. But there's not much we can do about that.
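
As a rough illustration of that gibberish, the sketch below encodes text as
UTF-8 and then decodes the bytes the way a single-byte application would.
Latin-1 is only an assumed stand-in here; which character set a given
non-Unicode client actually uses depends on the program and the system.

```python
# Minimal sketch: how UTF-8 bytes look to a client that assumes one byte
# per character. "latin-1" is an assumption, not what any particular
# APRS client is known to use.

def as_seen_by_legacy_client(text: str, legacy_charset: str = "latin-1") -> str:
    """Encode as UTF-8, then misread the bytes as a single-byte charset."""
    return text.encode("utf-8").decode(legacy_charset)

original = "Hyvää"
garbled = as_seen_by_legacy_client(original)
# Each "ä" takes two bytes in UTF-8, so it turns into two characters.
print(original, len(original))
print(garbled, len(garbled))
```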
> I can think of no reason why UTF-8 would NOT be the right choice for
> this. I'd say go ahead and do some end-to-end testing, see what breaks,
> and see what can be fixed.
I think I'll try to implement UTF-8 message parsing in aprs.fi. There's
a recode plugin for a popular IRC client (irssi) which can automatically
detect UTF-8 messages and do appropriate conversions; maybe I could try
something similar to get multiple charsets right. aprs.fi mostly stores
and displays data in UTF-8 already (because of the translations, the whole
web site needs to be in Unicode anyway).
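
One common way to do this kind of detection, sketched in Python below, is
to try a strict UTF-8 decode first and fall back to a legacy charset if it
fails. This is not taken from the irssi plugin or from aprs.fi; the
function name and the Latin-1 fallback are assumptions for illustration.

```python
# Minimal sketch of "detect UTF-8, otherwise fall back".
# decode_message and the latin-1 default are hypothetical choices.

def decode_message(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw message bytes, preferring UTF-8 when they are valid UTF-8."""
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8,
        # which is what makes this usable as a detection heuristic.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)

print(decode_message("Hyvää".encode("utf-8")))    # sent by a UTF-8 client
print(decode_message("Hyvää".encode("latin-1")))  # sent by a legacy client
```

The heuristic works reasonably well because text in a single-byte charset
that contains accented letters is rarely valid UTF-8 by accident, while
pure ASCII decodes the same either way.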
UTF-8 is clearly the most common way to represent Unicode in email and
on web pages; it has become quite a standard encoding. With some
exceptions, of course - Windows seems to use UTF-16 in its API, but it
also provides conversion functions, so it's easy to convert back and
forth.
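
The conversion itself is lossless in both directions. The Python sketch
below shows the round trip using the standard codecs rather than the
Windows API; the helper names are made up, and little-endian UTF-16 is
assumed because that is the byte order Windows uses.

```python
# Minimal sketch: converting between UTF-16 (little-endian) and UTF-8.
# Helper names are hypothetical; this uses Python codecs, not Win32 calls.

def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")

def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16LE."""
    return data.decode("utf-8").encode("utf-16-le")

wide = "Hyvää päivää".encode("utf-16-le")
narrow = utf16le_to_utf8(wide)
print(len(wide), len(narrow))
print(utf8_to_utf16le(narrow) == wide)
```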
- Hessu