[aprssig] APRS Message character sets ?

Fri Jul 11 20:05:02 EDT 2008

On Thu, 10 Jul 2008, Scott Miller wrote:

>> Even the UTF-16 characters are just pairs of such 8-bit bytes that are
>> considered as encoding for single character of Unicode codespaces.
>
> Someone else mentioned that UTF-8 avoids characters below 32, so the
> 0x0d 0x0a issue I brought up shouldn't be a problem.

   The biggest benefit of using UTF-8 is probably its backwards 
compatibility. If a user with a Unicode-capable application enables 
UTF-8 messaging and then sends a message to a UI-View user using only 
"normal" letters (those in the ASCII character set), the message will be 
shown just fine in UI-View. UTF-8 is identical to ASCII in the 0 to 127 
(decimal) range. No need to flip switches: just type in English and it'll 
be backwards compatible.
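
   A minimal Python sketch of that compatibility claim. The sample text 
and the helper name is_pure_ascii are made up for illustration; they are 
not taken from any APRS application.

```python
# Hypothetical sample message that uses only ASCII letters.
message = "Hello, this is a test message"

utf8_bytes = message.encode("utf-8")
ascii_bytes = message.encode("ascii")

# For pure ASCII text the two encodings produce the same bytes.
same_on_the_wire = utf8_bytes == ascii_bytes


def is_pure_ascii(payload: bytes) -> bool:
    """Return True if every byte is in the 0 to 127 range."""
    return all(b < 0x80 for b in payload)
```

   An ASCII-only receiver sees exactly the bytes it always has, so 
nothing has to be negotiated between the two ends.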

   If you type Japanese, or use Finnish special characters, each such 
character will naturally show up as two or three characters of gibberish 
in a non-Unicode application. But there's not much we can do about that.
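
   To illustrate the gibberish, here is a small Python sketch. It assumes 
the receiving application interprets bytes as Latin-1, which is only a 
guess at how a non-Unicode client might behave. It also checks that every 
byte of a multi-byte UTF-8 character is 0x80 or above, so such characters 
can never contain 0x0d or 0x0a.

```python
# Finnish sample text; the choice of Latin-1 on the receiving side
# is an assumption made for this illustration.
text = "Hyv\u00e4\u00e4 p\u00e4iv\u00e4\u00e4"

payload = text.encode("utf-8")

# What a Latin-1-only client would display for the same bytes.
garbled = payload.decode("latin-1")

# Collect the bytes that belong to the non-ASCII characters.
non_ascii_bytes = [
    b for ch in text if ord(ch) > 127 for b in ch.encode("utf-8")
]

# None of them can collide with control characters such as CR or LF.
no_control_bytes = all(b >= 0x80 for b in non_ascii_bytes)
```

   Each "ä" becomes two bytes in UTF-8, so the Latin-1 reader shows two 
odd characters in its place while the ASCII letters stay readable.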

> I can think of no reason why UTF-8 would NOT be the right choice for
> this.  I'd say go ahead and do some end-to-end testing, see what breaks,
> and see what can be fixed.

   I think I'll try to implement UTF-8 message parsing in aprs.fi. There's 
a recode plugin for a popular IRC client (irssi) which can automatically 
detect UTF-8 messages and do the appropriate conversions. Maybe I could 
try something similar to get multiple charsets right. aprs.fi mostly 
stores and displays data in UTF-8 already (because of the translations, 
the whole web site needs to be in Unicode anyway).
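
   One common way to do that kind of detection is to try a strict UTF-8 
decode first and fall back to a legacy charset if it fails. The Python 
sketch below shows the idea; it is not how the irssi plugin or aprs.fi 
actually works, and the function name decode_message and the Latin-1 
fallback are assumptions.

```python
def decode_message(payload: bytes, fallback: str = "latin-1") -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else use a fallback.

    The fallback charset is an assumption; a real system would have to
    pick it per sender or per region.
    """
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return payload.decode(fallback)
```

   This works reasonably well in practice because text in a legacy 8-bit 
charset rarely happens to form valid UTF-8 sequences. Pure ASCII messages 
decode the same way on either path.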

   UTF-8 is clearly the most common way to represent Unicode in email and 
web pages anyway; it's become quite a standard encoding. With some 
exceptions, of course - Windows seems to use UTF-16 in its API, but it 
also provides conversion functions, so it's easy to convert back and 
forth.
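
   As a rough illustration of converting back and forth, here is a Python 
sketch that uses Python's own codecs rather than the Windows API. The 
function names are made up, and little-endian UTF-16 without a byte order 
mark is assumed.

```python
def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-16 bytes."""
    return data.decode("utf-8").encode("utf-16-le")
```

   Both directions are lossless for valid input, since the two encodings 
cover the same Unicode code points.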

   - Hessu