[aprssig] Please, standardize UTF-8 for APRS (was: Future Concept for APRS)

Mon Sep 21 14:30:15 EDT 2009

I am all for this. One caution: we need to define the maximum message length
in BYTES, not characters!
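
To illustrate why the unit matters, here is a rough Python sketch. The 67-byte limit and the helper names (utf8_length, truncate_utf8) are my own assumptions for the example, not something the list has agreed on. It shows that the character count and the UTF-8 byte count differ for non-ASCII text, and one way to trim a message to a byte budget without cutting a multi-byte character in half.

```python
# Assumed byte budget for a message body, for illustration only.
MAX_MESSAGE_BYTES = 67


def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int = MAX_MESSAGE_BYTES) -> str:
    """Trim text so its UTF-8 encoding fits in max_bytes.

    The encoded bytes are cut at the limit; decoding with errors="ignore"
    then drops any incomplete multi-byte sequence left at the end.
    """
    clipped = text.encode("utf-8")[:max_bytes]
    return clipped.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    for sample in ("Hello", "русский язык", "日本語"):
        print(sample, len(sample), "chars,", utf8_length(sample), "bytes")
```

A limit stated in characters would let the Japanese sample use three times as many bytes on the air as the same number of ASCII characters, which is why the limit probably needs to be stated in bytes.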

Also, we may need a way of specifying text direction (right-to-left,
vertical, or whatever).

On Sep 21, 2009, at 8:24 AM, Heikki Hannikainen <hessu at hes.iki.fi>  
wrote:

> On Mon, 21 Sep 2009, Sergej wrote:
>
>> An 8-bit character set would be enough to cover the regional subsets.
>> But for Chinese or Japanese hams, perhaps a UTF-x encoding would be better?
>
> UTF-8 would be very, very good for everyone involved (including  
> Sergej), even if your particular character set happens to fit within  
> 8 bits.
>
> I'd like to make it work in a single application (aprs.fi) for
> everyone without trying to guess whether the message is in Russian,
> Finnish or Japanese. If UTF-8 were used (as is typically done
> in email and on the web today), there'd be no need to select or guess
> which character set is used in each message.
>
> This message was sent in the UTF-8 encoding of Unicode. For those of  
> you who have UTF-8 support in your email software (probably almost  
> all of you by now), and the relevant fonts installed (this is what  
> usually fails for Windows users) you should see these strings  
> correctly:
>
> Russian: русский язык
> Traditional Chinese: 漢語
> Japanese: 日本語
> Finnish: Ääliöt ja pölvästit
> French: Français
>
> For those of you who do not have Unicode support, or have not  
> installed the fonts containing symbols for all of those funny  
> characters, you'll probably see little rectangles, gibberish, or  
> strings like "=C3=84=C3=A4=C3=B6=C3=B6=C3=B6" instead of the correct  
> glyphs. But, because UTF-8 is backwards compatible with ASCII, at  
> least you'll see this English text correctly! ASCII characters have  
> the same single-byte values in ASCII and UTF-8.
>
> If some other Unicode encoding, like UTF-16, were used, the
> English parts would look really funny, or would not be visible at
> all. Every other byte would be binary zero (NUL), and every
> other byte would be the ASCII character. It's also a waste of
> valuable bandwidth when sending ASCII text! Here's a real-world
> UTF-16 example (the control codes are shown in hex in the raw
> packets display):
>
> http://aprs.fi/?c=raw&call=JA6VRP-2
>
> There's a good chance that a lot of old software written in C will
> cut those messages at the first NUL byte, since the string handling
> functions use it as the "end-of-string" marker.
>
> If UTF-8 were used on APRS, users would not need to switch
> between ASCII (backwards compatible for messaging with
> English-speaking friends) and some other character set (for the
> non-English messages). It would "just work". It works for the
> Internet, and it'd work for us.
>
> Links:
>
> http://en.wikipedia.org/wiki/UTF-8
> http://en.wikipedia.org/wiki/UTF-16
>
>  - Hessu
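
Hessu's points about ASCII compatibility, UTF-16 overhead and NUL truncation can be checked with a few lines of Python. This is only a sketch: c_string_view is a made-up helper that imitates how C string functions stop at the first zero byte; it is not taken from any real APRS software.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Return the text encoded as UTF-8 and as little-endian UTF-16."""
    return text.encode("utf-8"), text.encode("utf-16-le")


def c_string_view(raw: bytes) -> bytes:
    """Imitate a C string function: keep only the bytes before the first NUL."""
    return raw.split(b"\x00", 1)[0]


if __name__ == "__main__":
    utf8, utf16 = encode_both("Hello")
    print(utf8)    # ASCII text is unchanged in UTF-8
    print(utf16)   # every ASCII character is followed by a zero byte
    print(c_string_view(utf8), c_string_view(utf16))
    print("Ää".encode("utf-8").hex())  # the bytes behind "=C3=84=C3=A4"
```

For plain English text the UTF-16 form is twice as long, and anything that treats the packet as a C string would see only the first character.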