[aprssig] Please, standardize UTF-8 for APRS (was: Future Concept for APRS)
Joel Maslak - N7XUC
jmaslak-aprs at antelope.net
Mon Sep 21 14:30:15 EDT 2009
I am all for this. One caution: we need to define max message length
in BYTES, not characters!
Also, we may need a way of specifying text direction (right to left,
vertical, or whatever).
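
To illustrate the bytes-versus-characters point, here is a minimal Python
sketch. The 67-byte limit and the helper name truncate_utf8 are assumptions
made for illustration only, not anything agreed on this list.

```python
MAX_BYTES = 67  # assumed limit, for illustration only


def truncate_utf8(text: str, max_bytes: int = MAX_BYTES) -> bytes:
    """Encode text as UTF-8, cutting at max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    # Slicing may leave a partial multi-byte sequence at the end;
    # decoding with errors="ignore" drops those dangling bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")


msg = "Ääliöt ja pölvästit"
# The character count and the UTF-8 byte count differ for non-ASCII text.
print(len(msg), len(msg.encode("utf-8")))

# Each of these three characters takes 3 bytes, so a 7-byte budget fits two.
print(truncate_utf8("日本語", 7).decode("utf-8"))
```

A limit stated in characters would let the on-air packet grow up to four
times longer than intended, which is why the limit should be in bytes.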
On Sep 21, 2009, at 8:24 AM, Heikki Hannikainen <hessu at hes.iki.fi>
wrote:
> On Mon, 21 Sep 2009, Sergej wrote:
>
>> An 8-bit character set would be enough to cover regional subsets.
>> But for Chinese or Japanese hams, might a UTF-x encoding be better?
>
> UTF-8 would be very, very good for everyone involved (including
> Sergej), even if your particular character set happens to fit within
> 8 bits.
>
> I'd like to make it work in a single application (aprs.fi) for
> everyone without trying to guess whether the message is in Russian,
> Finnish or Japanese. If UTF-8 were used (as is typically done in
> email and on the web today), there'd be no need to select or guess
> which character set is used in each message.
>
> This message was sent in the UTF-8 encoding of Unicode. For those of
> you who have UTF-8 support in your email software (probably almost
> all of you by now), and the relevant fonts installed (this is what
> usually fails for Windows users) you should see these strings
> correctly:
>
> Russian: русский язык
> Traditional Chinese: 漢語
> Japanese: 日本語
> Finnish: Ääliöt ja pölvästit
> French: Français
>
> For those of you who do not have Unicode support, or have not
> installed the fonts containing symbols for all of those funny
> characters, you'll probably see little rectangles, gibberish, or
> strings like "=C3=84=C3=A4=C3=B6=C3=B6=C3=B6" instead of the correct
> glyphs. But, because UTF-8 is backwards compatible with ASCII, at
> least you'll see this English text correctly! ASCII characters have
> the same single-byte values in ASCII and UTF-8.
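
A small Python sketch of the two claims in the paragraph above: ASCII text
has identical bytes in ASCII and UTF-8, and the "=C3=84..." string is just
the quoted-printable form of UTF-8 bytes. The variable names are
illustrative only.

```python
import quopri

# ASCII text encodes to exactly the same bytes in ASCII and in UTF-8.
ascii_text = "this English text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # prints True

# The gibberish some mail readers show is quoted-printable: each "=XX"
# is one byte in hex. Undoing that yields the raw UTF-8 bytes.
garbled = b"=C3=84=C3=A4=C3=B6=C3=B6=C3=B6"
raw = quopri.decodestring(garbled)
decoded = raw.decode("utf-8")
print(decoded)  # prints Ääööö
```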
>
> If some other Unicode encoding, like UTF-16, were used, the
> English parts would look really funny, or would not be visible at
> all. Every other byte would be a binary zero (NUL), and every
> other byte would be the ASCII character. It's also a waste of
> valuable bandwidth when sending ASCII text! Here's a real-world
> UTF-16 example (the control codes are shown in hex in the raw
> packets display):
>
> http://aprs.fi/?c=raw&call=JA6VRP-2
>
> There's a good chance that a lot of old software written in C will
> cut those messages at the first NUL byte, since the string handling
> functions use it as the "end-of-string" marker.
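
A rough Python sketch of the UTF-16 problem described above. The
little-endian byte order and the helper c_string_view are assumptions for
illustration; the helper only mimics how C string functions stop at the
first zero byte and is not taken from any real APRS software.

```python
text = "CQ APRS"

utf8 = text.encode("utf-8")
# Assumption: little-endian UTF-16 without a byte order mark.
utf16 = text.encode("utf-16-le")

# UTF-16 doubles the size of ASCII text: every character gains a zero byte.
print(len(utf8), len(utf16))  # prints 7 14
print(utf16)


def c_string_view(buf: bytes) -> bytes:
    """Return what a C string function would see: bytes up to the first NUL."""
    return buf.split(b"\x00", 1)[0]


# The UTF-8 message survives intact; the UTF-16 one is cut after one byte.
print(c_string_view(utf8))   # prints b'CQ APRS'
print(c_string_view(utf16))  # prints b'C'
```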
>
> If UTF-8 were used on APRS, users would not need to switch
> between ASCII (backwards compatible for messaging with English-
> speaking friends) and some other character set (for the non-English
> messages). It would "just work". It works for the Internet, and it'd
> work for us.
>
> Links:
>
> http://en.wikipedia.org/wiki/UTF-8
> http://en.wikipedia.org/wiki/UTF-16
>
> - Hessu