[aprssig] APRS Message character sets ?

Tapio Sokura oh2kku at iki.fi
Wed Jul 9 22:55:26 EDT 2008


Stephen H. Smith wrote:
> (or equivalent for other TNCs) so that they WILL be transparent to 8-bit 
> characters.  This is required so that any arbitrary character values 
> created by Mic-E encoding will pass.    In fact I have modified the TNC2 
> firmware Ver 1.1.9 to DEFAULT to 8-N-1 bit transparent mode so no init 
> is even required.

Actually, mic-e doesn't use the eighth bit anywhere; the bytes it uses 
are all between 28 and 127 decimal. The bytes 28-31 and 127 usually 
cause the problems because they are "unprintable" ASCII.

It's another thing if a TNC has a filter that drops binary (control) 
characters. Such a filter indeed can and does break mic-e, but it's 
not the same as an eighth bit "filter".
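
To make the difference concrete, here is a rough Python sketch. Both 
function names are hypothetical, and the payload is a made-up byte string 
that merely stays inside the 28-127 range; it is not a real Mic-E packet 
and no particular TNC is being modelled.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Hypothetical 7-bit path: clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def drop_control_chars(data: bytes) -> bytes:
    """Hypothetical 'binary character' filter: drops bytes 0-31 and 127."""
    return bytes(b for b in data if 32 <= b < 127)


# Made-up payload (an assumption, not a real packet) using bytes in 28-127,
# including the "unprintable" values 0x1C, 0x1F and 0x7F.
payload = bytes([0x60, 0x1C, 0x7F, 0x3E, 0x1F, 0x2F])

# No byte has the eighth bit set, so a 7-bit path changes nothing.
assert strip_eighth_bit(payload) == payload

# The control-character filter, however, silently removes three bytes.
assert drop_control_chars(payload) != payload
```

So a payload like this survives a link that is not 8-bit clean, but not 
a filter that throws away control characters.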

> 2)    Since the APRS network infrastructure is heavily based on legacy 
> 1980s-1990s packet hardware that doesn't support 16-bit/character 
> encoding (i.e. UniCode), I wouldn't hold my breath for 16-bit support 
> any time soon.   Not to mention that the most widely used APRS 
> application (UI-View) is now frozen in time and unchangeable.    

Even old packet _hard_ware doesn't care about character sets, it's all 
just bytes (and bits) to it. A typical TNC acting as an APRS digipeater 
couldn't care less what character set is used in the payload of a packet. 
It's just bytes (for KPC3, UIDIGI, etc). If a digipeater looks inside a 
packet payload, then it can be affected by character sets. But UTF-8 is 
100% compatible with ASCII in the byte range 0-127, so I don't see a 
problem there. It's a bit different for software.
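
The ASCII compatibility is easy to check. A small Python sketch, where 
the sample text is just an arbitrary example string:

```python
# Arbitrary sample text (an assumption) containing only ASCII characters.
text = "Hello APRS 73!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Pure ASCII text produces exactly the same bytes in both encodings.
assert ascii_bytes == utf8_bytes

# Every byte value 0-127 is valid UTF-8 on its own and round-trips unchanged.
all_ascii = bytes(range(128))
assert all_ascii.decode("utf-8").encode("utf-8") == all_ascii
```

Anything that only inspects or matches ASCII bytes inside a payload sees 
the same bytes either way.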

One of the beauties of UTF-8 is that if an application doesn't really 
support UTF-8, it typically doesn't fall apart if it sees strings 
encoded in UTF-8. For encoding non-ASCII characters, UTF-8 only uses 
bytes that are 128 decimal or above, so no sudden binary characters 
messing up displays where printable ASCII is expected. Sure, a program expecting 
(8-bit) ASCII will display 2-4 garbage characters for each character 
that has a Unicode code point value U+0080 or above, but that's the 
price you have to pay one way or another to go beyond 256 different 
characters in an 8-bit byte world. Wrt UI-View, I think we've discussed 
that before and there's nothing new I have to add.
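
As a rough illustration of the byte values and the 2-4 "garbage" 
characters, here is a Python sketch. The helper name and the sample 
characters are my own picks, and Latin-1 decoding is only a stand-in for 
a program that treats every byte as one character.

```python
def utf8_profile(ch: str) -> tuple[int, int]:
    """Hypothetical helper: (number of UTF-8 bytes, smallest byte value)."""
    encoded = ch.encode("utf-8")
    return len(encoded), min(encoded)


# Sample characters (an assumption): U+00E4, U+20AC and U+1F600.
samples = ["\u00e4", "\u20ac", "\U0001F600"]

for ch in samples:
    count, lowest = utf8_profile(ch)
    # Every byte of a non-ASCII character has the eighth bit set.
    assert lowest >= 0x80
    # A one-byte-per-character program sees one character per byte.
    as_legacy = ch.encode("utf-8").decode("latin-1")
    assert len(as_legacy) == count

byte_counts = [utf8_profile(ch)[0] for ch in samples]
```

For these three samples the byte counts come out as 2, 3 and 4.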

   Tapio



