[aprssig] Please, standardize UTF-8 for APRS

Stephen H. Smith wa8lmf2 at aol.com
Wed Sep 23 01:31:04 EDT 2009


Heikki Hannikainen wrote:
> On Tue, 22 Sep 2009, Stephen H. Smith wrote:
>
>> Heikki Hannikainen wrote:
>>> I
>
> Something like that could be expected! But at least it doesn't crash, 
> which seems to be the main worry. I would have been very surprised if 
> it did, since utf-8 strings are, for a non-utf-8-enabled device, just 
> strings which the reader won't understand, because they're rendered 
> with the wrong glyphs.
>
> Excellent work, Stephen, thank you!
>

I was afraid that some 8-bit UTF-8 octet might be misinterpreted as a 
control or escape character by older hardware.
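
For anyone who wants to check a particular string before sending it to 
a radio, here is a rough Python sketch of that worry. It relies on one 
certain fact: every octet of a multi-byte UTF-8 sequence is 0x80 or 
higher, so none can collide with the ASCII controls 0x00-0x1F as sent. 
The two risks it looks for are assumptions about how a device might 
behave, not anything known about a specific radio: treating 0x80-0x9F 
as C1 control codes, or masking off the high bit. The name 
risky_octets is made up for this example.

```python
def risky_octets(text):
    """Return (c1, stripped): octets that might look like control codes."""
    data = text.encode("utf-8")
    # Octets belonging to multi-byte UTF-8 sequences are always >= 0x80.
    high = [b for b in data if b >= 0x80]
    # 0x80-0x9F is the C1 control range in ISO 8859-style 8-bit sets.
    c1 = [b for b in high if b <= 0x9F]
    # ASSUMPTION: a 7-bit device masks off the top bit, so 0x80-0x9F
    # would turn into the ASCII controls 0x00-0x1F.
    stripped = [b & 0x7F for b in high if (b & 0x7F) < 0x20]
    return c1, stripped


c1, stripped = risky_octets("Việt")
print([hex(b) for b in c1], [hex(b) for b in stripped])
```

For "Việt" the letter "ệ" encodes as E1 BB 87, so the octet 0x87 shows 
up in the first list, and it would become 0x07 (BEL) on a device that 
strips the high bit.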


The other problem is that the strings of raw data can be much longer 
than the displayed text, and may overrun the very limited RAM or buffers 
in devices like the D700.   The real "acid test" would be to create 
some 60-70 character messages in Chinese or Japanese (which, at 
typically three octets per character, would result in worst-case long 
strings around 200 octets), and see if they trash the '700. 
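
The arithmetic is easy to confirm. A minimal Python sketch, using an 
arbitrary repeated Japanese sample (not a real APRS message) and a 
helper name, utf8_octets, invented for the example:

```python
def utf8_octets(text):
    """Number of octets the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Arbitrary 66-character sample; each of these characters is 3 octets.
sample = "日本語" * 22
print(len(sample), utf8_octets(sample))  # 66 198
```

So a message that fits a 67-character display limit can still need 
roughly three times that much buffer space in the receiving device.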

An intermediate "torture test" would be a 60-70 character message in 
Vietnamese, which uses far more variations on the Latin alphabet than 
any other Romanized language.   (The pre-Unicode "VISCII" [Vietnamese 
ASCII] required TWO 8-bit 255-char "font files" for each typeface, one 
upper-case and one lower-case, to accommodate all the variations on the 
Roman alphabet.)   Virtually EVERY word has one or more letters 
modified with what the Vietnamese call "tone marks" above and/or below 
the letter -- many more than in any of the Scandinavian or East 
European languages.   Statistically, then, a Vietnamese message should 
have the highest percentage of Latin characters expressed in more than 
one octet.
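
To put a number on that for any given test message, something like the 
following Python sketch would do. It counts how many letters need more 
than one octet in UTF-8. The function name multibyte_letters is made up 
for the example, and the text is normalized to NFC first on the 
assumption that messages are sent with precomposed letters rather than 
separate combining marks.

```python
import unicodedata


def multibyte_letters(text):
    """Return (multi, total): letters needing >1 UTF-8 octet, and all letters."""
    # ASSUMPTION: precomposed (NFC) form, one code point per accented letter.
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if ch.isalpha()]
    multi = [ch for ch in letters if len(ch.encode("utf-8")) > 1]
    return len(multi), len(letters)


multi, total = multibyte_letters("Tiếng Việt")
print(multi, total)  # 2 9
```

Running the same function over comparable messages in other languages 
would show whether Vietnamese really comes out on top.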






