[aprssig] Please, standardize UTF-8 for APRS
Stephen H. Smith
wa8lmf2 at aol.com
Wed Sep 23 01:31:04 EDT 2009
Heikki Hannikainen wrote:
> On Tue, 22 Sep 2009, Stephen H. Smith wrote:
>
>> Heikki Hannikainen wrote:
>>> I
>
> Something like that could be expected! But at least it doesn't crash,
> which seems to be the main worry. I would have been very surprised if
> it did, since utf-8 strings are, for a non-utf-8-enabled device, just
> strings which the reader won't understand, because they're rendered
> with the wrong glyphs.
>
> Excellent work, Stephen, thank you!
>
I was afraid that some UTF-8 octet with the high (eighth) bit set might be
misinterpreted as a control or escape character by older hardware.
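One way to look at that worry: every octet belonging to a multi-byte UTF-8 character is 0x80 or higher, so it can never collide with the ASCII control range (0x00-0x1F). However, UTF-8 continuation octets (0x80-0xBF) do overlap the ISO 8859 "C1" control range (0x80-0x9F). The sketch below uses hypothetical helper names of my own to list which octets of a string land in either range. Whether a particular radio actually acts on C1 codes is an assumption that only on-air testing can settle.

```python
def control_octets(text: str) -> list[int]:
    """Hypothetical helper: return the UTF-8 octets of `text` that fall in the
    C0 range (0x00-0x1F, 0x7F) or the C1 range (0x80-0x9F)."""
    data = text.encode("utf-8")
    return [b for b in data if b < 0x20 or b == 0x7F or 0x80 <= b <= 0x9F]


def multibyte_octets_are_high(text: str) -> bool:
    """True if every octet belonging to a non-ASCII character is >= 0x80,
    i.e. none of them can be mistaken for an ASCII control character."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


if __name__ == "__main__":
    # U+00C4 (A with diaeresis) encodes as C3 84; 0x84 lands in the C1 range.
    print([hex(b) for b in control_octets("\u00c4")])  # ['0x84']
```

So a plain Latin-1 letter like the German A-umlaut already puts a C1-range octet on the air, which makes it a convenient first test string.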
The other problem is that the raw byte strings can be much longer
than the displayed text, and may overrun the very limited RAM or buffers
in devices like the D700. The real "acid test" would be to create
some 60-70 character messages in Chinese or Japanese (which would result
in worst-case strings of around 200 octets) and see if they trash the
'700.
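The arithmetic behind the "around 200 octets" figure: CJK ideographs in the Basic Multilingual Plane take 3 octets each in UTF-8 (characters outside it take 4), so a 67-character message becomes 201 octets. The sketch below assumes a 67-character message-text limit, as given in the APRS 1.0.1 spec. The helper names are hypothetical. The truncation helper only illustrates how a sender could fit a fixed octet budget without cutting a character in half; it is not something existing APRS software is known to do.

```python
APRS_MAX_MSG_CHARS = 67  # assumption: message-text limit from APRS 1.0.1


def utf8_octets(text: str) -> int:
    """Number of octets `text` occupies on the air when sent as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text: str, max_octets: int) -> str:
    """Hypothetical helper: longest prefix of `text` that fits in `max_octets`
    without splitting a multi-byte character (a cut-off trailing sequence
    is dropped by errors="ignore")."""
    return text.encode("utf-8")[:max_octets].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    cjk = "\u6f22" * APRS_MAX_MSG_CHARS  # 67 copies of the ideograph U+6F22
    print(utf8_octets(cjk))  # 201
```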
An intermediate "torture test" would be a 60-70 character message in
Vietnamese, which uses far more variations on the Latin alphabet than
any other Romanized language. (The pre-Unicode "VISCII" [Vietnamese
ASCII] required TWO 8-bit 255-char "font files" for each type face, one
upper-case and one lower-case to accommodate all the variations on the
Roman alphabet.) Virtually EVERY word has one or more letters
modified with what the Vietnamese call "tone marks" above and/or below
the letter -- many, many more than any of the Scandinavian or East
European languages. The result is that statistically, a Vietnamese
message should have the highest possible percentage of Latin characters
expressed in more than one octet.
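To put a number on that claim for a given test message, the sketch below (the helper name is hypothetical) counts the fraction of letters that need more than one UTF-8 octet. It normalizes to NFC first because the same Vietnamese text can arrive decomposed (NFD). In that form a letter such as U+1EC7 becomes "e" plus two combining marks, which takes 5 octets instead of 3, so the on-air length also depends on which normalization form the sending software emits.

```python
import unicodedata


def multibyte_fraction(text: str) -> float:
    """Fraction of letters in `text` that need more than one UTF-8 octet,
    measured after NFC normalization (precomposed letters where possible)."""
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    multi = sum(1 for ch in letters if len(ch.encode("utf-8")) > 1)
    return multi / len(letters)


if __name__ == "__main__":
    sample = "Ti\u1ebfng Vi\u1ec7t"  # "Tieng Viet" with tone marks: 9 letters, 2 multi-octet
    print(multibyte_fraction(sample))
```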