[nos-bbs] Analysis of recent frequent netrom related crashes

Maiko (Personal) maiko at pcsinternet.ca
Sun Nov 13 11:37:33 EST 2022


Good morning,

Slightly technical post ...

This has been driving me nuts for the past few months. It seems to
have started only recently, perhaps because I took on a new netrom
neighbour or two, I just don't know. But I think I now know the
reason for all the crashes. After a few days of inserting some very
heavy debugging into the code, this is where I am at this morning :

JNOS keeps a table of netrom callbacks; the default size is 20. When
a new connection happens, it gets put into the table, and when it is
done with, it is supposed to be removed from the table. However,
this removal is ONLY DONE when the state of the connection becomes
disconnected. What appears to be happening is that the table entry
for a specific connection still looks valid, but the data it points
to has disappeared. JNOS never removed the entry, so crash !!!
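
A minimal sketch in C of the table lifecycle described above. This is
not the actual JNOS code: the names (struct circuit, circuit_add,
circuit_set_state, circuit_count) are made up for illustration, and
only the table size of 20 comes from the text.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CIRCUITS 20  /* default table size mentioned in the post */

enum circ_state { CIRC_CONNECTED, CIRC_DISCONNECTED };

/* Hypothetical per-connection data; not the real JNOS structure. */
struct circuit {
    int yournum;
    int yourid;
    enum circ_state state;
};

static struct circuit *table[MAX_CIRCUITS];  /* NULL means a free slot */

/* Put a new connection into the first free slot; return slot or -1. */
int circuit_add(int yournum, int yourid)
{
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        if (table[i] == NULL) {
            struct circuit *cb = malloc(sizeof *cb);
            if (cb == NULL)
                return -1;
            cb->yournum = yournum;
            cb->yourid = yourid;
            cb->state = CIRC_CONNECTED;
            table[i] = cb;
            return i;
        }
    }
    return -1;  /* table full */
}

/* The slot is cleared ONLY on the transition to disconnected. If that
 * transition never arrives, the entry stays in the table. */
void circuit_set_state(int slot, enum circ_state state)
{
    assert(slot >= 0 && slot < MAX_CIRCUITS && table[slot] != NULL);
    table[slot]->state = state;
    if (state == CIRC_DISCONNECTED) {
        free(table[slot]);
        table[slot] = NULL;
    }
}

/* Number of slots currently in use. */
int circuit_count(void)
{
    int n = 0;
    for (int i = 0; i < MAX_CIRCUITS; i++)
        if (table[i] != NULL)
            n++;
    return n;
}
```

In this model, if the memory behind a slot goes away by any path
other than circuit_set_state(slot, CIRC_DISCONNECTED), the pointer
left in the table dangles. That would be consistent with the symptom
described here, but it is only a model.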

What this suggests to me is that I never got the final NETROM
disconnect, so JNOS still thinks the callback data is valid when in
fact it is not; the memory has disappeared. The result is a crash
every few days in the nr4subr.c functions, like :

   Program received signal SIGSEGV, Segmentation fault.
   0x000000000047fdd9 in match_n4circ (index=23, id=71, user=0x2081457
   "\236\226d\240\212\234b\236\226d\240\212\234b", node=0x208145e
   "\236\226d\240\212\234b") at nr4subr.c:138
    138  if ((int)(cb->yournum) == index && (int)(cb->yourid) == id

AND

   Program received signal SIGSEGV, Segmentation fault.
   0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6
   (gdb) where
   #0  0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6
   #1  0x0000000000482727 in nrresetlinks (rp=0x22c5550) at nr3.c:1441
   #2  0x000000000047ca22 in doobsotick () at nrcmd.c:1316

It is very consistent. I am running into cases where I never get the
final netrom layer 4 disconnect, so the callback remains. JNOS has to
loop through the whole circuit table to find valid entries to match
against, and this invalid one just happens to still be in the table
and kablewee :]
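
To illustrate why one stale entry is enough, here is a small sketch in
C of a full-table scan, loosely modelled on the arguments visible in
the match_n4circ trace above (index, id, user). It is an assumption
about the general shape of such a lookup, not the JNOS source; the
names circ_tab, circ_open, circ_match and circ_release are
hypothetical. The 7-byte callsign length is the standard AX.25
address size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CIRCUITS 20  /* default table size mentioned in the post */
#define CALL_LEN 7       /* AX.25 address: 6 shifted chars + SSID byte */

/* Hypothetical per-connection data; not the real JNOS structure. */
struct circuit {
    int yournum;
    int yourid;
    unsigned char user[CALL_LEN];
};

static struct circuit *circ_tab[MAX_CIRCUITS];  /* NULL means free */

/* Store a new circuit in the first free slot; return slot or -1. */
int circ_open(int yournum, int yourid, const unsigned char *user)
{
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        if (circ_tab[i] == NULL) {
            struct circuit *cb = malloc(sizeof *cb);
            if (cb == NULL)
                return -1;
            cb->yournum = yournum;
            cb->yourid = yourid;
            memcpy(cb->user, user, CALL_LEN);
            circ_tab[i] = cb;
            return i;
        }
    }
    return -1;  /* table full */
}

/* Scan every slot and dereference each non-NULL one. A NULL test
 * cannot tell a live pointer from a stale one, so the scan is only
 * safe if every slot is either NULL or points at live memory. */
int circ_match(int index, int id, const unsigned char *user)
{
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        struct circuit *cb = circ_tab[i];
        if (cb == NULL)
            continue;
        if (cb->yournum == index && cb->yourid == id &&
            memcmp(cb->user, user, CALL_LEN) == 0)
            return i;
    }
    return -1;  /* no match */
}

/* Single release path: free the memory and clear the slot together,
 * whatever the reason the circuit ended. */
void circ_release(int slot)
{
    assert(slot >= 0 && slot < MAX_CIRCUITS);
    free(circ_tab[slot]);
    circ_tab[slot] = NULL;
}
```

The point of circ_release in this sketch is that freeing the memory
and clearing the table slot happen in one place, so the scan never
sees a pointer to memory that is gone. Whether that fits the actual
fix is not something this sketch claims.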

Anyways, I hope to have a fix of sorts for this 'soon'. Very
frustrating. But again, why has this suddenly started happening as
often as it has over the past 3 months, possibly more ?

Jack, this is probably what you are experiencing as well.

Maiko / VE4KLM
