[nos-bbs] Analysis of recent frequent netrom related crashes

Maiko (Personal) maiko at pcsinternet.ca
Sun Nov 13 11:47:29 EST 2022


I am guessing here (hopefully my debugs will confirm it) ...

IF the local side requests a netrom layer 4 disconnect, then JNOS
should probably free the callback right there and then, instead of
waiting for the final disconnect (which may never reach us). I figure
it would not hurt to remove it at that point, since the circuit is
effectively done anyway.
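A minimal sketch of that idea, in C to match JNOS. It assumes a simplified, hypothetical circuit table: `circtab`, `struct l4cb`, `circ_open()`, `circ_find()`, `circ_local_disconnect()` and `circ_disc_ack()` are illustrative names only, not the actual JNOS structures or functions. When the local side asks for the disconnect, the table slot is cleared and the block freed at once, so a late disconnect ack that does arrive simply finds nothing and is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified netrom layer 4 circuit table.
 * Illustrative names only; these are NOT the JNOS structures. */
#define MAXCIRC 20

struct l4cb {
    int mynum;  /* our circuit index (slot in circtab) */
    int myid;   /* our circuit id */
};

static struct l4cb *circtab[MAXCIRC];

/* Allocate a control block in the first free slot (NULL if full). */
static struct l4cb *circ_open(int id)
{
    for (int i = 0; i < MAXCIRC; i++) {
        if (circtab[i] == NULL) {
            struct l4cb *cb = calloc(1, sizeof *cb);
            if (cb == NULL)
                return NULL;
            cb->mynum = i;
            cb->myid = id;
            circtab[i] = cb;
            return cb;
        }
    }
    return NULL;
}

/* Find the circuit an incoming frame is addressed to. */
static struct l4cb *circ_find(int index, int id)
{
    if (index < 0 || index >= MAXCIRC)
        return NULL;
    struct l4cb *cb = circtab[index];
    return (cb != NULL && cb->myid == id) ? cb : NULL;
}

/* Proposed behaviour: when WE request the disconnect, drop the entry
 * immediately instead of waiting for the remote's disconnect ack.
 * (Sending the actual disconnect request frame is omitted.) */
static void circ_local_disconnect(struct l4cb *cb)
{
    assert(cb != NULL && circtab[cb->mynum] == cb);
    circtab[cb->mynum] = NULL;  /* clear the slot first ... */
    free(cb);                   /* ... then release the memory */
}

/* Remote's disconnect ack: if it never arrives, nothing is left behind;
 * if it arrives after a local disconnect, the lookup fails and it is
 * ignored. Otherwise it is the normal teardown path. */
static bool circ_disc_ack(int index, int id)
{
    struct l4cb *cb = circ_find(index, id);
    if (cb == NULL)
        return false;           /* stale or unknown ack: ignore */
    circtab[index] = NULL;
    free(cb);
    return true;
}
```

The order matters: the slot is cleared in the same step as the free, so the table never holds a pointer to memory that is already gone.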

I could put in timer-based garbage collection, but I think it's best
to get rid of the callback data ASAP, before it has a chance to cause
a crash.
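For comparison, the timer-based garbage collection mentioned above might look roughly like this. It is again a hypothetical sketch: `circ_request_disconnect()`, `circ_sweep()`, `DISC_GRACE_SECS` and the 180 second grace period are assumptions, not JNOS functions or settings. Entries waiting on a disconnect ack are reclaimed once they have waited longer than the grace period.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical model of timer-based cleanup; not the JNOS code. */
#define MAXCIRC 20
#define DISC_GRACE_SECS 180  /* assumed grace period, would need tuning */

enum l4state { L4_CONNECTED, L4_DISCPENDING };

struct l4cb {
    enum l4state state;
    time_t disc_sent;  /* when our disconnect request went out */
};

static struct l4cb *circtab[MAXCIRC];

/* Mark a circuit as waiting for the remote's disconnect ack. */
static void circ_request_disconnect(struct l4cb *cb, time_t now)
{
    assert(cb != NULL);
    cb->state = L4_DISCPENDING;
    cb->disc_sent = now;
}

/* Run periodically from some housekeeping timer: reclaim circuits
 * whose disconnect ack never arrived. Returns how many were freed. */
static int circ_sweep(time_t now)
{
    int reclaimed = 0;
    for (int i = 0; i < MAXCIRC; i++) {
        struct l4cb *cb = circtab[i];
        if (cb != NULL && cb->state == L4_DISCPENDING &&
            difftime(now, cb->disc_sent) >= DISC_GRACE_SECS) {
            circtab[i] = NULL;
            free(cb);
            reclaimed++;
        }
    }
    return reclaimed;
}
```

This only helps if the control block is still valid memory when the sweep runs. If something else has already freed it, the sweep dereferences it just like any other table walk would, which is an argument for freeing it in one place, as early as possible.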

Anyways ...

Maiko / VE4KLM

On 2022-11-13 10:37 a.m., Maiko (Personal) wrote:
> Good morning,
> 
> Slightly technical post ...
> 
> This has been driving me nuts for the past few months. It seems to
> have only just started, perhaps because I took on a new netrom
> neighbour or two; I just don't know. But I think I now know the
> cause of all the crashes. After a few days of inserting some very
> heavy debugs into the code, this is where I am at this morning:
> 
> JNOS keeps a table of netrom callbacks (the default size is 20). When
> a new connection is made, it gets put into the table, and when it is
> done with, it is supposed to be removed from the table. However, this
> removal is ONLY DONE when the state of the connection becomes
> disconnected. What appears to be happening is that the table entry
> for a specific connection looks valid, but the memory behind it has
> in fact disappeared. JNOS never removed the entry, so crash!!!
> 
> What this suggests to me is that I did not get the final netrom
> disconnect, so JNOS still thinks the callback data is valid, when in
> fact it is not; the memory has disappeared. The result is a crash
> every few days in the netrom functions (nr4subr.c, nr3.c), like:
> 
>    Program received signal SIGSEGV, Segmentation fault.
>    0x000000000047fdd9 in match_n4circ (index=23, id=71, user=0x2081457
>    "\236\226d\240\212\234b\236\226d\240\212\234b", node=0x208145e
>    "\236\226d\240\212\234b") at nr4subr.c:138
>     138  if ((int)(cb->yournum) == index && (int)(cb->yourid) == id
> 
> AND
> 
>    Program received signal SIGSEGV, Segmentation fault.
>    0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6
>    (gdb) where
>    #0  0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6
>    #1  0x0000000000482727 in nrresetlinks (rp=0x22c5550) at nr3.c:1441
>    #2  0x000000000047ca22 in doobsotick () at nrcmd.c:1316
> 
> It is very consistent. I am running into cases where I do not get the
> final netrom layer 4 disconnect, so the callback remains. But JNOS
> has to loop through the whole circuit table to find valid entries to
> match against, and this invalid one just happens to still be in the
> table, and kablewee :]
> 
> Anyways, I hope to have a fix of sorts for this 'soon'; it is very
> frustrating. But again, why has this suddenly started happening at
> the frequency it has for the past 3 months, possibly more?
> 
> Jack, this is probably what you are experiencing as well.
> 
> Maiko / VE4KLM
> 
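A rough model of the failure described in the quoted message, again in C and again hypothetical: `circtab`, `struct l4cb`, `circ_release()` and `circ_match()` are illustrative stand-ins, loosely modelled on the `match_n4circ()` comparison in the first backtrace, not the JNOS code. `AXALEN` is assumed to be the 7-byte AX.25 address length. A lookup walks every slot and dereferences whatever it finds, so the safe pattern is a single release function that frees the block and clears its slot in the same step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified circuit table; not the JNOS declarations. */
#define MAXCIRC 20   /* the post says JNOS defaults to 20 entries */
#define AXALEN 7     /* assumed AX.25 address length in bytes */

struct l4cb {
    int yournum;                 /* remote's circuit index */
    int yourid;                  /* remote's circuit id */
    unsigned char user[AXALEN];  /* originating user callsign */
    unsigned char node[AXALEN];  /* originating node callsign */
};

static struct l4cb *circtab[MAXCIRC];

/* The only place a control block should be released: clearing the slot
 * together with the free means the table can never hold a pointer to
 * memory that has already gone away. free(NULL) is a no-op. */
static void circ_release(int slot)
{
    assert(slot >= 0 && slot < MAXCIRC);
    free(circtab[slot]);
    circtab[slot] = NULL;
}

/* Walk the whole table looking for the circuit a frame belongs to.
 * Every non-empty slot gets dereferenced here, so a stale (already
 * freed) pointer left in the table is what turns into a SIGSEGV. */
static struct l4cb *circ_match(int index, int id,
                               const unsigned char *user,
                               const unsigned char *node)
{
    for (int i = 0; i < MAXCIRC; i++) {
        struct l4cb *cb = circtab[i];
        if (cb == NULL)
            continue;   /* empty slot */
        if (cb->yournum == index && cb->yourid == id &&
            memcmp(cb->user, user, AXALEN) == 0 &&
            memcmp(cb->node, node, AXALEN) == 0)
            return cb;
    }
    return NULL;
}
```

With that invariant in place, a missing final disconnect can still leave an entry hanging around (a leak), but not a dangling pointer for the lookup to trip over.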
