<div dir="auto">It sounds like you were having the same problem I was.</div><div dir="auto">Only in my case, JNOS might run for just 15 minutes. I couldn’t find the problem either. Then I happened to notice that the three links I had to Quebec were mysteriously causing it to crash. For whatever reason, the owner of those links was running everything through his BPQ, so when I tried to reach his JNOS I couldn’t, because it was hiding behind BPQ, while his JNOS was trying to communicate with me directly. I told him about it. I don’t think that was the only cause, but it was certainly a contributing factor. So I dropped all three links and have had minimal issues since. I told him I’ll allow the links again once he fixes his problems; I have yet to hear from him. A couple of weeks ago I noticed his BPQ is now doing the forwarding to me, but so far it’s okay.</div><div dir="auto">73, Don</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 9, 2022 at 11:11 AM &lt;<a href="mailto:maiko@pcsinternet.ca">maiko@pcsinternet.ca</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Good day,<br>

<br>

Shaking my head I am, crashes have mysteriously disappeared, and<br>

so far my uptime has been 11 days and counting. The crash dumps<br>

tell me where the crash occurs, but it makes no sense.<br>

<br>

Perhaps a link to another system that came and went over the<br>

past few months? Perhaps malformed netrom packets, or netrom<br>

code not properly dealing with a netrom feature that is rarely<br>

used (IP over NETROM, for example)? It could be stack related. This<br>

one is a doozy; welcome to the world of software development.<br>

<br>

So now when you want a crash, you don't get one ...<br>

<br>

Maiko / VE4KLM<br>

<br>

On 2022-11-19 11:23, Maiko (Personal) wrote:<br>

> Nothing like jinxing yourself. Forget it, it gets fixed when<br>

> it gets fixed. No idea now as to why; it could be a newer compiler<br>

> for all I know (I did upgrade my OS a while ago), so perhaps<br>

> it's exposing something in the JNOS code. Sigh, not the first<br>

> time it's happened.<br>

> <br>

> Last post on this, sorry for filling up your mailboxes :]<br>

> <br>

> M<br>

> <br>

> On 2022-11-18 1:42 p.m., <a href="mailto:maiko@pcsinternet.ca" target="_blank">maiko@pcsinternet.ca</a> wrote:<br>

>> Interestingly enough, the crashing 'seems' to have stopped.<br>

>> <br>

>> All of this started a while ago after I added a new wormhole<br>

>> to another system (FBB over BPQ netrom), but that system seems<br>

>> to have suddenly disappeared from the ether. I think they are<br>

>> having amprnet connectivity issues, so this will be much more<br>

>> difficult to track down now as I don't have a source to figure<br>

>> this out with. I am trying to track down the version of BPQ,<br>

>> hoping it will help me figure out what to do on my end.<br>

>> <br>

>> There is something about the netrom traffic or states that is<br>

>> causing JNOS to crash in the NR4 level code, but I have yet to<br>

>> figure it out ... it's very confusing what is going on ...<br>

>> <br>

>> Maiko / VE4KLM<br>

>> <br>

>> On 2022-11-13 11:12, Maiko (Personal) wrote:<br>

>>> Okay, last one for now, and learning as I go ...<br>

>>> <br>

>>> Perhaps I need to set the NR4TDISC a lot lower (default) ?<br>

>>> <br>

>>>    jnos> netrom tdisc<br>

>>>    NR4 redundancy timer (sec): 120<br>

>>> <br>

>>> Experiences, anyone? But still, even with a smaller timeout value,<br>

>>> there is a 'risk' of a crash, which makes me think the current way of<br>

>>> doing a circuit table lookup and reusing entries is not<br>

>>> the brightest way of doing it. Thinking a 'rewrite'? Ugh, no.<br>
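One common way to make slot reuse safer is to pair every table slot with a circuit ID that changes each time the slot is reused, so a frame addressed to an older occupant of the same slot no longer matches. The sketch below assumes a hypothetical table layout (`struct circ_slot`, `circ_alloc`, `circ_release`, `circ_match`) in the spirit of the index/ID pair visible in the `match_n4circ` trace later in this thread; it is not JNOS's actual code. Note that an ID check only guards against stale frames; it cannot detect a pointer whose memory has already been freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not JNOS code: 20 slots, each with a reuse counter. */
#define MAX_CIRCUITS 20

struct circuit { int placeholder; };  /* stand-in for a real control block */

struct circ_slot {
    unsigned char id;     /* bumped every time the slot is reused (wraps 255 -> 0) */
    struct circuit *cb;   /* NULL when the slot is free */
};

static struct circ_slot circ_table[MAX_CIRCUITS];

/* Claim the first free slot, bump its ID, and report the ID; -1 if full. */
int circ_alloc(struct circuit *cb, unsigned char *id_out)
{
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        if (circ_table[i].cb == NULL) {
            circ_table[i].id++;
            circ_table[i].cb = cb;
            *id_out = circ_table[i].id;
            return i;
        }
    }
    return -1;
}

/* Free a slot; its ID is kept so the next occupant gets a new one. */
void circ_release(int index)
{
    assert(index >= 0 && index < MAX_CIRCUITS);
    circ_table[index].cb = NULL;
}

/* A frame matches only if both the index and the ID agree with the live slot. */
struct circuit *circ_match(int index, unsigned char id)
{
    if (index < 0 || index >= MAX_CIRCUITS)
        return NULL;
    if (circ_table[index].cb == NULL || circ_table[index].id != id)
        return NULL;
    return circ_table[index].cb;
}
```

With this scheme, a late frame carrying the old ID for slot 0 is rejected after the slot has been handed to a new circuit, instead of being matched to the wrong one.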

>>> <br>

>>> Maiko / VE4KLM<br>

>>> <br>

>>> On 2022-11-13 10:47 a.m., Maiko (Personal) wrote:<br>

>>>> I am guessing (hopefully this shows up in my debugs) ...<br>

>>>> <br>

>>>> IF the local side requests a netrom layer 4 disconnect, then JNOS<br>

>>>> should probably free the callback there and then, instead of waiting<br>

>>>> for the final disconnect (which may never reach us). I figure it would<br>

>>>> not hurt to remove it at that point, since effectively it is done.<br>
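A minimal sketch of that idea, assuming a hypothetical fixed-size table of control-block pointers (`circ_table`, `circ_local_disconnect`, `circ_lookup`, and the stubbed `send_disc_request` are illustrative names, not JNOS functions): the slot is cleared as soon as the local side sends its disconnect request, so a final disconnect that never arrives can no longer leave a stale entry behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not JNOS code. */
#define MAX_CIRCUITS 20

struct circuit {
    int index;            /* slot this circuit occupies */
    int disc_requested;   /* set once we have sent a disconnect request */
};

static struct circuit *circ_table[MAX_CIRCUITS];

/* Stub standing in for transmitting the layer 4 disconnect request. */
static void send_disc_request(struct circuit *cb)
{
    cb->disc_requested = 1;
}

/* Local side asks to disconnect: send the request, then drop the entry now
   instead of waiting for the remote's final disconnect acknowledgement. */
void circ_local_disconnect(struct circuit *cb)
{
    assert(cb != NULL);
    send_disc_request(cb);
    if (cb->index >= 0 && cb->index < MAX_CIRCUITS && circ_table[cb->index] == cb)
        circ_table[cb->index] = NULL;
}

/* Find the circuit an incoming frame refers to; NULL if there is none. */
struct circuit *circ_lookup(int index)
{
    if (index < 0 || index >= MAX_CIRCUITS)
        return NULL;
    return circ_table[index];
}
```

The trade-off is that a late disconnect acknowledgement from the remote would then find no matching entry, so the receive path would have to treat an unmatched disconnect as harmless rather than as an error.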

>>>> <br>

>>>> I could put in a timer-based garbage collection, but I think it's<br>

>>>> best to get rid of the callback data ASAP or else it will crash.<br>
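For comparison, the timer-based garbage collection mentioned here could look roughly like the following sketch, assuming each entry records a state and the time of its last activity (all names are hypothetical, not JNOS identifiers). A periodic sweep drops entries that have sat in a disconnecting state longer than a timeout. As the message notes, this still leaves a window between the memory going away and the next sweep during which the stale entry can be dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not JNOS code. */
#define MAX_CIRCUITS 20

enum circ_state { CIRC_CONNECTED, CIRC_DISCONNECTING };

struct circuit {
    enum circ_state state;
    long last_activity;   /* seconds, from a caller-supplied clock */
};

static struct circuit *circ_table[MAX_CIRCUITS];

/* Clear entries stuck in DISCONNECTING for longer than `timeout` seconds.
   Returns the number of slots reclaimed; the caller would also release
   the control block memory. */
int circ_gc_sweep(long now, long timeout)
{
    assert(timeout >= 0);
    int reclaimed = 0;
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        struct circuit *cb = circ_table[i];
        if (cb == NULL)
            continue;
        if (cb->state == CIRC_DISCONNECTING && now - cb->last_activity > timeout) {
            circ_table[i] = NULL;
            reclaimed++;
        }
    }
    return reclaimed;
}
```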

>>>> <br>

>>>> Anyways ...<br>

>>>> <br>

>>>> Maiko / VE4KLM<br>

>>>> <br>

>>>> On 2022-11-13 10:37 a.m., Maiko (Personal) wrote:<br>

>>>>> Good morning,<br>

>>>>> <br>

>>>>> Slightly technical post ...<br>

>>>>> <br>

>>>>> This has been driving me nuts for the past few months. It just seems<br>

>>>>> to have started, perhaps because I took on a new netrom neighbour<br>

>>>>> or two; I just don't know. But I think I know the reason for all<br>

>>>>> the crashes. After a few days of inserting some very heavy debugs<br>

>>>>> into the code, this is where I am this morning:<br>

>>>>> <br>

>>>>> JNOS keeps a table of netrom callbacks (20 by default). When a<br>

>>>>> new connection happens, it gets put into the table, and when it's<br>

>>>>> done with, it is supposed to be removed from the table. However,<br>

>>>>> this removal is ONLY DONE when the state of the connection becomes<br>

>>>>> disconnected. What appears to be happening is that the table entry<br>

>>>>> for a specific connection looks valid, but the memory behind it<br>

>>>>> has in fact disappeared, and JNOS never removed the entry, so crash !!!<br>
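The lifecycle described above can be sketched as follows, assuming a hypothetical table of 20 control-block pointers (`circ_table`, `circ_add`, `circ_set_state`, `circ_in_use` are illustrative names, not JNOS identifiers). What it shows is that the slot is only cleared on the transition to DISCONNECTED; a circuit that never reaches that final state keeps its slot, and its pointer, indefinitely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not JNOS code. */
#define MAX_CIRCUITS 20

enum circ_state { CIRC_CONNECTING, CIRC_CONNECTED, CIRC_DISCONNECTING, CIRC_DISCONNECTED };

struct circuit {
    int index;              /* slot number in the table */
    enum circ_state state;
};

static struct circuit *circ_table[MAX_CIRCUITS];

/* Put a new circuit into the first free slot; returns the slot or -1 if full. */
int circ_add(struct circuit *cb)
{
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        if (circ_table[i] == NULL) {
            circ_table[i] = cb;
            cb->index = i;
            return i;
        }
    }
    return -1;
}

/* The slot is cleared ONLY when the state becomes DISCONNECTED.
   If that final state never arrives, the slot keeps its pointer. */
void circ_set_state(struct circuit *cb, enum circ_state s)
{
    assert(cb->index >= 0 && cb->index < MAX_CIRCUITS);
    cb->state = s;
    if (s == CIRC_DISCONNECTED)
        circ_table[cb->index] = NULL;
}

/* Count occupied slots. */
int circ_in_use(void)
{
    int n = 0;
    for (int i = 0; i < MAX_CIRCUITS; i++)
        if (circ_table[i] != NULL)
            n++;
    return n;
}
```

If the control block behind a slot like that is later released by some other path, the table still holds the old pointer, which matches the failure described here.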

>>>>> <br>

>>>>> What this suggests to me is that I did not get the final NETROM<br>

>>>>> disconnect, so JNOS still thinks the callback data is valid when<br>

>>>>> in fact it is not; the memory has disappeared. The result is a crash<br>

>>>>> every few days in the nr4subr.c functions, like:<br>

>>>>> <br>

>>>>>    Program received signal SIGSEGV, Segmentation fault.<br>

>>>>>    0x000000000047fdd9 in match_n4circ (index=23, id=71, user=0x2081457<br>

>>>>>    "\236\226d\240\212\234b\236\226d\240\212\234b", node=0x208145e<br>

>>>>>    "\236\226d\240\212\234b") at nr4subr.c:138<br>

>>>>>     138  if ((int)(cb->yournum) == index && (int)(cb->yourid) == id<br>

>>>>> <br>

>>>>> AND<br>

>>>>> <br>

>>>>>    Program received signal SIGSEGV, Segmentation fault.<br>

>>>>>    0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6<br>

>>>>>    (gdb) where<br>

>>>>>    #0  0x00007f96411f9780 in __memcmp_avx2_movbe () from /lib64/libc.so.6<br>

>>>>>    #1  0x0000000000482727 in nrresetlinks (rp=0x22c5550) at nr3.c:1441<br>

>>>>>    #2  0x000000000047ca22 in doobsotick () at nrcmd.c:1316<br>

>>>>> <br>

>>>>> It is very consistent: I am running into cases where I am not getting<br>

>>>>> the final netrom layer 4 disconnect, so the callback remains. JNOS<br>

>>>>> needs to loop through the whole circuit table to find valid ones to<br>

>>>>> match up with, and this invalid one just happens to still be in the<br>

>>>>> table and kablewee :]<br>
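The failing lookup can be pictured with the following sketch, loosely modelled on the fields visible in the `match_n4circ` trace above (`yournum`, `yourid`, user and node callsigns). The table, the struct layout, the name `circ_match_remote`, and the 7-byte callsign length are assumptions for illustration. Empty slots are skipped, but every non-NULL pointer is trusted and dereferenced, so an entry whose memory has already been released is exactly where a segmentation fault like the one above would surface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not JNOS code. */
#define MAX_CIRCUITS 20
#define CALL_LEN 7   /* assumed: 6 callsign bytes + 1 SSID byte */

struct circuit {
    int yournum, yourid;            /* remote's circuit index and ID */
    unsigned char user[CALL_LEN];
    unsigned char node[CALL_LEN];
};

static struct circuit *circ_table[MAX_CIRCUITS];

/* Walk the whole table for the circuit an incoming frame belongs to.
   NULL slots are skipped, but any non-NULL pointer is dereferenced as-is;
   a stale pointer to released memory would fault on the first field read. */
struct circuit *circ_match_remote(int index, int id,
                                  const unsigned char *user,
                                  const unsigned char *node)
{
    assert(user != NULL && node != NULL);
    for (int i = 0; i < MAX_CIRCUITS; i++) {
        struct circuit *cb = circ_table[i];
        if (cb == NULL)
            continue;
        if (cb->yournum == index && cb->yourid == id &&
            memcmp(cb->user, user, CALL_LEN) == 0 &&
            memcmp(cb->node, node, CALL_LEN) == 0)
            return cb;
    }
    return NULL;
}
```

Because C offers no portable way to tell whether a non-NULL pointer still refers to live memory, the fix has to come from removing the entry at the right time rather than from a check inside this loop.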

>>>>> <br>

>>>>> Anyways, I hope to have a fix of sorts for this 'soon'; very<br>

>>>>> frustrating. But again, why has this suddenly started happening<br>

>>>>> at the frequency it has for the past 3 months, possibly more?<br>

>>>>> <br>

>>>>> Jack, this is probably what you are experiencing as well.<br>

>>>>> <br>

>>>>> Maiko / VE4KLM<br>

>>>>> <br>

<br>

_______________________________________________<br>

nos-bbs mailing list<br>

<a href="mailto:nos-bbs@lists.tapr.org" target="_blank">nos-bbs@lists.tapr.org</a><br>

<a href="http://lists.tapr.org/mailman/listinfo/nos-bbs_lists.tapr.org" rel="noreferrer" target="_blank">http://lists.tapr.org/mailman/listinfo/nos-bbs_lists.tapr.org</a><br>

</blockquote></div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Regards,<br>Don</div>