[nos-bbs] JNOS 2.0k memory issue?

Sat Aug 13 00:35:51 EDT 2016

I've been trying to nail down what condition might be causing the hangs with
2.0k.  I discovered what may be a memory problem with JNOS.  This may be why
I haven't noticed the problem on my test machine (12GB) but I did notice on
my production machine (4GB).  Here's what I found:

I have a machine with 4GB of RAM.  I boot it up (without starting JNOS), and
free memory (reported by "free") will stay at about 2.3G for as long as you
want to watch it.  I used "watch free" with the default 2 second refresh
rate.

I start up JNOS and free memory drops to about 1.8G or lower.

As forwarding starts to occur, the amount of free memory drops lower and
lower, and will bottom out at about 100K bytes.  So JNOS is taking about
2.2G of RAM at that point (2.3G - 100K).

When memory goes this low, the JNOS console window becomes unresponsive
and the status line isn't updated.  For example, as I watch "tail -f
nos.log", I can see new forwarding connections happen, but the BBS call
signs don't appear on the status display.

About the time that certain events happen in nos.log, such as "reintegrating
data [F] from eol issue", free memory will jump back up - not as high as
before, but up to about 1.4G.  This goes on in cycles of lower and higher
memory utilization as different forwarding sessions happen.

If I wait until all forwarding sessions have finished and JNOS is quiescent,
and then use "exit 0", free memory on the Linux side is at 1.4G and it will
stay there for as long as you watch it.  (Remember, it was at 2.3G before
starting JNOS!)

If I start up JNOS again, free memory will drop to about 1.0G after JNOS
completes its startup.  If I exit 0, free memory stays at 1.0G for as long
as you want to watch it.

After reboot, free memory is back to 2.3G and will stay there (as long as
JNOS is not started!).  This cycle is repeatable.

This seems to suggest that:

*  JNOS is not returning all memory to the OS.  Each invocation of JNOS
leaves less free memory in the system. 

*  While JNOS is running, certain conditions during forwarding cause JNOS to
consume a large amount of memory and hold it for an extended period until
some other condition happens.  Even then, not all is returned.
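
One way to check the second point might be to watch the resident memory of
the JNOS process itself, next to the system-wide figure from "free".  The
following is only a sketch, not something taken from my test runs: it
assumes a Linux /proc filesystem, and the process ID has to be supplied by
hand (for example from "pidof").  The function names are made up for
illustration.

```python
import sys


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status.

    Returns None if the field is missing (e.g. for a kernel thread).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:    204800 kB"
            return int(line.split()[1])
    return None


def read_vmrss_kb(pid):
    """Read the resident set size of a running process, in kB."""
    with open("/proc/{}/status".format(pid)) as f:
        return parse_vmrss_kb(f.read())


if __name__ == "__main__":
    # Usage (assumed): python3 rss.py <pid of the JNOS process>
    if len(sys.argv) > 1:
        print(read_vmrss_kb(int(sys.argv[1])))
```

If the resident size of the process does not grow by roughly the amount
that free memory drops, then the memory is being used somewhere other than
in the JNOS process itself.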

Maiko:

To help with troubleshooting, I prepared a little script that logs the
output from "free" to a file with timestamps, so it can be compared
alongside nos.log.  I captured the two scenarios above, the initial run
(ending with 1.4G free) and a second run (ending with 1.0G free), using
2-second intervals in the free memory log.  Maybe you have development
tools that provide better info.  But if you'd like the data, let me know.
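
For anyone who wants to collect the same kind of data, a logger along these
lines would do the job.  This is a minimal sketch and not the script I
actually used: it reads /proc/meminfo directly instead of running "free",
and the log file name, the timestamp format, and the function names are all
assumptions chosen for illustration.  The timestamp format would need to be
adjusted to line up with whatever nos.log uses.

```python
import time
from datetime import datetime


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def format_sample(fields, now):
    """Build one timestamped log line from parsed meminfo fields."""
    ts = now.strftime("%Y-%m-%d %H:%M:%S")  # assumed format
    # MemAvailable is missing on older kernels; it is then logged as None.
    return "{} free_kB={} available_kB={}".format(
        ts, fields.get("MemFree"), fields.get("MemAvailable"))


def log_memory(path="free.log", interval=2.0, samples=None):
    """Append a sample every `interval` seconds; run forever if samples is None."""
    count = 0
    with open(path, "a") as out:
        while samples is None or count < samples:
            with open("/proc/meminfo") as f:
                fields = parse_meminfo(f.read())
            out.write(format_sample(fields, datetime.now()) + "\n")
            out.flush()  # keep the file current so "tail -f" works on it
            count += 1
            time.sleep(interval)


if __name__ == "__main__":
    log_memory()
```

Logging MemAvailable next to MemFree shows how much of the "missing" memory
the kernel could still hand back to applications on request.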

Also, this may not be limited to 2.0k.  I found similar behavior in 2.0j.
However, the condition seems to be exacerbated by compression.  And because
of some of the B2F forwarding errors that were only fixed in 2.0k, I was not
using compression with 2.0j on the 4GB machines.  Therefore, perhaps the
exact same conditions aren't present in 2.0j.  So maybe that's why 2.0j
hasn't completely hung on the lower-memory units like 2.0k did.

Michael
