[aprssig] Periodic Disconnects from APRS-IS

Scott Miller scott at opentrac.org
Thu Jan 18 23:44:38 EST 2007


> (not delayed) delivery.  Assuming that the minimum buffering 
> time is the
> maximum buffering time is not understanding how the Nagle algorithm
> works and I recommend you review the extensive studies on this subject
> that have been done on protocols similar to APRS.

Nagle's algorithm alone is not the problem here.  Read RFC 896 - done
right, it'll result in only one MSS worth of data being buffered.  Last I
checked, that works out to about 1/10 of a second worth of APRS-IS data.
The real problem is when it's combined with delayed ACKs.  See below.
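
Incidentally, if a client really can't tolerate that buffering at all,
Nagle's algorithm can be turned off per socket with TCP_NODELAY.  A minimal
sketch for a POSIX system (assuming fd is an already-connected TCP socket):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Disable Nagle's algorithm so each write() is sent immediately
       instead of waiting for outstanding data to be ACKed. */
    int disable_nagle(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }

Of course, doing that just trades the latency problem for the tinygram
overhead discussed next.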

As for the added overhead being negligible, I'm not sure about that.  A
typical APRS packet is what, around 60 bytes?  Call it 80 to be optimistic.
TCP and IP add 40 bytes of overhead to that.  That's a 50% increase.  But
you can fit 18 of those packets in one datagram - with 1440 bytes of data,
that 40 bytes of overhead is now less than a 3% increase.  It's probably not
an issue for most users, but at hubs you're dealing with a significant
number of connections, and having a third of your total traffic be header
overhead is a lot.
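
If you want to check the arithmetic, here's the back-of-the-envelope
version (the 80-byte packet and 40-byte header figures are the assumptions
from above):

    #include <stdio.h>

    int main(void)
    {
        const double payload  = 80.0;   /* optimistic APRS packet size */
        const double overhead = 40.0;   /* TCP + IP headers per segment */

        /* One packet per TCP segment: overhead relative to payload. */
        printf("1 packet/segment:   %.0f%% overhead\n",
               100.0 * overhead / payload);               /* 50% */

        /* 18 packets coalesced into one ~1440-byte segment. */
        printf("18 packets/segment: %.1f%% overhead\n",
               100.0 * overhead / (18.0 * payload));      /* ~2.8% */
        return 0;
    }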

But back to Nagle's algorithm.  Here's John Nagle's own description of the
problem:

---

I really should fix the bad interaction between the "Nagle algorithm" and
"delayed ACKs". Both ideas went into TCP around the same time, and the
interaction is terrible. That fixed timer for ACKs is all wrong.

Here's the real problem, and its solution.

The concept behind delayed ACKs is to bet, when receiving some data from the
net, that the local application will send a reply very soon. So there's no
need to send an ACK immediately; the ACK can be piggybacked on the next data
going the other way. If that doesn't happen, after a 500ms delay, an ACK is
sent anyway.

The concept behind the Nagle algorithm is that if the sender is doing very
tiny writes (like single bytes, from Telnet), there's no reason to have more
than one packet outstanding on the connection. This prevents slow links from
choking with huge numbers of outstanding tinygrams.

Both are reasonable. But they interact badly in the case where an
application does two or more small writes to a socket, then waits for a
reply. (X-Windows is notorious for this.) When an application does that, the
first write results in an immediate packet send. The second write is held up
until the first is acknowledged. But because of the delayed ACK strategy,
that acknowledgement is held up for 500ms. This adds 500ms of latency to the
transaction, even on a LAN.

The real problem is that 500ms unconditional delay. (Why 500ms? That was a
reasonable response time for a time-sharing system of the 1980s.) As
mentioned above, delaying an ACK is a bet that the local application will
reply to the data just received. Some apps, like character echo in Telnet
servers, do respond every time. Others, like X-Windows "clients" (really
servers, but X is backwards about this), only reply some of the time.

TCP has no strategy to decide whether it's winning or losing those bets.
That's the real problem.

The right answer is that TCP should keep track of whether delayed ACKs are
"winning" or "losing". A "win" is when, before the 500ms timer runs out, the
application replies. Any needed ACK is then coalesced with the next outgoing
data packet. A "lose" is when the 500ms timer runs out and the delayed ACK
has to be sent anyway. There should be a counter in TCP, incremented on
"wins", and reset to 0 on "loses". Only when the counter exceeds some number
(5 or so), should ACKs be delayed. That would eliminate the problem
automatically, and the need to turn the "Nagle algorithm" on and off.

So that's the proper fix, at the TCP internals level. But I haven't done TCP
internals in years, and really don't want to get back into that. If anyone
is working on TCP internals for Linux today, I can be reached at the e-mail
address above. This really should be fixed, since it's been annoying people
for 20 years and it's not a tough thing to fix.

The user-level solution is to avoid write-write-read sequences on sockets.
write-read-write-read is fine. write-write-write is fine. But
write-write-read is a killer. So, if you can, buffer up your little writes
to TCP and send them all at once. Using the standard UNIX I/O package and
flushing write before each read usually works.

John Nagle

---
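
So for anyone writing APRS-IS software, Nagle's user-level advice above
translates to buffering your own small writes and flushing them with one
write() call before you block on a read.  A minimal sketch of the idea -
the buffer size and function names here are just illustrative:

    #include <string.h>
    #include <unistd.h>

    static char   outbuf[1400];   /* stay under a typical MSS */
    static size_t outlen = 0;

    /* One write() per batch; call this before blocking on a read. */
    void flush_out(int fd)
    {
        if (outlen > 0) {
            (void)write(fd, outbuf, outlen);  /* error handling omitted */
            outlen = 0;
        }
    }

    /* Queue a small packet instead of write()ing it immediately.
       Assumes len <= sizeof outbuf. */
    void queue_packet(int fd, const char *data, size_t len)
    {
        if (outlen + len > sizeof outbuf)
            flush_out(fd);
        memcpy(outbuf + outlen, data, len);
        outlen += len;
    }

This avoids the write-write-read pattern entirely: TCP sees one large
write instead of a burst of tinygrams, so there's nothing for Nagle's
algorithm and delayed ACKs to fight over.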

Scott
N1VG
