[TangerineSDR] Updated Detailed Design Specification for PSWS Local Host (v0.1)

Wed Nov 20 12:25:58 EST 2019

On Wed, Nov 20, 2019 at 16:38:36 +0000, Engelke, Bill wrote:
> Rob - OK, I have a handle on what to do for system updating. I will
> take out references to doing any of this using Central Control.

Well, Central Control can still initiate an unscheduled update.  I
imagine we'll want to be able to do that, if only for testing purposes.

> - I'd like your recommendation on precisely how to handle the
>   communication between the LH and the DE. The plan is to use sockets,
>   but I need some more detail on that.
>
>   1.  Let's say we use TCP. We need an approach that handles making
>       connections, dealing with lost connections, ensuring
>       communications get thru & recover from errors, thread safety,
>       etc. I have seen packages such as Zeromq<https://zeromq.org/>
>       and things like it, but those would require that the DE use an
>       exactly compatible method (implemented in FPGA, I guess) -
>       probably not possible, maybe not even desirable. What would you
>       recommend?

Just plain TCP.  Managing connections isn't that hard, and thread safety
is pretty easy (wherever possible, I tend to use a single-threaded
process consisting of a poll(2) or select(2) loop, which avoids any
locking I'd otherwise have to do).
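
A minimal sketch of that kind of single-threaded loop, in Python for
brevity.  It uses the standard selectors module, which wraps poll(2) or
select(2) depending on the platform.  The function name, the echo
behaviour, and the max_events cut-off are all assumptions made for
illustration; none of this is the PSWS design.

```python
import selectors
import socket

def run_echo_loop(listener, max_events):
    """Accept clients and echo what they send, all in one thread.

    Hypothetical example: max_events only exists so the sketch
    terminates; a real daemon would loop until told to shut down.
    """
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    handled = 0
    while handled < max_events:
        for key, _mask in sel.select(timeout=1.0):
            sock = key.fileobj
            if sock is listener:
                # New client: register it with the same loop.
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    # Small writes only; a full send buffer is not handled.
                    sock.sendall(data)
                else:
                    # Empty read means the peer closed the connection.
                    sel.unregister(sock)
                    sock.close()
            handled += 1
    # Close any clients still open, then the selector itself.
    for key in list(sel.get_map().values()):
        if key.fileobj is not listener:
            key.fileobj.close()
    sel.close()
```

Only one thread ever touches the sockets and any state hanging off
them, so there is nothing to lock.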

I would recommend not adding any more complexity on top of TCP by way of
library proliferation.  In particular, I'd *strongly* recommend against
using ZeroMQ for *anything* - it has its uses, but most of those are
within datacenters where server processes are not expected to stop and
restart, say, in the middle of a connection attempt (*).  My experience
at work is that trying to make ZeroMQ manage connections and
reconnections properly is an exercise in futility, race conditions, and
locked processes.  ZeroMQ's strength lies in simplifying control loops,
and its biggest weakness is connection management.  We gave up on ZMQ and
went back to using Unix domain sockets, which took me an afternoon, plus
a week of automated regression testing.  Now our control messages don't
lock up for stupid reasons, and our command-line control remotes don't
take seconds to import ZMQ (on an armhf board not dissimilar to the
BeagleBone Black) before sending their commands.
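
For comparison, reconnection over plain sockets can be sketched in a
few lines.  This is an illustration only: it uses TCP rather than the
Unix domain sockets mentioned above, and the function name, the attempt
count, and the delays are all made-up values.

```python
import socket
import time

def connect_with_retry(address, attempts=5, first_delay=0.1, max_delay=5.0):
    """Try to connect, backing off between failures.

    Hypothetical helper: returns a connected socket, or re-raises the
    last OSError once the attempts are used up.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            return socket.create_connection(address, timeout=2.0)
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            # Double the wait each time, up to a ceiling.
            delay = min(delay * 2, max_delay)
```

A caller that sees a read or write fail would close the old socket and
call this again; all of the connection state stays visible in the
caller's own code.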

>   2.  Scotty has said he want to use UDP as much as possible. To me,
>       this makes sense for sending acquired data, but for commands, I
>       don't know.  Would UDP work for everything?  If so, do you know
>       of something we can use that is an open source package, or do we
>       need to write everything from scratch? Don't want to re-invent,
>       etc.  Any thoughts?

If we use UDP, we'll have to reimplement large parts of TCP.  I think I
said this at DCC, but TCP should work fine for us.  Since we have a
reliable wired connection, we won't get caught up in retransmission, so
the biggest reason to favor UDP over TCP is already not a thing.  With
TCP, we'll get proper in-order receives, which is always a better thing
than it looks like (for example, when sending data with TCP, we can
timestamp the first sample, and then the rest follows, but with UDP,
we'd have to timestamp every single packet).  If nothing else, any
command stream absolutely must use TCP instead of UDP (ordering really
matters).
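
A sketch of the timestamp point, assuming a made-up wire format (this
is not a proposed PSWS protocol): one header carrying the time of the
first sample and the sample rate, then length-prefixed blocks of 16-bit
samples.  Because TCP delivers bytes in order, the receiver can work
out each sample's time from its position in the stream.

```python
import socket
import struct

# Hypothetical wire format, for illustration only.
HEADER = struct.Struct("!dI")     # start time (seconds), sample rate (Hz)
BLOCK_LEN = struct.Struct("!I")   # byte count of the block that follows

def recv_exact(sock, n):
    """Read exactly n bytes; TCP may hand them over in smaller pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return bytes(buf)

def send_stream(sock, t0, rate_hz, blocks):
    """Send one timestamp up front, then non-empty blocks of int16 samples."""
    sock.sendall(HEADER.pack(t0, rate_hz))
    for samples in blocks:
        payload = struct.pack("!%dh" % len(samples), *samples)
        sock.sendall(BLOCK_LEN.pack(len(payload)) + payload)
    sock.sendall(BLOCK_LEN.pack(0))   # zero-length block ends the stream

def recv_stream(sock):
    """Return (time, sample) pairs, deriving each time from stream position."""
    t0, rate_hz = HEADER.unpack(recv_exact(sock, HEADER.size))
    index = 0
    out = []
    while True:
        (nbytes,) = BLOCK_LEN.unpack(recv_exact(sock, BLOCK_LEN.size))
        if nbytes == 0:
            return out
        samples = struct.unpack("!%dh" % (nbytes // 2), recv_exact(sock, nbytes))
        for s in samples:
            out.append((t0 + index / rate_hz, s))
            index += 1
```

With UDP the same receiver could not trust position alone, since a
datagram may be lost or arrive out of order; every packet would need
its own timestamp or sequence number.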

*:
For those who are curious, ZMQ queues up connections automatically, and
orders receives and sends according to a number of specific lockstep
communications models (for instance, REQ/REP lets the REQ side send() to
the REP side, then the next recv() command will specifically return the
response to the send(), and the REP side is required to recv() first,
and then send(), which routes the response back to the peer that was
recv()d from).  This means that there's always some connection-handling
logic going on in the background, where ZMQ is accepting connections
from clients, maybe even accepting their requests, before their turn
comes around and they get a response.  We found that ZMQ (maybe this is
fixed in newer versions?) doesn't bother cleaning up after these
pre-accepted clients (and is incapable of prohibiting them), and the
server code that uses ZMQ is not able to enumerate these clients and
terminate their connections at shutdown (which you'd think would be the
responsibility of ZMQ, but apparently not).