[TangerineSDR] Seeking your Review and Comment on Tangerine SBC Functional Specification

Mon Oct 14 11:22:30 EDT 2019

Rob, I added the distribution list as you suggested. I greatly value your knowledgeable feedback, so please keep it coming.

Regarding Celery: I am not crazy about Celery; I mention it only because it is used in the SatNOGS system. We have decided that we wish to base the PSWS Central Control system/database on SatNOGS (at least for Phase 1), as its functionality and UI seem to overlap so much with our needs. Having said that (since you mention that Celery may not fit the need), we should probably discuss what data is going to be generated, and then seek the right package for managing the uploads. (As an aside, I have the entire SatNOGS system running on a server; if you like, we can look at it together on a Zoom session and see what they are actually using Celery for.)

On the topic of heterogeneous vs homogeneous data, right now we envision two distinct data types:

1. Raw I/Q data from the ring buffer ("Use Case 1"). These data are stored in the ring buffer in Digital RF format. On request from Central Control, a selected chunk of data (from start time A to end time B) is to be uploaded. Upon receipt, Central Control stores these as Digital RF files; the file name is generated automatically and stored in the database.
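For concreteness, here is a minimal sketch of the Central Control bookkeeping for Use Case 1, assuming Python and SQLite. The table name `iq_uploads`, its columns, and the naming scheme in `make_filename` are all hypothetical; the spec only says the name is generated automatically and recorded in the database. Receiving and writing the Digital RF data itself is not shown.

```python
import sqlite3
from datetime import datetime, timezone

def init_db(conn):
    # Hypothetical table recording each received Use Case 1 chunk.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS iq_uploads ("
        " id INTEGER PRIMARY KEY, station_id TEXT NOT NULL,"
        " start_ts REAL NOT NULL, end_ts REAL NOT NULL, filename TEXT UNIQUE NOT NULL)"
    )

def make_filename(station_id, start_ts, end_ts):
    """Derive a name from the station and the requested span A..B (UTC, hypothetical scheme)."""
    fmt = "%Y%m%dT%H%M%SZ"
    a = datetime.fromtimestamp(start_ts, tz=timezone.utc).strftime(fmt)
    b = datetime.fromtimestamp(end_ts, tz=timezone.utc).strftime(fmt)
    return f"{station_id}_{a}_{b}"

def record_upload(conn, station_id, start_ts, end_ts):
    """Validate the request, generate the file name, and store it in the database."""
    if end_ts <= start_ts:
        raise ValueError("end time B must be later than start time A")
    name = make_filename(station_id, start_ts, end_ts)
    conn.execute(
        "INSERT INTO iq_uploads (station_id, start_ts, end_ts, filename) VALUES (?, ?, ?, ?)",
        (station_id, start_ts, end_ts, name),
    )
    conn.commit()
    return name
```

The UNIQUE constraint on `filename` means a duplicate request for the same station and span fails loudly instead of silently creating a second record.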

2. Pre-processed data ("Use Case 3"). This case is for users with low bandwidth. An upload occurs about once per second (or as rarely as once per minute). GNU Radio samples the bands (normally a few Hz around each WWV station) and performs an FFT. The FFT output is uploaded (along with magnetometer data) and stored directly in a database table.
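A rough sketch of what one Use Case 3 record might look like, assuming Python and SQLite. On the station, GNU Radio would do the FFT; here a naive pure-Python DFT stands in for it so the example has no dependencies. The table `spectra`, the single float `magnetometer` value, and storing the bins as JSON text are assumptions for illustration, not a decided schema.

```python
import cmath
import json
import math
import sqlite3

def init_db(conn):
    # Hypothetical schema: one row per uploaded spectrum.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spectra ("
        " station TEXT, ts REAL, bins_json TEXT, magnetometer REAL)"
    )

def dft_magnitudes(samples):
    """Naive O(N^2) DFT standing in for GNU Radio's FFT; returns |X[k]| for each bin k."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def store_spectrum(conn, station, ts, iq_samples, magnetometer):
    """Compute the spectrum of one block of complex I/Q samples and store it as one row."""
    bins = dft_magnitudes(iq_samples)
    conn.execute(
        "INSERT INTO spectra (station, ts, bins_json, magnetometer) VALUES (?, ?, ?, ?)",
        (station, ts, json.dumps(bins), magnetometer),
    )
    conn.commit()
    return bins
```

Because each upload is a small, fixed-shape record, it fits naturally in a database row rather than in a separate file.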

(There is a third situation, "Use Case 2," or "firehose," where raw data is streamed directly to a server. Central Control does not support that; such data would be received by the institution's server farm or supercomputer.)

Use Case 3, as best I can tell, matches what SatNOGS does: they pre-process received spectrum into a waterfall, and that's what they upload. We can certainly ask them why they chose Celery (for all I know, they may regret it). Maybe they are using Celery for something else (i.e., not for uploading). Its complexity definitely increases project risk.

There are numerous other queueing packages. I have used IBM's WebSphere MQ (works great, but costs a fortune) and MQTT, both with good success; however, I also gather we need to distinguish between task queueing and data queueing. Maybe we are comparing apples to oranges here. Anyway, let me know your thoughts.

-73- Bill, AB4EJ

-----Original Message-----
From: Rob Wiesler <robert.wiesler at case.edu> 
Sent: Saturday, October 12, 2019 6:03 PM
To: Engelke, Bill <bill.engelke at ua.edu>
Subject: Re: Seeking your Review and Comment on Tangerine SBC Functional Specification

Bill, I won't re-add the mailing list to the distribution list without your permission, but in general please keep responses on-list, as it becomes difficult for anyone else to participate otherwise.

On Wed, Oct 9, 2019 at 4:33 PM Engelke, Bill <bill.engelke at ua.edu> wrote:
> Rob - for file backlogs, the plan is to use Celery, 
> (http://www.celeryproject.org/) , this works very well for uploading 
> the large # of files for SatNOGS, and integrates well with Django.  I 
> had been hoping to continue to use that. (I will still research 
> inotify so I know what it can do as well ) -

I highly recommend not adding a dependency on Celery for this particular thing (or, preferably, at all). Celery is a queue for asynchronous processing of heterogeneous tasks, while what we need is a queue for serial processing of homogeneous data. We don't want our individual data files to end up in different tasks, because that implies a separate connection to upload each file (versus a single TCP stream where the data files are sent serially for better throughput).
We want one upload task, and a data queue (not a task queue) sitting in front of it. If SatNOGS uses Celery, it's probably not involved in the step you're thinking of, and if I'm wrong about that, then that's a giant, fluorescent red flag saying that we shouldn't be following their lead on this.
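To illustrate the "one upload task, one data queue" design, here is a minimal Python sketch. Assumptions: a stdlib `queue.Queue` serves as the data queue, and the length-prefixed framing is a hypothetical wire format, not anything agreed on. A single thread drains the queue and writes every file, in order, onto one already-open stream; in practice that stream could be a TCP socket wrapped with `sock.makefile("wb")`, while the demo uses an in-memory `io.BytesIO` so it runs standalone.

```python
import io
import queue
import struct
import threading

STOP = None  # sentinel telling the single uploader to exit
HEADER = struct.Struct("!HI")  # hypothetical framing: name length, payload length

def uploader(data_q, stream):
    """The one upload task: drain the data queue, sending each file serially on one stream."""
    while True:
        item = data_q.get()
        try:
            if item is STOP:
                return
            name, payload = item
            encoded = name.encode("utf-8")
            stream.write(HEADER.pack(len(encoded), len(payload)))
            stream.write(encoded)
            stream.write(payload)
            stream.flush()
        finally:
            data_q.task_done()

def read_frames(stream):
    """Receiver side: split the single stream back into (name, payload) pairs."""
    frames = []
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return frames
        name_len, payload_len = HEADER.unpack(header)
        name = stream.read(name_len).decode("utf-8")
        frames.append((name, stream.read(payload_len)))

def demo():
    stream = io.BytesIO()
    data_q = queue.Queue()
    worker = threading.Thread(target=uploader, args=(data_q, stream))
    worker.start()
    for i in range(3):
        data_q.put((f"chunk{i}.bin", bytes([i]) * 4))  # producers just enqueue data
    data_q.put(STOP)
    worker.join()
    stream.seek(0)
    return read_frames(stream)

if __name__ == "__main__":
    print(demo())
```

Nothing here spawns a task per file: adding a file is just a `put()` onto the queue, and throughput comes from keeping one connection busy.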

I'll also point out that Celery is an incredibly complicated library (>7000 lines without counting dependencies) that introduces a ton of dependencies. The Celery FAQ gives lame, half-baked excuses for why this doesn't matter, but it definitely does matter (which I could get into, but for now that's outside the scope of this document). The only upside is that Celery is actually packaged for Debian, so if we did decide to use it (for a purpose to which it's actually suited), those dependencies are somewhat more manageable.

Why do you think that Django integration is relevant?  I wasn't aware that there was any interaction between the store-forward mechanism and a web page running on the SBC.  Even if we want to publish how many stored, unsent data files there are (and how much space they're taking up (In memory? On disk?) and how much space is left), there are significantly better ways to do that.
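As one example of a lighter-weight way to publish backlog status without Django, here is a sketch assuming the unsent files sit in a single hypothetical queue directory. It counts the files, sums their sizes, and asks the filesystem how much space is left; the result could be written to a small JSON status file or reported by whatever process already exists.

```python
import os
import shutil

def backlog_status(queue_dir):
    """Summarize the store-forward backlog in one directory (hypothetical layout)."""
    unsent_files = 0
    unsent_bytes = 0
    with os.scandir(queue_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                unsent_files += 1
                unsent_bytes += entry.stat(follow_symlinks=False).st_size
    usage = shutil.disk_usage(queue_dir)  # filesystem holding the queue (tmpfs or disk)
    return {
        "unsent_files": unsent_files,
        "unsent_bytes": unsent_bytes,
        "free_bytes": usage.free,
    }
```

If the queue directory lives on a tmpfs, `shutil.disk_usage` reports the free space of that tmpfs, i.e. how much of the memory set aside for it is still available.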

The first draft of this email had more reasons not to use Celery, because it isn't at all suited as a store-forward system. Now that I'm awake and thinking clearly, we don't necessarily want to write every single data file to an actual disk to guard against power loss before the upload catches up with the backlog (since that's currently a sticking point for the ringbuffer), so those points are moot.
However, I will say that if we overcome the problems associated with the ringbuffer, then Celery is *definitely* not what we want to use.
Note also that inotify still works in a memory-backed tmpfs, so if we wanted to write an in-memory data queue first, then turn it into an on-disk ringbuffer later, that would be a trivial change.
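A Linux-only sketch of the inotify side, assuming Python. The standard library has no inotify binding, so this calls glibc's `inotify_init1` and `inotify_add_watch` through `ctypes`. The watched directory is whatever the queue lives in (for example a tmpfs path such as `/dev/shm/...`, which is a hypothetical choice); the same code works unchanged on an on-disk directory, which is the point above. It watches for files that were closed after writing or renamed into the directory, so a writer can finish a file before the uploader sees it.

```python
import ctypes
import ctypes.util
import os
import struct

IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was renamed into the watched directory
EVENT_HEADER = struct.Struct("iIII")  # struct inotify_event: wd, mask, cookie, len

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_init1.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int

def _raise_errno():
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))

def watch_queue_dir(path):
    """Return an inotify file descriptor watching `path` for completed files."""
    fd = _libc.inotify_init1(0)
    if fd < 0:
        _raise_errno()
    if _libc.inotify_add_watch(fd, os.fsencode(path), IN_CLOSE_WRITE | IN_MOVED_TO) < 0:
        os.close(fd)
        _raise_errno()
    return fd

def read_new_files(fd):
    """Block until at least one event arrives; return the file names reported."""
    buf = os.read(fd, 4096)
    names = []
    offset = 0
    while offset < len(buf):
        _wd, _mask, _cookie, length = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        if length:  # events such as queue overflow carry no name
            names.append(os.fsdecode(buf[offset:offset + length].rstrip(b"\0")))
        offset += length
    return names
```

The uploader would loop on `read_new_files()` and enqueue each name it gets back.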