[TangerineSDR] Filename structure, Node info contents, other stuff to ponder

Ryan Volz ryan.volz at gmail.com
Thu May 7 15:26:10 EDT 2020


Hi all,

I'm an outside observer, but with the discussion having now touched on an HDF5 format, I think it might be helpful to point toward the NetCDF Climate and Forecast conventions:

http://cfconventions.org/cf-conventions/cf-conventions.html

I don't know that it exactly fits, but it provides a standard format for geoscience data stored in NetCDF files, which in its modern form (NetCDF-4) is a strict subset of HDF5. It's called "Climate and Forecast", but the format is pretty adaptable to any data where geodetic coordinates are important (I use it for meteor data, for example). It's also nicely integrated into the 'xarray' Python package for working with labeled/indexed multidimensional array data:

https://xarray.pydata.org/en/stable/weather-climate.html

There would still be work to do defining how exactly to fit PSWS data into this convention, but by no means would you have to throw out what's already been done.
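
As a rough illustration only, here is a minimal sketch of what a CF-style record could look like in xarray. The variable name 'doppler_shift', the sample values, the title, and the station coordinates are all made up for the example and are not part of any PSWS specification. Only the 'Conventions' attribute and the standard_name/units strings for time, latitude, and longitude come from the CF conventions themselves.

```python
import numpy as np
import xarray as xr

# Three one-second samples; the values are placeholders, not real data.
times = np.array(
    ["2020-05-05T00:00:00", "2020-05-05T00:00:01", "2020-05-05T00:00:02"],
    dtype="datetime64[ns]",
)
shift_hz = np.array([0.12, 0.10, 0.11])

ds = xr.Dataset(
    data_vars={
        # Hypothetical variable name; no CF standard_name is claimed for it.
        "doppler_shift": (
            ("time",),
            shift_hz,
            {"units": "Hz", "long_name": "carrier Doppler shift (example)"},
        ),
    },
    coords={
        "time": ("time", times, {"standard_name": "time"}),
        # Scalar station position (placeholder numbers) with CF attributes.
        "lat": ((), 41.5, {"standard_name": "latitude", "units": "degrees_north"}),
        "lon": ((), -81.6, {"standard_name": "longitude", "units": "degrees_east"}),
    },
    attrs={"Conventions": "CF-1.8", "title": "Example node record (illustrative)"},
)

print(ds)
```

Calling ds.to_netcdf("example.nc") would then write this out as a NetCDF file, provided a NetCDF backend such as netCDF4 or h5netcdf is installed.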

Cheers,
Ryan

On 5/7/20 2:38 PM, Gerry Creager - NOAA Affiliate via TangerineSDR wrote:
> Aidan
> 
> Sure, but this is likely to get me into lecture mode, and to be honest, I am not as current as I used to be on the subject. I've been off the OGC technical committee for 8 years now.
> 
> SensorML will be the official topic, as StarFishFL is a subset, for the most part, and designed to remove a lot of cruft from SensorML.
> SensorML was designed to provide a comprehensive set of metadata about a sensor, AND about the data it can collect. The format is XML, and there should, by now, be an accepted schema to serve as a template. Not all the fields are required, and metadata can be inherited as needed. In the original vision, SensorML was designed to describe a satellite sensor platform in terms of orbital parameters and attitude parametrics, as well as describing the sensor characteristics (e.g., for a line-scanner, scan and dwell rate, nadir angle, etc.).
> 
> The element that should be attractive is the concept of inheritance. I'm not a fan of overloading a filename with all the metadata possible. Also, while it's wordy, using a schema to create a metadata XML config just makes sense. Passing the metadata periodically (once/day) and/or when it changes is, to me, a more reasonable way to handle this. I'm also a fan of streaming the data as it comes in. I can tell you now that the lightning data will have to talk to a central server in real time to trilaterate the CG strike information.
> 
> Since the data files are apparently intended to come to a central server anyway, streaming the data using a TCP, UDP, or multicast transport would be straightforward using several extant protocols, or we could reinvent the wheel. Similarly, poorly connected sites could send their file data in as able, daily, etc.
> 
> While we're on the subject of data files, I've got to ask what is, to me, the obvious question (although a different note from Phil triggered it): why store the data in CSV? Wouldn't compressed HDF5 make more sense in the long term? And, if we do that, and the HDF5 file is appropriately created and populated, the metadata for each observation is inherent to the HDF5 file format.
> 
> What I'm suggesting is that we're spending a lot of effort to create both a data file format and a naming convention for sensor data when there's already a standard out there. It's a little obscure unless you've ventured out into the geospatial or geosciences world. I just think that looking at the standards available to us already might be a better use of time.
> 
> I can expand further, but I'm multitasking right now with a Cray storage problem. If I need to go further, please let me know.
> 
> gerry
> 
> On Thu, May 7, 2020 at 4:41 PM Aidan Montare <aam141 at case.edu> wrote:
> 
>     Gerry,
> 
>     Could you provide an overview of the features of SensorML and related systems that are relevant to our discussion, or the parts that you want us to pay attention to? I took a look at the Wikipedia pages for those topics, but there's a lot that I don't understand, so having a guide would be helpful.
> 
> 
> 
>     On Wed, May 6, 2020 at 11:40 AM Gerry Creager - NOAA Affiliate via TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> 
>         I'm coming into this a bit late, but: maintaining so much of the metadata in the file name is both attractive and painful. Being able to see who's sending just by looking at the file name is the obvious attractive element. Corruption of that element of metadata is a pitfall I've seen in meteorological data, even from automated systems. In addition, I've evolved to a point where I prefer to store the metadata in a database (I note the reference to a central db) or to send a periodic (e.g., daily) "current-metadata" update with a lot of the info (e.g., grid square, PII, etc.) that can be checked against the database.
> 
>         I'll also suggest a quick review of the Open Geospatial Consortium's work on SensorML (overkill for this project, but instructive) and Starfish Fungus ML, a simplification and streamlining of SensorML. These were designed specifically for overhead imagery systems, but were quickly tweaked for a variety of other sensors.
>         73
>         Gerry N5JXS
> 
>         On Wed, May 6, 2020 at 3:00 PM Phil Erickson via TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> 
>             Hi John,
> 
>                That works for "fldigi" - but it seems to me that the discussion here is more general and should not be tied to a particular program behavior.  We have had totally disastrous results in reopening a file and adding to it - including fseek(0) due to program error and blasting away all the earlier data.  File namespace is cheap so I guess I don't get the desire to conserve it these days.
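One way to guard against the clobbering failure described above is to create a fresh file per run with exclusive creation, so an existing file can never be truncated or overwritten by mistake. This is only a sketch, not how any existing PSWS program works: the helper name 'start_new_file', the file name, and the header text are hypothetical.

```python
import tempfile
from pathlib import Path


def start_new_file(path: Path, header: str):
    """Create a brand-new data file and write its header line.

    Mode "x" (exclusive creation) raises FileExistsError if the file is
    already there, instead of silently truncating or rewriting it.
    """
    f = path.open("x", encoding="utf-8")
    f.write(header + "\n")
    return f


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"  # hypothetical file name

    with start_new_file(path, "time,value") as f:
        f.write("row1\n")

    try:
        start_new_file(path, "time,value")  # a second run hitting the same name
        clobbered = True
    except FileExistsError:
        clobbered = False

    contents = path.read_text(encoding="utf-8")

print(clobbered)
print(contents)
```

The second call fails loudly and the earlier data is left untouched, which is the property wanted here. A restart would then pick a new, distinct file name rather than reopening the old file.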
> 
>             ---- Phil
> 
> 
>             On Wed, May 6, 2020 at 10:56 AM John Gibbons <jcg66 at case.edu> wrote:
> 
>                 Phil,
> 
>                 The issue of restarting anytime during the day was one of the major modifications I made to the program.  It just starts up and continues where it left off.  Same file, same date, etc.
> 
>                 So it's not a problem.  People will also stop the data collection to use their ham stations for their normal operations, so this had to be addressed.
> 
>                 John N8OBJ
> 
>                 John C. Gibbons
>                 Director - Sears Undergraduate Design Laboratory
>                 Dept. of Electrical Engineering and Computer Science
>                 Case Western Reserve University
>                 10900 Euclid Ave, Glennan 314
>                 Cleveland, Ohio  44106-7071
>                 Phone (216) 368-2816 FAX (216) 368-6888
>                 E-mail: jcg66 at case.edu
> 
> 
> 
>                 On Tue, May 5, 2020 at 11:01 PM Phil Erickson <phil.erickson at gmail.com> wrote:
> 
>                     Hi John,
> 
>                        Your files only have the YYYY-MM-DD form of the ISO date in them.  This seems to imply that there can be only one file per day.  What if the instrument dies for a while and then restarts on the same day?  I guess I'm wondering why you wouldn't use the fully qualified ISO (e.g. 2020-05-05T03:00:00).  Maybe I'm missing something obvious.
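A small sketch of the fully qualified timestamp idea, under one assumption that is not in the proposal: it uses the ISO 8601 basic format (no '-' or ':' separators), since ':' is not allowed in Windows file names. The function name 'record_stamp' is hypothetical.

```python
from datetime import datetime, timezone


def record_stamp(start: datetime) -> str:
    """Return a UTC start-of-record stamp in ISO 8601 basic format."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    # Basic format omits '-' and ':' so the stamp is safe in file names.
    return start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


# Two runs on the same UTC day get distinct, sortable stamps.
first = record_stamp(datetime(2020, 5, 5, 3, 0, 0, tzinfo=timezone.utc))
second = record_stamp(datetime(2020, 5, 5, 17, 30, 0, tzinfo=timezone.utc))

print(first)   # 20200505T030000Z
print(second)  # 20200505T173000Z
```

Because the fields run from most to least significant and are fixed width, plain string sorting still orders the files by start time.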
> 
>                     Cheers
>                     Phil
> 
>                     On Tue, May 5, 2020 at 10:05 PM John Gibbons via TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> 
>                         Rob,
> 
>                         Thank You for the feedback!
> 
>                         Yes, the ISO date in the filename is for that UTC day's data - will mod the doc to reflect that.
> 
>                         I stayed away from . and - in the filename as Windows users would get into trouble here (I stuck to just the _ char) and I think we have to cater to that limitation of Windows 7/10/whatever.
> 
>                         For the RasPi OS (Linux), I use a system call to define ~ which is the BASE of the user's directory structure (root was a BAD choice here - not intended to confuse it with root user)
> 
>                         ALL Node numbers are real nodes (except N00000) like the rest of the higher numbered ones - I just allocated 1-99 for the development team(s) use to help set them apart visually
> 
>                         I originally defined nodes 1-49 for the low cost PSWS and 51-99 for the high cost PSWS, with 50 being the test case for the high cost PSWS. I may throw that back in and see what shakes out.
> 
>                         This will be available online when we get it into version control.
> 
>                         Thanks for your help and I will send out v0.03 shortly.
> 
>                         John N8OBJ
> 
> 
> 
> 
>                         On Tue, May 5, 2020 at 8:38 PM Rob Wiesler via TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> 
>                             (David is probably on the TangerineSDR list, but I don't know that for
>                             sure, so he may get an extra copy of this message (sorry).)
> 
>                             On Tue, May 05, 2020 at 19:06:38 -0400, John Gibbons via TangerineSDR wrote:
>                              > This is intended as a starting point to generate input for further
>                              > refinement, so comments are welcome and encouraged.
> 
>                             I like the base filename structure enough.  In particular, I like that
>                             it's sortable by date in any sane locale, and both the date and the node
>                             ID will align vertically (until 6 chars stops being enough for node IDs,
>                             or we start worrying about the Y10K problem).
> 
>                             Does the date in the filename refer to:
>                             - the beginning of the record, or
>                             - the end of the record, or
>                             - a single day in its entirety, or
>                             - at most a single day, but no (significant) part of any other day?
> 
>                             It's probably obvious to everyone that a "day" is a UTC day, but it's
>                             not in the specification, and it wouldn't hurt to add.
> 
>                             I agree that the zero node ought to be set aside.
> 
>                             We should use another letter or two for the "testing" and high/low-cost
>                             PSWS bits instead of allocating node IDs within specific ranges.  How
>                             about we (where X is 0-9 and * is any (possibly empty) sequence of A-Z):
>                             - use either NXXXXXL or LXXXXX for low-cost  PSWS nodes
>                             - use either NXXXXXH or HXXXXX for high-cost PSWS nodes
>                             - set aside  N00000*, L00000*, and/or H00000* as test nodes with
>                                invalid data and/or for other purposes
>                             - not explicitly denote valid data from testing systems in the filename
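The node ID scheme in the list above can be sketched as a small parser. The list offers two alternative spellings (a class letter as prefix or as suffix); this sketch accepts both purely for illustration. The function name 'classify_node', the returned labels, and the treatment of a plain NXXXXX ID as "unspecified" are assumptions, not part of the proposal.

```python
import re

# One letter (N, L, or H), five digits, then an optional run of A-Z.
NODE_RE = re.compile(r"(?P<prefix>[NLH])(?P<number>\d{5})(?P<suffix>[A-Z]*)")


def classify_node(node_id: str) -> str:
    """Classify a node ID under the proposed letter scheme (illustrative)."""
    m = NODE_RE.fullmatch(node_id)
    if m is None:
        raise ValueError(f"not a valid node ID: {node_id!r}")
    prefix, number, suffix = m.group("prefix", "number", "suffix")

    # N00000*, L00000*, H00000* are set aside, whatever the suffix.
    if number == "00000":
        return "reserved"
    if (prefix, suffix) in {("L", ""), ("N", "L")}:
        return "low-cost"
    if (prefix, suffix) in {("H", ""), ("N", "H")}:
        return "high-cost"
    if (prefix, suffix) == ("N", ""):
        return "unspecified"  # assumption: plain N IDs carry no class
    raise ValueError(f"unrecognized letter combination: {node_id!r}")


for example in ["N00013L", "H00042", "N00000", "L00000TEST", "N00013"]:
    print(example, classify_node(example))
```

Settling on a single spelling would simplify this to one table lookup. Either way, the ID keeps a fixed width for non-reserved nodes, so the vertical alignment noted earlier is preserved.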
> 
>                             A couple questions to answer on that subject:
>                             - What makes testing systems with valid data more/less important/notable
>                                than other systems?
>                             - Can a testing system migrate to a production system without changing
>                                its node ID?
>                             - Is it sufficient that testing systems will necessarily have low node
>                                IDs in most cases?
> 
>                             Let's at least specify WWVdata/ as existing relative to "the user"'s
>                             home directory, instead of /home/pi (it can't hurt to be explicit).
>                             Also, you have a typo, where you say that "~" is the "root filesystem
>                             for user", which is not a thing (you mean "home directory").
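Resolving the data directory relative to the current user's home directory, rather than hard-coding /home/pi, could be sketched like this. The function name 'data_dir' and its optional 'home' parameter are hypothetical; only the WWVdata directory name comes from the discussion.

```python
from pathlib import Path
from typing import Optional, Union


def data_dir(home: Optional[Union[str, Path]] = None) -> Path:
    """Return the WWVdata directory under the given (or current) home."""
    base = Path.home() if home is None else Path(home)
    return base / "WWVdata"


print(data_dir())            # under whichever user runs the program
print(data_dir("/home/pi"))  # explicit home, e.g. for testing
```

Path.home() works on both Linux and Windows, so the same code covers the Raspberry Pi and desktop cases.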
> 
>                             Is there a reason you're avoiding a second '.' in the filename?  It's a
>                             little awkward to use 2p5 for 2.5, and that second period isn't going to
>                             confuse any software or upend the sorting.
> 
>                              > We should probably create a mechanism for additions / refinements to this
>                              > document for further work rather than this email thread.
> 
>                             We can always have both :)
> 
>                              > I have the original .doc that created this - let me know how we should handle
>                              > version control from here.  (Nathaniel?)
> 
>                             Please turn this into a plain text file.  I can read PDFs, but it's not
>                             ideal, and I'm getting sick and tired of specification documents in
>                             other non-textual formats.  A plain text specification file has these
>                             properties:
> 
>                             - Small file size (because this is 2020 and it totally matters)
>                             - Less wasted visual space when the document isn't all that long (again,
>                                wishlist-grade)
>                             - Universally readable
>                             - Universally modifiable by the recipient (so recipients can offer
>                                suggestions formatted as a pull request)
>                             - Diffs between revisions can be generated trivially, so that recipients
>                                can:
>                                - offer suggestions formatted as a patch
>                                - figure out what changed without scanning the entire document for
>                                  thin red/yellow lines on a white background (very important to me)
>                             - Mergeable when put in version control (in addition to all the other
>                                properties above you would want for version-controlled documents)
> 
>                             -- 
>                             TangerineSDR mailing list
>                             TangerineSDR at lists.tapr.org <mailto:TangerineSDR at lists.tapr.org>
>                             http://lists.tapr.org/mailman/listinfo/tangerinesdr_lists.tapr.org
> 
> 
> 
> 
>                     -- 
>                     ----
>                     Phil Erickson
>                     phil.erickson at gmail.com
> 
> 
> 
>             -- 
>             ----
>             Phil Erickson
>             phil.erickson at gmail.com
> 
> 
> 
>         -- 
>         Gerry Creager
>         NSSL/CIMMS
>         405.325.6371
>         ++++++++++++++++++++++
>         /The way to get started is to quit talking and begin doing./
>         /   Walt Disney/
> 
> 
> 
>     -- 
>     Sincerely,
> 
>     Aidan Montare
>     CWRU Class of 2021
> 
> 
> 
> -- 
> Gerry Creager
> NSSL/CIMMS
> 405.325.6371
> ++++++++++++++++++++++
> /The way to get started is to quit talking and begin doing./
> /   Walt Disney/
> 


