[TangerineSDR] Filename structure, Node info contents, other stuff to ponder

Gerry Creager - NOAA Affiliate gerry.creager at noaa.gov
Sat May 9 11:23:08 EDT 2020


Ryan, all

I spend a fair portion of my life with NetCDF4 (with and without HDF5).
I'll have to check how NetCDF and HDFx are already being used by folks
like the Space Weather Prediction Center (I know the Air Force converts
everything to grib2 to make life difficult). That said, I've
used NetCDF-CF for years on a variety of projects. It doesn't always adapt
and play nicely. For example, unstructured grid products are poorly
handled, although this is something that's been on the NetCDF to-do list
for at least 15 years and was elevated in an NSF program review about 6
years ago. That doesn't mean it can't be kludged, and there are several
groups with working approaches, but almost everyone rolls their own without
acceptance by the CF community.

HDF was originally created by the satellite community and is much more
adept at handling unstructured data, although I find it somewhat harder to
deal with. The NetCDF and NASA HDF communities started working and playing
better together with the advent of NetCDF4, the adoption of HDF5 as
NetCDF's underlying data structure, and the incorporation of robust
compression from HDF.

I'm more than happy to work with you if we want to form an ad hoc data
standards subcommittee to look at file formats. I've been down this road
with weather and ocean products in the past.

Gerry

On Thu, May 7, 2020 at 7:26 PM Ryan Volz <ryan.volz at gmail.com> wrote:

> Hi all,
>
> I'm an outside observer, but with the discussion having now touched on an
> HDF5 format, I think it might be helpful to point toward the NetCDF Climate
> and Forecast conventions:
>
> http://cfconventions.org/cf-conventions/cf-conventions.html
>
> I don't know that it exactly fits, but it provides a standard format for
> geoscience data stored in NetCDF files, which in the modern case is a
> strict subset of HDF5. It's called "Climate and Forecast", but the format is
> pretty adaptable to any data where geodetic coordinates are important (I
> use it for meteor data, for example). It's also nicely integrated into the
> 'xarray' Python package for working with labeled/indexed multidimensional
> array data:
>
> https://xarray.pydata.org/en/stable/weather-climate.html
>
> There would still be work to do defining how exactly to fit PSWS data into
> this convention, but by no means would you have to throw out what's already
> been done.
>
> Cheers,
> Ryan
>
> On 5/7/20 2:38 PM, Gerry Creager - NOAA Affiliate via TangerineSDR wrote:
> > Aidan
> >
> > Sure, but this is likely to get me into lecture mode, and to be honest,
> I am not as current as I used to be on the subject. I've been off the OGC
> technical committee for 8 years now.
> >
> > SensorML will be the official topic, as StarFishFL is a subset, for the
> most part, and designed to remove a lot of cruft from SensorML.
> > SensorML was designed to provide a comprehensive set of metadata about a
> sensor, AND about the data it can collect. The format is XML, and there
> should, by now, be an accepted schema to serve as a template. Not all the
> fields are required, and metadata can be inherited as needed. In the
> original vision, SensorML was designed to describe a satellite sensor
> platform in terms of orbital parameters and attitude parametrics, as well
> as describing the sensor characteristics (e.g., for a line-scanner, scan
> and dwell rate, nadir angle, etc).
> >
> > The element that should be attractive is the concept of inheritance.
> I'm not a fan of overloading a filename with all the metadata possible.
> Also, while it's wordy, using a schema to create a metadata XML config just
> makes sense. Passing the metadata periodically (once/day) and/or when it
> changes is, to me, a more reasonable way to handle this. I'm also a fan of
> streaming the data as it comes in. I can tell you now that the lightning
> data will have to talk to a central server in real time to trilaterate the
> CG strike information.
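
A minimal sketch of the inheritance idea, assuming Python and a made-up XML layout. This is NOT the SensorML schema: the element names (station, defaults, sensor, gridSquare, sampleRateHz) and all values are hypothetical, chosen only to show station-level metadata being inherited by each sensor unless the sensor overrides it.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata config; the layout is illustrative, not SensorML.
CONFIG = """
<station node="N00001">
  <defaults>
    <gridSquare>EN91</gridSquare>
    <sampleRateHz>10</sampleRateHz>
  </defaults>
  <sensor name="magnetometer"/>
  <sensor name="receiver">
    <sampleRateHz>48000</sampleRateHz>
  </sensor>
</station>
"""


def sensor_metadata(xml_text: str) -> dict:
    """Return {sensor name: metadata}, with station defaults inherited."""
    root = ET.fromstring(xml_text)
    defaults = {child.tag: child.text for child in root.find("defaults")}
    result = {}
    for sensor in root.findall("sensor"):
        merged = dict(defaults)  # start from the inherited values
        merged.update({child.tag: child.text for child in sensor})  # overrides
        result[sensor.get("name")] = merged
    return result


meta = sensor_metadata(CONFIG)
print(meta)
```

Only the fields that differ from the station defaults need to be written per sensor, which keeps the periodic metadata message short.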
> >
> > Since the data files are apparently intended to come to a central server
> anyway, streaming the data using a TCP, UDP, or multicast transport would be
> straightforward using several extant protocols, or we could reinvent the
> wheel. Similarly, poorly connected sites could send their file data in as
> able, daily, etc.
> >
> > While we're on the subject of data files, I've got to ask what is, to me,
> the obvious question (although a different note from Phil triggered it):
> why store the data in CSV? Wouldn't compressed HDF5
> make more sense in the long term? And, if we do that, and the HDF5 file is
> appropriately created and populated, the metadata for each observation is
> inherent to the HDF5 file format.
> >
> > What I'm suggesting is that we're spending a lot of effort to create
> both a data file format and a naming convention for sensor data when
> there's already a standard out there. It's a little obscure unless you've
> ventured out into the geospatial or geosciences world. I just
> think that looking at the standards available to us already might be a
> better use of time.
> >
> > I can expand further, but I'm multitasking right now with a Cray storage
> problem. If I need to go further, please let me know.
> >
> > gerry
> >
> > On Thu, May 7, 2020 at 4:41 PM Aidan Montare <aam141 at case.edu> wrote:
> >
> >     Gerry,
> >
> >     Could you provide an overview of the features of SensorML and
> related systems that are relevant to our discussion, or the parts that you
> want us to pay attention to? I took a look at the wikipedia pages for those
> topics, but there's a lot that I don't understand, so having a guide would
> be helpful.
> >
> >
> >
> >     On Wed, May 6, 2020 at 11:40 AM Gerry Creager - NOAA Affiliate via
> TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> >
> >         I'm coming into this a bit late but: Maintaining so much of the
> metadata in the file name is both attractive and painful. Allowing you to
> see who's sending, just by looking at the file name is the obvious
> attractive element. The potential for corruption of that element of
> metadata is a pitfall I've seen in meteorological data even from
> automated systems. In addition, I've evolved to a point where I prefer to
> store the metadata in a database (I note the reference to a central db) or
> periodic (e.g., daily) transmission of a "current-metadata" update with a
> lot of the info, e.g. grid square, PII, etc. that can be checked against
> the database.
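
A rough sketch of the "current-metadata" check, assuming Python, with a plain dict standing in for the central database record. The field names (node, grid_square, antenna) and values are hypothetical. The idea is only that the daily update is compared field by field against what the database already holds, and any mismatch is flagged.

```python
import json


def metadata_mismatches(stored: dict, update: dict) -> dict:
    """Return {field: (stored_value, reported_value)} for fields that differ."""
    fields = sorted(set(stored) | set(update))
    return {
        name: (stored.get(name), update.get(name))
        for name in fields
        if stored.get(name) != update.get(name)
    }


# Stand-in for the database record (hypothetical fields and values).
stored_record = {"node": "N00001", "grid_square": "EN91", "antenna": "dipole"}

# Stand-in for the daily update as it might arrive from the node.
daily_update = json.loads(
    '{"node": "N00001", "grid_square": "EN81", "antenna": "dipole"}'
)

mismatches = metadata_mismatches(stored_record, daily_update)
print(mismatches)
```

A mismatch does not say which side is wrong; it only marks the record for a human or a rule to resolve.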
> >
> >         I'll also suggest a quick review of the Open Geospatial
> Consortium's work on SensorML (overkill for this project, but instructive)
> and Starfish Fungus ML, a simplification and streamlining of SensorML.
> These were designed specifically for overhead imagery systems, but were quickly
> tweaked for a variety of other sensors.
> >         73
> >         Gerry N5JXS
> >
> >         On Wed, May 6, 2020 at 3:00 PM Phil Erickson via TangerineSDR
> <tangerinesdr at lists.tapr.org> wrote:
> >
> >             Hi John,
> >
> >                That works for "fldigi" - but it seems to me that the
> discussion here is more general and should not be tied to a particular
> program behavior.  We have had totally disastrous results in reopening a
> file and adding to it - including fseek(0) due to program error and
> blasting away all the earlier data.  File namespace is cheap so I guess I
> don't get the desire to conserve it these days.
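
One way to make "never reopen an old file" hard to violate, sketched in Python: create each file in exclusive mode ("x"), which raises FileExistsError instead of truncating or overwriting. The helper name and the file name below are hypothetical.

```python
import os
import tempfile


def write_new_file(path: str, text: str) -> None:
    """Create path and write text; refuse to touch an existing file."""
    # Mode "x" fails with FileExistsError if the file already exists.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(text)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "20200506T150000Z_N00001.csv")
write_new_file(target, "first run\n")

# A second attempt on the same name is rejected, so earlier data survives.
try:
    write_new_file(target, "second run\n")
    clobbered = True
except FileExistsError:
    clobbered = False

with open(target, encoding="utf-8") as handle:
    contents = handle.read()
print(clobbered, repr(contents))
```

This only protects against accidental reuse of a name; it still needs a naming scheme that gives each start its own file.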
> >
> >             ---- Phil
> >
> >
> >             On Wed, May 6, 2020 at 10:56 AM John Gibbons <jcg66 at case.edu> wrote:
> >
> >                 Phil,
> >
> >                 The issue of restarting anytime during the day was one
> of the major modifications I made to the program.  It just starts up and
> continues where it left off.  Same file, same date, etc.
> >
> >                 So it's not a problem.  People will also stop the data
> collection to use their ham stations for their normal operations, so this
> had to be addressed.
> >
> >                 John N8OBJ
> >
> >                 John C. Gibbons
> >                 Director - Sears Undergraduate Design Laboratory
> >                 Dept. of Electrical Engineering and Computer Science
> >                 Case Western Reserve University
> >                 10900 Euclid Ave, Glennan 314
> >                 Cleveland, Ohio  44106-7071
> >                 Phone (216) 368-2816  FAX (216) 368-6888
> >                 E-mail: jcg66 at case.edu
> >
> >
> >
> >                 On Tue, May 5, 2020 at 11:01 PM Phil Erickson
> <phil.erickson at gmail.com> wrote:
> >
> >                     Hi John,
> >
> >                        Your files only have the YYYY-MM-DD form of the
> ISO date in them.  This seems to imply that there can be only one file per
> day.  What if the instrument dies for a while and then restarts on the same
> day?  I guess I'm wondering why you wouldn't use the fully qualified ISO
> (e.g. 2020-05-05T03:00:00).  Maybe I'm missing something obvious.
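
A minimal sketch of the fully qualified timestamp idea, assuming Python. The function names, the N00001 node ID, and the .csv suffix are illustrative and not taken from the draft spec. It uses the ISO 8601 basic form (no ':' or '-'), so the name avoids characters that cause trouble on Windows, and each restart gets its own file.

```python
from datetime import datetime, timezone


def start_stamp(when: datetime) -> str:
    """Format a start time as ISO 8601 basic UTC, e.g. 20200505T030000Z."""
    # Normalise to UTC first so every node names files on the same clock.
    return when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def data_filename(node_id: str, when: datetime) -> str:
    """Build a hypothetical per-start filename: <stamp>_<node>.csv."""
    return f"{start_stamp(when)}_{node_id}.csv"


first = data_filename(
    "N00001", datetime(2020, 5, 5, 3, 0, 0, tzinfo=timezone.utc)
)
restart = data_filename(
    "N00001", datetime(2020, 5, 5, 14, 30, 0, tzinfo=timezone.utc)
)
print(first)
print(restart)
```

Because the stamp is fixed-width and most-significant-first, a plain lexical sort of the names is also a sort by start time.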
> >
> >                     Cheers
> >                     Phil
> >
> >                     On Tue, May 5, 2020 at 10:05 PM John Gibbons via
> TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> >
> >                         Rob,
> >
> >                         Thank You for the feedback!
> >
> >                         Yes, the ISO date in the filename is for that
> UTC day's data - will mod the doc to reflect that.
> >
> >                         I stayed away from . and - in the filename as
> Windows users would get into trouble here (I stuck to just the _ char) and
> I think we have to cater to that limitation of Windows 7/10/whatever.
> >
> >                         For the RasPi OS (Linux), I use a system call to
> define ~ which is the BASE of the user's directory structure (root was a
> BAD choice here - not intended to confuse it with root user)
> >
> >                         ALL Node numbers are real nodes (except N00000)
> like the rest of the higher numbered ones - I just allocated 1-99 for the
> development team(s) use to help set them apart visually
> >
> >                         I originally defined nodes 1-49 for the low cost
> PSWS and 51-99 for the high cost PSWS, with 50 being the test case for
> the high cost PSWS. I may throw that back in and see what shakes out.
> >
> >                         This will be available online when we get it
> into version control.
> >
> >                         Thanks for your help and I will send out v0.03
> shortly.
> >
> >                         John N8OBJ
> >
> >
> >                         On Tue, May 5, 2020 at 8:38 PM Rob Wiesler via
> TangerineSDR <tangerinesdr at lists.tapr.org> wrote:
> >
> >                             (David is probably on the TangerineSDR list,
> but I don't know that for
> >                             sure, so he may get an extra copy of this
> message (sorry).)
> >
> >                             On Tue, May 05, 2020 at 19:06:38 -0400, John
> Gibbons via TangerineSDR wrote:
> >                              > This is intended as a starting point to
> generate input for further
> >                              > refinement, so comments are welcome and
> encouraged.
> >
> >                             I like the base filename structure enough.
> In particular, I like that
> >                             it's sortable by date in any sane locale,
> and both the date and the node
> >                             ID will align vertically (until 6 chars
> stops being enough for node IDs,
> >                             or we start worrying about the Y10K problem).
> >
> >                             Does the date in the filename refer to:
> >                             - the beginning of the record, or
> >                             - the end of the record, or
> >                             - a single day in its entirety, or
> >                             - at most a single day, but no (significant)
> part of any other day?
> >
> >                             It's probably obvious to everyone that a
> "day" is a UTC day, but it's
> >                             not in the specification, and it wouldn't
> hurt to add.
> >
> >                             I agree that the zero node ought to be set
> aside.
> >
> >                             We should use another letter or two for the
> "testing" and high/low-cost
> >                             PSWS bits instead of allocating node IDs
> within specific ranges.  How
> >                             about we (where X is 0-9 and * is any
> (possibly empty) sequence of A-Z):
> >                             - use either NXXXXXL or LXXXXX for low-cost
> PSWS nodes
> >                             - use either NXXXXXH or HXXXXX for high-cost
> PSWS nodes
> >                             - set aside  N00000*, L00000*, and/or
> H00000* as test nodes with
> >                                invalid data and/or for other purposes
> >                             - not explicitly denote valid data from
> testing systems in the filename
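
The proposal above could be checked mechanically along these lines. This is a sketch in Python under my own assumptions: it accepts both spellings Rob lists (NXXXXXL or LXXXXX, NXXXXXH or HXXXXX), treats any ID with the number 00000 as reserved, and calls everything else "unspecified". The function name and the category labels are hypothetical.

```python
import re

# One letter (N, L or H), exactly five digits, then optional A-Z suffix.
NODE_RE = re.compile(r"([NLH])(\d{5})([A-Z]*)")


def classify_node(node_id: str) -> str:
    """Classify a node ID under the proposed (not adopted) letter scheme."""
    match = NODE_RE.fullmatch(node_id)
    if match is None:
        raise ValueError(f"not a node ID: {node_id!r}")
    prefix, digits, suffix = match.groups()
    if digits == "00000":
        return "reserved"  # the zero node is set aside for testing
    if prefix == "L" or (prefix == "N" and suffix == "L"):
        return "low-cost"
    if prefix == "H" or (prefix == "N" and suffix == "H"):
        return "high-cost"
    return "unspecified"


for example in ("N00042L", "L00042", "N00042H", "H00042", "N00000", "N00042"):
    print(example, classify_node(example))
```

Picking one of the two spellings in the spec would remove the need to handle both here.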
> >
> >                             A couple questions to answer on that subject:
> >                             - What makes testing systems with valid data
> more/less important/notable
> >                                than other systems?
> >                             - Can a testing system migrate to a
> production system without changing
> >                                its node ID?
> >                             - Is it sufficient that testing systems will
> necessarily have low node
> >                                IDs in most cases?
> >
> >                             Let's at least specify WWVdata/ as existing
> relative to "the user"'s
> >                             home directory, instead of /home/pi (it
> can't hurt to be explicit).
> >                             Also, you have a typo, where you say that
> "~" is the "root filesystem
> >                             for user", which is not a thing (you mean
> "home directory").
> >
> >                             Is there a reason you're avoiding a second
> '.' in the filename?  It's a
> >                             little awkward to use 2p5 for 2.5, and that
> second period isn't going to
> >                             confuse any software or upend the sorting.
> >
> >                              > We should probably create a mechanism for
> additions / refinements to this
> >                              > document for further work rather than
> this email thread.
> >
> >                             We can always have both :)
> >
> >                              > I have the original .doc that created
> this - let me know how we should handle
> >                              > version control from here.  (Nathaniel?)
> >
> >                             Please turn this into a plain text file.  I
> can read PDFs, but it's not
> >                             ideal, and I'm getting sick and tired of
> specification documents in
> >                             other non-textual formats.  A plain text
> specification file has these
> >                             properties:
> >
> >                             - Small file size (because this is 2020 and
> it totally matters)
> >                             - Less wasted visual space when the document
> isn't all that long (again,
> >                                wishlist-grade)
> >                             - Universally readable
> >                             - Universally modifiable by the recipient
> (so recipients can offer
> >                                suggestions formatted as a pull request)
> >                             - Diffs between revisions can be generated
> trivially, so that recipients
> >                                can:
> >                                - offer suggestions formatted as a patch
> >                                - figure out what changed without
> scanning the entire document for
> >                                  thin red/yellow lines on a white
> background (very important to me)
> >                             - Mergeable when put in version control (in
> addition to all the other
> >                                properties above you would want for
> version-controlled documents)
> >
> >                             --
> >                             TangerineSDR mailing list
> >                             TangerineSDR at lists.tapr.org
> >                             http://lists.tapr.org/mailman/listinfo/tangerinesdr_lists.tapr.org
> >
> >
> >
> >
> >                     --
> >                     ----
> >                     Phil Erickson
> >                     phil.erickson at gmail.com
> >
> >
> >
> >             --
> >             ----
> >             Phil Erickson
> >             phil.erickson at gmail.com
> >
> >
> >
> >         --
> >         Gerry Creager
> >         NSSL/CIMMS
> >         405.325.6371
> >         ++++++++++++++++++++++
> >         /The way to get started is to quit talking and begin doing./
> >         /   Walt Disney/
> >
> >
> >
> >     --
> >     Sincerely,
> >
> >     Aidan Montare
> >     CWRU Class of 2021
> >
> >
> >
> > --
> > Gerry Creager
> > NSSL/CIMMS
> > 405.325.6371
> > ++++++++++++++++++++++
> > /The way to get started is to quit talking and begin doing./
> > /   Walt Disney/
> >
>


-- 
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
*The way to get started is to quit talking and begin doing.*
*   Walt Disney*