[aprssig] APRS IS Issue ?

Gerry Creager gerry.creager at tamu.edu
Mon Mar 3 08:16:35 EST 2008


Howdy, Stan,

Stan N0YXV wrote:
> Ok you can all shoot me now but I thought I'd add a little fuel to the 
> fire. Honestly I'm not trying to make people upset but a couple of 
> thoughts came to mind that I thought might be interesting for future 
> growth and learning. Sometimes we learn the most from our mistakes.
> 
> 1.) We knew for a month that 3rd and 4th would be gone. Why wasn't a new 
> THIRD put into place before the deadline? Why did we have to wait until 
> the two remaining servers were full before we figured out that we needed 
> to increase the threshold on FIRST? Wonder what would have happened if 
> we had run out of capacity during a big disaster like a Katrina? The 
> object of servers shouldn't be to meet the demand but to be able to have 
> enough overhead for the unexpected.

My fault.  I've been traveling too much, and when home, I've been doing 
little things like, literally, putting out the RAID shelf fire that cost 
me 4TB of disk and data.  And doing sysadmin work on a few systems 'til 
I can get a new hire in place.  So, simply:
I dropped the ball.

I'll try to do better in the future.  I'd coordinated to get the new 
server up but didn't get everything done.

Sorry!  Really!  I'll try to take care in the future.

> 2.) When it was decided that the core take on the CWOP program it would 
> appear that there wasn't a lot of preplanning. I mean we knew long 
> before the core took the project on that weather stations reported in a 
> 5 minute (or less) fashion. Couldn't a person calculate the amount of 
> traffic and server resources from what was already an existing service? 
> Not to anticipate a growth in CWOP data and plan ahead would seem to be 
> a huge mistake. (Sounds like that was already admitted to but just 
> wanted to bring it up to get it back into the discussion.)

Looking simply at the traffic, we did calculate that there'd be adequate 
hardware and software resources to handle the CWOP data.  We missed a 
bit of processing information dealing with filter software and hashing. 
When the Christmas spike occurred, we identified that a problem had 
been unmasked and started the evaluation process, did additional 
troubleshooting and determined what the real problems were.

A recommendation to split CWOP off to its own servers was made a year 
ago, but for a variety of reasons, not the least of which was that at 
the time we thought load wouldn't be an issue, we (APRS Core sysops and 
CWOP management) tabled it.  This year, it wasn't really optional 
anymore.

I had started doing some planning along these lines and was able to do 
some quick work to get volunteer servers up 'til CWOP servers can be 
funded and managed in a different manner.  Again, for some good reasons, 
continuing to fund and operate CWOP servers as pure volunteer efforts 
isn't a great choice.  We're working on a more permanent arrangement for 
CWOP servers and management.  We're also faced with the imminent 
retirement of a key individual in CWOP, and his duties will have to be 
assumed somehow.

> I'll get my flame suit ready but my intent isn't to put more salt into 
> old wounds but to see if there isn't something we can learn from our 
> past mistakes so that we don't repeat them.

I'm trying to make sure we don't make these mistakes again while I'm 
involved.

gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843




