[aprssig] distributed findu possible ?

Steve Dimse steve at dimse.com
Sat Aug 9 18:05:43 EDT 2008


On Aug 8, 2008, at 7:33 PM, Matti Aarnio wrote:
>
> If the "dump last 30 minutes of traffic" -feature could be thrown away,
> the APRS-IS server memory footprint would shrink considerably, but that
> does not help the "findu"-like systems at all, they would need N gigabytes
> of memory if it all should be kept on memory...  (and "N" is very much
> larger than 4 ...)

n=248 GB for eight years of weather data and satellite QSOs, 120 days  
of positions, and 60 days of messages, telemetry, errors, and a number  
of other tables.

> I have ran test extracts of full APRS-IS feed a few times.
> I recall it being 115 000 - 130 000 APRS data records per hour.
> Another view:  10-11 MB/hour of network traffic.  (250 MB per 24h, etc.
> plus lookup indices.)
>
> Anyway: 30-40 APRS records per second all day around.    That begins to
> make a serious challenge on database insert unless there is really smart
> indexing.

Keep in mind that one line of APRS data can, and usually does, have
more than one kind of data. One Mic-E packet might have telemetry,
position, status byte, and comment. Weather can have position, weather
data, and comment, etc. Some functions also need specialty tables. For
example, an incoming position is stored twice: once in the 120-day
table, and once in a last-position table that holds a single position
for every callsign. This is essential for executing the near function
efficiently.
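
To make the double write concrete, here is a minimal sketch of the idea.
It is not findU's actual code or schema: SQLite is used only as a
self-contained stand-in, and the table names, column names, and the
helpers open_db and store_position are all made up for illustration. It
also assumes packets arrive in time order.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the two hypothetical tables: full history plus one row per call."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS position (
            call TEXT NOT NULL, ts INTEGER NOT NULL,
            lat REAL NOT NULL, lon REAL NOT NULL);
        CREATE INDEX IF NOT EXISTS position_call_ts ON position (call, ts);
        CREATE TABLE IF NOT EXISTS last_position (
            call TEXT PRIMARY KEY, ts INTEGER NOT NULL,
            lat REAL NOT NULL, lon REAL NOT NULL);
    """)
    return db

def store_position(db, call, ts, lat, lon):
    """Store one incoming position twice, inside a single transaction."""
    with db:
        # History table: one row per received position.
        db.execute("INSERT INTO position VALUES (?, ?, ?, ?)",
                   (call, ts, lat, lon))
        # Last-position table: exactly one row per call, overwritten each time.
        # Assumes packets arrive in time order (an older packet would win here).
        db.execute("INSERT OR REPLACE INTO last_position VALUES (?, ?, ?, ?)",
                   (call, ts, lat, lon))
```

A "near" style lookup then only has to scan last_position, which has one
row per call, instead of the whole position history.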

Don't forget, as fast as data comes in, you have to delete the
obsolete data. I run this at night, when the load on findU is lower.
Night is only about 20 percent lower, though, as findU has a
significant user base around the world. You cannot simply issue a
single delete command, because the database is locked during the
delete. Instead you need to delete a couple hundred records at a time,
wait for everything else to catch up, and loop. The routine watches the
disk I/O wait time and skips a cycle if the system is backed up. It
takes about 7 hours to delete the 24 hours of data (excepting that
which is retained permanently).
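
The delete loop described above can be sketched roughly as follows. This
is an illustration under assumptions, not the real findU routine: SQLite
stands in for the database, the function purge_old_rows and its
parameters are hypothetical, the table is assumed to have an integer ts
column, and is_backed_up is a placeholder for whatever check of disk I/O
wait time the real system performs.

```python
import sqlite3
import time

def purge_old_rows(db, table, cutoff_ts, batch=200, pause=0.0,
                   is_backed_up=lambda: False):
    """Delete rows with ts < cutoff_ts in small batches; return rows deleted.

    `table` must be a trusted identifier, since it is interpolated into SQL.
    """
    deleted = 0
    while True:
        if is_backed_up():
            # System is busy: skip this cycle and check again later.
            time.sleep(pause)
            continue
        with db:  # each batch is its own short transaction
            cur = db.execute(
                f"DELETE FROM {table} WHERE rowid IN "
                f"(SELECT rowid FROM {table} WHERE ts < ? LIMIT ?)",
                (cutoff_ts, batch))
        if cur.rowcount == 0:
            return deleted  # nothing obsolete is left
        deleted += cur.rowcount
        # Give inserts and reads a chance to catch up before the next batch.
        time.sleep(pause)
```

Keeping each batch small means no single delete holds its lock for long;
the price is that the purge as a whole takes much longer than one big
delete would.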

So, for an average, I'd say take the APRS-IS packet rate and multiply
by 8 for the database insert transaction rate. You need an index to
cover any search you want to make, as a full table search of a 40 GB
table would take hours. With a non-compacted table (one where old
data is deleted, and new data can be added anywhere in the table
there is free space) the table must be locked from writes during a
read. No new data can be written while a long read happens. This is
why I do not allow direct access to the database.
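
As a rough worked example of the multiply-by-8 estimate, the figures of
115 000 to 130 000 records per hour quoted earlier can be turned into an
insert rate. The helper insert_rate is hypothetical and the factor of 8
is only the average suggested above, so treat the result as a ballpark.

```python
def insert_rate(records_per_hour, inserts_per_packet=8):
    """Return (packets per second, estimated database inserts per second)."""
    packets_per_sec = records_per_hour / 3600
    return packets_per_sec, packets_per_sec * inserts_per_packet

for rph in (115_000, 130_000):
    pps, ips = insert_rate(rph)
    print(f"{rph} records/h -> {pps:.1f} packets/s -> {ips:.0f} inserts/s")
# 115000 records/h -> 31.9 packets/s -> 256 inserts/s
# 130000 records/h -> 36.1 packets/s -> 289 inserts/s
```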

So yes, the database insert load is significant. So is the read load:
the near.cgi page showing 50 stations takes about 200 separate reads.
The data for a weather graph is obtained in a single transaction, but
to answer the query the database needs to seek on each data point.

You'd probably have to split the incoming and outgoing loads by a  
factor of 10 before it comes in range of a typical PC.

Steve K4HG



