[aprssig] distributed findu possible ?
Steve Dimse
steve at dimse.com
Sat Aug 9 18:05:43 EDT 2008
On Aug 8, 2008, at 7:33 PM, Matti Aarnio wrote:
>
> If the "dump last 30 minutes of traffic" -feature could be thrown
> away,
> the APRS-IS server memory footprint would shrink considerably, but
> that
> does not help the "findu"-like systems at all, they would need N
> gigabytes
> of memory if it all should be kept on memory... (and "N" is very much
> larger than 4 ...)
N = 248 GB for eight years of weather data and satellite QSOs, 120
days of positions, and 60 days of messages, telemetry, errors, and a
number of other tables.
> I have run test extracts of the full APRS-IS feed a few times.
> I recall it being 115 000 - 130 000 APRS data records per hour.
> Another view: 10-11 MB/hour of network traffic (250 MB per 24h,
> etc., plus lookup indices).
>
> Anyway: 30-40 APRS records per second, around the clock. That
> begins to make a serious challenge for database inserts unless
> there is really smart indexing.
Keep in mind that one line of APRS data can, and usually does, carry
more than one kind of data. A single Mic-E packet might have telemetry,
a position, a status byte, and a comment; a weather packet can have a
position, weather data, a comment, and so on. There are also specialty
tables needed for some functions: for example, an incoming position is
stored twice, once in the 120-day table and once in a last posit table,
which stores a single position for every call. This is essential for
efficiently executing the near function.
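In sketch form, the fan-out looks something like this (a toy schema
with made-up table and column names, SQLite syntax so it stands alone;
the real findU tables differ):

    import sqlite3

    db = sqlite3.connect("aprs.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS positions (call TEXT, lat REAL, lon REAL, ts INTEGER);
        CREATE TABLE IF NOT EXISTS lastposit (call TEXT PRIMARY KEY, lat REAL, lon REAL, ts INTEGER);
        CREATE TABLE IF NOT EXISTS weather   (call TEXT, ts INTEGER, data TEXT);
        CREATE TABLE IF NOT EXISTS telemetry (call TEXT, ts INTEGER, data TEXT);
    """)

    def store_packet(pkt):
        # One parsed APRS line becomes several inserts, one per kind
        # of data it carries.
        cur = db.cursor()
        if "pos" in pkt:
            lat, lon = pkt["pos"]
            # 120-day history: a new row for every packet heard
            cur.execute("INSERT INTO positions VALUES (?,?,?,?)",
                        (pkt["call"], lat, lon, pkt["ts"]))
            # last posit: exactly one row per call, replaced in place,
            # so the near function never touches the big history table
            cur.execute("INSERT OR REPLACE INTO lastposit VALUES (?,?,?,?)",
                        (pkt["call"], lat, lon, pkt["ts"]))
        if "wx" in pkt:
            cur.execute("INSERT INTO weather VALUES (?,?,?)",
                        (pkt["call"], pkt["ts"], pkt["wx"]))
        if "telemetry" in pkt:
            cur.execute("INSERT INTO telemetry VALUES (?,?,?)",
                        (pkt["call"], pkt["ts"], pkt["telemetry"]))
        db.commit()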
Don't forget, as fast as data comes in, you have to delete the
obsolete data. I run this at night, when the load on findU is lower.
Night is only about 20 percent lower, though, as findU has a
significant user base around the world. You cannot simply issue a
single delete command, because the database is locked during the
delete. Instead you need to delete a couple hundred rows at a time,
wait for everything else to catch up, and loop. The routine watches
the disk I/O wait time and skips a cycle if the system is backed up.
It takes about 7 hours to delete the 24 hours of data (excepting that
which is retained permanently).
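The shape of that delete loop, as a sketch (batch size per the above;
the I/O-wait threshold and sampler here are stand-ins, not findU's
actual numbers):

    import sqlite3, time

    db = sqlite3.connect("aprs.db")  # positions table as sketched earlier
    BATCH = 200          # a couple hundred rows per delete
    IOWAIT_LIMIT = 20.0  # percent, a made-up threshold
    PAUSE = 2            # seconds, lets other queries catch up

    def io_wait_percent():
        # Stand-in for a real sampler, e.g. reading the iowait field
        # of /proc/stat on Linux.
        return 0.0

    def purge(cutoff):
        while True:
            if io_wait_percent() > IOWAIT_LIMIT:
                time.sleep(PAUSE)        # system backed up: skip a cycle
                continue
            cur = db.execute(
                "DELETE FROM positions WHERE rowid IN "
                "(SELECT rowid FROM positions WHERE ts < ? LIMIT ?)",
                (cutoff, BATCH))
            db.commit()
            if cur.rowcount == 0:        # nothing obsolete left
                break
            time.sleep(PAUSE)            # brief lock, then yield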
So, for an average, I'd say take the APRS-IS packet rate and multiply
by 8 for the database insert transaction rate. You need an index to
cover any search you want to make, as a full scan of a 40 GB table
would take hours. With a non-compacted table (one where old data is
deleted and new data can be added wherever in the table there is free
space), the table must be locked against writes during a read. No new
data can be written while a long read happens. This is why I do not
allow direct access to the database.
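For example (continuing the toy schema above, index names invented),
every query shape you intend to serve gets its own index:

    # Without these, a lookup degenerates into a scan of every row.
    db.executescript("""
        CREATE INDEX IF NOT EXISTS pos_call_ts ON positions (call, ts);
        CREATE INDEX IF NOT EXISTS wx_call_ts  ON weather   (call, ts);
    """)

    # Served entirely by pos_call_ts: seek to the call, walk back in time.
    rows = db.execute(
        "SELECT lat, lon, ts FROM positions "
        "WHERE call = ? ORDER BY ts DESC LIMIT 10",
        ("K4HG",)).fetchall()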
So yes, the database insert load is significant. So is the read load:
the near.cgi page showing 50 stations takes about 200 separate reads.
Weather graph data is obtained in a single transaction, but to answer
the query the database needs to seek on each data point.
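To give a feel for the read side (a guess at the query's shape, not
the actual near.cgi code), the near lookup can start with a
bounding-box select on the one-row-per-call last posit table; the rest
of the ~200 reads would be per-station follow-ups:

    # Stations near a point, using the lastposit table sketched earlier.
    lat, lon, d = 25.7, -80.2, 0.5
    nearby = db.execute(
        "SELECT call, lat, lon FROM lastposit "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat - d, lat + d, lon - d, lon + d)).fetchall()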
You'd probably have to split the incoming and outgoing loads by a
factor of 10 before they come within range of a typical PC.
Steve K4HG