[aprssig] distributed findu possible ?
Steve Dimse
steve at dimse.com
Sun Aug 10 11:34:57 EDT 2008
On Aug 10, 2008, at 9:02 AM, Matti Aarnio wrote:
>
> One of the reasons that people have no idea of what findU can do is
> its "user interface". Indeed you have supplied only the backend of
> things, no frontend at all,
Frontend and backend have specific meanings in dynamic web systems:
typically the backend is the database and the frontend is the web
server. In larger systems these are often on different physical
machines. Under that standard definition there is indeed a frontend
on findU. And I specifically disallow anyone from using findU as a
backend.
I take your meaning to be that findU does not have a user-friendly way
to generate the URLs. That is very intentional. findU is a worldwide
system, I do not have the resources to localize a user interface in
different languages. On the other hand, it is relatively simple to
create forms that generate the URLs. It was and is my hope to get more
people involved in creating APRS internet resources by allowing them
to create their own form pages to generate the findU URLs. A handful
of people have, in a few languages, and I link the ones I know of on
my front page. I'd still like to see more.
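To give a concrete idea of how little such a form page has to do, here
is a minimal sketch in Python of the URL-generation step. The find.cgi
name and call parameter follow the familiar findU pattern, but treat
the parameter set as an assumption and check the cgi list on the findU
front page for the real one.

    from urllib.parse import urlencode

    # Sketch of the job a localized form page performs: turn user
    # input into a findU URL. Extra parameters pass through untouched.
    def findu_url(callsign, cgi="find.cgi", **params):
        query = urlencode({"call": callsign, **params})
        return "http://www.findu.com/cgi-bin/%s?%s" % (cgi, query)

    print(findu_url("K4HG"))
    # -> http://www.findu.com/cgi-bin/find.cgi?call=K4HG

A form page then only needs to collect the callsign, localize the
labels, and send the browser to the generated URL.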
> and on some details, like how long data is retained, the information
> is not given anywhere that I can spot
That's because I need to vary it from time to time as disk space wanes
and the APRS IS traffic rises. I can't even keep the info that is
already there up to date; I just noticed the front page says the
database is 58 GB in size, and that is really old info ;-)
>
> It is much, much easier to point aprs.fi's map at the general area
> of interest, and then look at what happens around there.
Of course it is. I wish I had time to program a full gmap
implementation on findU. My point, though, is that you cannot call
something a distributed findU if it only has the easy features of
findU. The database aspect of the aprs.fi front page is trivial. In
fact, I was doing it in memory, without a database backend, 12 years
ago as part of APRServ, the original APRS hub program.
I'm not saying aprs.fi is not useful, or wrong, or anything negative
of any sort. I'm simply saying that you cannot talk about something
as a findU analog if it only cherry-picks the easy stuff.
> In the end the raw data may not live in the system for very long, but
> those end-product views are longer-term data.
> Like:
>
> http://aprs.fi/weather/OH2KXH/year
> http://aprs.fi/telemetry/OH2RDK-5/month
Talk about hard-to-find info: there is nothing on the home page that
indicates this is available on aprs.fi. At least findU has a list of
available cgi's and their parameters. This is better than I thought
was available there, though I still don't see a way to get anything
other than the handful of preset views. Is there a way to show a
detailed plot of high resolution data for an arbitrary time?
>> How are you going to show month+ long tracks?
>
> That all means that:
> - Data is kept in a persistent database (no ram-only nodes)
> - Its insertion must be cheap (as "quick")
> - Its retrieval must be cheap (which may make the insertion less
>   cheap...)
>
> Disk space keeps growing, but the disks can still handle only so
> many IO operations per second, because moving IO heads along the
> disk surface and spinning the disks themselves take roughly the same
> time now as they did 10 years ago. Thus a single terabyte disk is no
> _faster_ at doing IOs than a single 10 GB disk.
Assuming all other parameters are identical, that is true. My first
hard drive was a 16 MB (yes, megabyte) drive the size of a shoebox,
for which I paid $3000 in 1979. I can assure you, its throughput was
far below even the slowest drive you can buy today. All drives are
not created equal.
High end servers use drives that spin faster (less waiting for the
data you want to rotate under the head, and a shorter time needed to
read and write a chunk of data) and have faster seek times (a shorter
time for the head to reach the right track), making them much faster
than consumer class drives. findU uses six 146GB drives in a RAID 1+0
array. Data is evenly split between three pairs of drives; this
striping of data into three groups is the RAID 0. Each bit of incoming
data is written onto both drives of a pair; this mirroring is the
RAID 1. Since each drive in a pair has identical data on it, reads can
happen from either drive. So each drive must handle one third of the
writes and only one sixth of the reads. Combine this RAID performance
with the high end disk performance, and you get a system that can
handle maybe 10 times the throughput of a consumer drive. Not cheap,
and not high capacity (my desktop Mac has 4 times the storage space of
the findU servers), but fast.
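To make that arithmetic explicit, here is a small Python sketch of the
load calculation; the pair count is the only input, everything else
follows from the layout described above.

    # Per-drive load in a RAID 1+0 array built from mirrored pairs.
    def raid10_load(pairs):
        drives = pairs * 2
        write_share = 1.0 / pairs   # a write lands on both drives of one pair
        read_share = 1.0 / drives   # a read is served by one drive of one pair
        return drives, write_share, read_share

    drives, w, r = raid10_load(3)
    print("%d drives: 1/%d of writes, 1/%d of reads per drive"
          % (drives, round(1 / w), round(1 / r)))
    # -> 6 drives: 1/3 of writes, 1/6 of reads per drive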
And I disagree that there has been no change in speed over 10 years.
At the low end, while the emphasis has indeed been on increasing
capacity, there have been improvements in speed. Ten years ago even a
desktop did not often have a 7200 RPM drive; now I have one in my
laptop. At the high end there have been large improvements in speed
and smaller ones in capacity.
>
> One needs to have multiple disks for data mirrors, so that a single
> disk can fail without data loss or even service loss, _and_ for IO
> parallelism.
IO parallelism is about speed. Once you have parallelism that travels
the internet, you lose a lot of speed. The fastest ping time is longer
than the slowest seek time. If you use a distributed database that is
not within a single data center, user experience will suffer. I don't
consider alexa.com reliable for traffic rankings because of their
non-random sample, but they have a good metric for response time. I'm
proud they rank findU as very fast: at 0.7 seconds it beats 87% of web
sites. For reference, arrl.net is 3 seconds and qrz.com is 5 seconds.
aprs.fi and aprsworld do not have numbers because they fall below the
rankings at which alexa performs speed tests. There are many studies
showing that more than a couple of seconds of response time adversely
colors users' perception of a web site.
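A back-of-the-envelope comparison makes the point; the latencies below
are round-number assumptions, not measurements.

    # Assumed order-of-magnitude latencies; real values vary widely.
    SEEK_MS = 10          # a slow local disk seek (assumption)
    INTERNET_RTT_MS = 50  # a typical cross-internet round trip (assumption)

    lookups = 10  # a hypothetical page needing ten sequential lookups
    print("local disks:    %d ms" % (lookups * SEEK_MS))          # 100 ms
    print("across the net: %d ms" % (lookups * INTERNET_RTT_MS))  # 500 ms

Every lookup that has to cross the internet instead of a local disk
bus adds its round trip to the response time the user sees.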
When looking at reliability for a distributed system, you need to look
at the reliability of each server to decide how much redundancy you
need. No matter what, you need two copies of each bit. With a
low-reliability system (not just hardware; with this volunteer system,
Joe goes on vacation and turns off his computer, or there is an ice
storm and he loses power or internet), you probably want at least
three copies. So if you want each server to hold a hundredth of
findU's data, now you need 300 machines. Plus you need a way to
recognize when one becomes unavailable and mirror its data onto
another server. Just another feature to add into the magical central
control of the system.
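The machine count works out directly; a short sketch of the
arithmetic, with the shard size and copy count as the assumptions
stated above.

    # Servers needed if each holds a fixed fraction of the data, with
    # every bit replicated onto several independent machines.
    def machines_needed(shard_fraction, copies):
        return round(1 / shard_fraction) * copies

    print(machines_needed(1.0 / 100, 3))  # -> 300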
I haven't heard: who is going to write this? ;-)
Steve K4HG