Re: [Tech] Re datastore simulations

Author: Michael Rogers
Date:  
To: Matthew Toseland
On Apr 30 2008, Matthew Toseland wrote:
> Keys to block number. Block numbers to keys is handled by the on disk
> structure. So we can actually pick a random block number to dump - but at
> the cost of having to keep a key index.


Cool, I see what you mean now - I'll simulate that too.
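A minimal sketch of the scheme Matthew describes, assuming a fixed number of equal-sized blocks. The store keeps an in-memory index from key to block number. When the store is full, it dumps a uniformly random block number and uses the block-to-key mapping to remove the evicted key from the index. Here that mapping is a plain list standing in for the on-disk structure. The class and field names are hypothetical.

```python
import random


class RandomEvictionStore:
    """Hypothetical sketch of a fixed-size store of `capacity` blocks.

    key_to_block is the in-memory key index (key -> block number) whose cost
    the thread mentions; block_to_key stands in for the on-disk structure,
    which records which key occupies each block.
    """

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.key_to_block = {}                  # in-memory key index
        self.block_to_key = [None] * capacity   # stand-in for the on-disk structure
        self.used = 0                           # blocks filled so far
        self.rng = rng or random.Random()

    def contains(self, key):
        return key in self.key_to_block

    def store(self, key):
        if key in self.key_to_block:
            return                              # already cached, nothing to do
        if self.used < self.capacity:
            block = self.used                   # still filling: append at the end
            self.used += 1
        else:
            block = self.rng.randrange(self.capacity)  # full: dump a random block
            old_key = self.block_to_key[block]
            del self.key_to_block[old_key]      # block -> key lookup keeps the index consistent
        self.key_to_block[key] = block
        self.block_to_key[block] = key


if __name__ == "__main__":
    store = RandomEvictionStore(capacity=4, rng=random.Random(0))
    for key in range(10):
        store.store(key)
    # Four keys survive; which ones depends on the random seed.
    print(sorted(store.key_to_block))
```

The price of being able to evict any block uniformly at random is the `key_to_block` index, which grows with the number of blocks in the store.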

> I'm surprised that hashing works so well, it has some big disadvantages
> e.g. once the datastore is say half full, half of all new incoming keys
> will overwrite old data rather than being added to the end. So we end up
> storing less data: it takes a much longer time for the datastore to fill
> up.


Hmm, good point. On the other hand filling the store (or 99% filling it)
would typically only take a few days, so maybe it's more important to
optimise the steady state behaviour than the startup behaviour?
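To put a rough number on the startup cost, assume a directly hashed store of m slots where every incoming key is new and hashes uniformly at random to exactly one slot. Each insert then lands in an empty slot with probability 1 - F, where F is the occupied fraction. This matches the "half full, half overwrite" observation above and gives the standard balls-into-bins result. An LRU store, by contrast, gains one block per insert until it is full.

```latex
\mathbb{E}[F(n+1)] = F(n) + \frac{1 - F(n)}{m}
\;\Rightarrow\;
F(n) = 1 - \left(1 - \frac{1}{m}\right)^{n} \approx 1 - e^{-n/m}

n_{\mathrm{hash}}(F) \approx -m \ln(1 - F), \qquad n_{\mathrm{LRU}}(F) = F\,m

F = 0.5:\quad n_{\mathrm{hash}} \approx 0.69\,m \;\text{ vs. }\; 0.5\,m
\qquad
F = 0.99:\quad n_{\mathrm{hash}} \approx 4.6\,m \;\text{ vs. }\; 0.99\,m
```

Under these assumptions, a hashed store needs roughly 4.6 / 0.99 ≈ 4.7 times as many inserts as LRU to become 99% full. Real stores that hash into multi-slot buckets would fill faster than this estimate.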

> What is the approximate ratio of store filling rates for the same size
> store on LRU versus on a direct hashing implementation? Can you simulate
> this?


So far I've been allowing the simulations to reach a steady state before
making any measurements, but it shouldn't be a problem to simulate it.
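A minimal simulation of that comparison, under the simplifying assumption that every incoming key is new and hashes uniformly to a single slot. It ignores buckets or associativity, which a real store may use. The code records the occupied fraction after each insert for both policies and reports how many inserts each needs to reach a given fill level. Function names and parameters are hypothetical.

```python
import random


def fill_curve_hashed(capacity, inserts, rng):
    """Occupied fraction after each insert for a directly hashed store.

    Assumption: every incoming key is new and hashes uniformly to one slot,
    overwriting whatever was there before.
    """
    occupied = [False] * capacity
    filled = 0
    curve = []
    for _ in range(inserts):
        slot = rng.randrange(capacity)
        if not occupied[slot]:
            occupied[slot] = True
            filled += 1
        curve.append(filled / capacity)
    return curve


def fill_curve_lru(capacity, inserts):
    """An LRU store adds every new key until it is full."""
    return [min(i + 1, capacity) / capacity for i in range(inserts)]


def inserts_to_reach(curve, fraction):
    """Number of inserts before the curve first reaches `fraction` (None if never)."""
    for i, f in enumerate(curve):
        if f >= fraction:
            return i + 1
    return None


if __name__ == "__main__":
    m = 10_000
    rng = random.Random(42)
    hashed = fill_curve_hashed(m, 6 * m, rng)
    lru = fill_curve_lru(m, 6 * m)
    for target in (0.5, 0.9, 0.99):
        n_hash = inserts_to_reach(hashed, target)
        n_lru = inserts_to_reach(lru, target)
        print(f"{target:.0%} full: hashed {n_hash}, LRU {n_lru}, ratio {n_hash / n_lru:.2f}")
```

With these assumptions, the 99% ratio should land close to ln(100) / 0.99 ≈ 4.65, which is the usual balls-into-bins estimate.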

> IMHO most of it will be filesharing, just as a massive chunk of the total
> internet bandwidth is filesharing.


OK, I'll simulate filesharing with two popularity distributions, uniform and
Zipf. Each file will contain a lognormally distributed number of blocks,
and the downloader will randomly choose 2/3 of them to request. I won't
bother with splitfile healing, inserts, churn, congestion, swapping, phase
of the moon, etc.
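A minimal sketch of the workload described above, assuming Zipf popularity with weight 1/rank^s and lognormal block counts drawn via the standard library's `random.lognormvariate`. The parameters `mu`, `sigma` and `s`, and the function names, are hypothetical placeholders rather than values from the thread. Keys are modelled as (file index, block index) tuples in place of real CHKs.

```python
import random


def make_files(num_files, mu, sigma, rng):
    """Block count per file, lognormally distributed (at least one block each).

    mu and sigma are hypothetical parameters of the underlying normal.
    """
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(num_files)]


def popularity_weights(num_files, distribution, s=1.0):
    """Uniform or Zipf popularity; file at 0-based rank i gets weight 1/(i+1)**s."""
    if distribution == "uniform":
        return [1.0] * num_files
    if distribution == "zipf":
        return [1.0 / (rank + 1) ** s for rank in range(num_files)]
    raise ValueError(f"unknown distribution: {distribution}")


def download_requests(files, weights, rng, fraction=2 / 3):
    """One download: choose a file by popularity, then request a random
    `fraction` of its blocks (2/3 in the thread), each block at most once."""
    f = rng.choices(range(len(files)), weights=weights)[0]
    n_blocks = files[f]
    wanted = max(1, round(n_blocks * fraction))
    for block in rng.sample(range(n_blocks), wanted):
        yield (f, block)


if __name__ == "__main__":
    rng = random.Random(7)
    files = make_files(1000, mu=3.0, sigma=1.0, rng=rng)
    weights = popularity_weights(len(files), "zipf")
    keys = list(download_requests(files, weights, rng))
    print(f"one download requested {len(keys)} keys")
```

Feeding the resulting key stream into the store models above would give the hit rates under each popularity distribution.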

> SSK polling for messages obviously
> will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are
> ~10x smaller than CHKs). That should reduce a bit in future with some new
> measures such as RecentlyFailed ... but it will increase as FMS is more
> widely adopted... So no idea really... I do know that if we spend all our
> bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs
> are kept in a separate store from CHKs, this is not likely to change.
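A rough reading of those figures, assuming "~10x" means a single SSK costs about a tenth of the bandwidth of a CHK and ignoring per-request overhead:

```latex
\text{SSK share of requests} \approx \frac{2.5}{2.5 + 1} \approx 71\%,
\qquad
\text{SSK share of bandwidth} \approx \frac{2.5 \times 1}{2.5 \times 1 + 1 \times 10} = \frac{2.5}{12.5} = 20\%
```

On this reading, SSKs dominate the request count while CHKs still carry most of the bytes.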


I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs
will affect the way SSKs are cached, but I don't have time to dig into the
code to find out how they work (and into Frost and FMS to find out what
kind of traffic patterns they produce).

Cheers,
Michael