Re: [Tech] Re datastore simulations

Author: Matthew Toseland
Date:  
To: Michael Rogers
Subject: Re: [Tech] Re datastore simulations
gpg: Signature made Wed Apr 30 14:45:44 2008 UTC using DSA key ID E43DA450
gpg: Good signature from "Matthew John Toseland <toad@amphibian.dyndns.org>"
On Wednesday 30 April 2008 13:19, Michael Rogers wrote:
> On Apr 30 2008, Matthew Toseland wrote:
> > Keys to block number. Block numbers to keys is handled by the on disk
> > structure. So we can actually pick a random block number to dump - but at
> > the cost of having to keep a key index.
>
> Cool, I see what you mean now - I'll simulate that too.
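To make the key-index idea concrete, here is a minimal sketch of a fixed-size store that evicts a random block number once it is full. The class name, the in-memory list standing in for the on-disk block-to-key structure, and the dict used as the key index are all assumptions for illustration, not the actual datastore code.

```python
import random


class RandomReplacementStore:
    """Hypothetical fixed-size store: random eviction plus a key index."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        # block number -> key; stands in for the on-disk structure
        self.slots = [None] * capacity
        # key -> block number; the extra index that random eviction costs us
        self.index = {}
        self.used = 0
        self.rng = rng or random.Random()

    def get(self, key):
        """Return the block number holding key, or None if it is not stored."""
        return self.index.get(key)

    def put(self, key):
        """Store key and return the block number it ended up in."""
        if key in self.index:
            return self.index[key]
        if self.used < self.capacity:
            # Still filling: append to the end of the store.
            block = self.used
            self.used += 1
        else:
            # Full: pick a random block number and dump whatever is there.
            block = self.rng.randrange(self.capacity)
            del self.index[self.slots[block]]
        self.slots[block] = key
        self.index[key] = block
        return block
```

The point of the sketch is the trade-off: picking the victim is a single random number, but every stored key needs an index entry so that lookups can find its block.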
>
> > I'm surprised that hashing works so well, it has some big disadvantages
> > e.g. once the datastore is say half full, half of all new incoming keys
> > will overwrite old data rather than being added to the end. So we end up
> > storing less data: it takes a much longer time for the datastore to fill
> > up.
>
> Hmm, good point. On the other hand filling the store (or 99% filling it)
> would typically only take a few days, so maybe it's more important to
> optimise the steady state behaviour than the startup behaviour?


Depends on how big it is.
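As a rough back-of-envelope for the hashing case, assume N slots, one candidate slot per key, distinct keys and a uniform hash (all simplifying assumptions, not measurements). A new key lands on an occupied slot with probability equal to the current fill fraction, which gives:

```latex
\begin{align*}
\mathbb{E}[\text{occupied fraction after } n \text{ inserts}]
  &= 1 - \left(1 - \tfrac{1}{N}\right)^{n} \approx 1 - e^{-n/N}, \\
n(f) &\approx -N \ln(1 - f), \\
n(0.5) &\approx 0.69\,N, \qquad n(0.99) \approx 4.6\,N .
\end{align*}
```

A store that appends until full reaches fraction f after exactly fN inserts, so under these assumptions the ratio is -ln(1 - f)/f: about 1.4 at half full and about 4.7 at 99% full. Whether that is "a few days" or much longer depends on N and the insert rate.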
>
> > What is the approximate ratio of store filling rates for the same size
> > store on LRU versus on a direct hashing implementation? Can you simulate
> > this?
>
> So far I've been allowing the simulations to reach a steady state before
> making any measurements, but it shouldn't be a problem to simulate it.


Ok.
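For reference, a minimal sketch of how the fill-rate comparison could be simulated. It assumes every incoming key is new, that an LRU store simply appends until it is full, and that direct hashing gives each key exactly one uniformly random slot; the function names are made up for illustration.

```python
import random


def fill_curve(capacity, inserts, policy, seed=0):
    """Return the number of occupied slots after each insert of a fresh key.

    policy "append": keys go into the next free slot until the store is full
    (how an LRU store behaves while it is still filling).
    policy "hash": each key maps to one uniformly random slot and overwrites
    whatever is already there (direct hashing, one slot per key).
    """
    rng = random.Random(seed)
    occupied = [False] * capacity
    used = 0
    curve = []
    for i in range(inserts):
        if policy == "append":
            slot = min(i, capacity - 1)  # once full, every insert overwrites
        elif policy == "hash":
            slot = rng.randrange(capacity)
        else:
            raise ValueError("unknown policy: %r" % policy)
        if not occupied[slot]:
            occupied[slot] = True
            used += 1
        curve.append(used)
    return curve


def inserts_to_reach(curve, target):
    """Return how many inserts it took to occupy `target` slots, or None."""
    for n, used in enumerate(curve, start=1):
        if used >= target:
            return n
    return None


if __name__ == "__main__":
    capacity = 10_000
    lru = fill_curve(capacity, 10 * capacity, "append")
    hashed = fill_curve(capacity, 10 * capacity, "hash")
    for fraction in (0.5, 0.9, 0.99):
        target = int(fraction * capacity)
        print(fraction,
              inserts_to_reach(lru, target),
              inserts_to_reach(hashed, target))
```

The ratio of the two printed insert counts at each fill fraction is the figure being asked for; repeated keys and real request traffic would change the numbers.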
>
> > IMHO most of it will be filesharing, just as a massive chunk of the total
> > internet bandwidth is filesharing.
>
> OK, I'll simulate filesharing with two popularity distributions, uniform and
> Zipf. Each file will contain a lognormally distributed number of blocks,
> and the downloader will randomly choose 2/3 of them to request. I won't
> bother with splitfile healing, inserts, churn, congestion, swapping, phase
> of the moon, etc.
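A minimal sketch of the workload described above, using only the Python standard library. The lognormal parameters (mu, sigma), the Zipf exponent and all function names are assumptions chosen for illustration; the simulation's actual values are not given in this thread.

```python
import random


def make_files(num_files, mu=3.0, sigma=1.0, rng=None):
    """Return a list of block counts, one per file, lognormally distributed.

    mu and sigma are placeholder values, not taken from the thread.
    """
    rng = rng or random.Random()
    return [max(1, round(rng.lognormvariate(mu, sigma)))
            for _ in range(num_files)]


def popularity_weights(num_files, distribution, exponent=1.0):
    """Return unnormalised request weights; file 0 is the most popular."""
    if distribution == "uniform":
        return [1.0] * num_files
    if distribution == "zipf":
        return [1.0 / (rank ** exponent) for rank in range(1, num_files + 1)]
    raise ValueError("unknown distribution: %r" % distribution)


def download_requests(files, weights, rng):
    """Pick one file by popularity and request a random 2/3 of its blocks.

    Returns a list of (file_id, block_number) pairs.
    """
    file_id = rng.choices(range(len(files)), weights=weights, k=1)[0]
    blocks = files[file_id]
    wanted = max(1, round(blocks * 2 / 3))
    chosen = rng.sample(range(blocks), wanted)
    return [(file_id, block) for block in chosen]


if __name__ == "__main__":
    rng = random.Random(2008)
    files = make_files(1000, rng=rng)
    weights = popularity_weights(len(files), "zipf")
    print(download_requests(files, weights, rng)[:5])
```

Each (file_id, block_number) pair stands in for a CHK request; feeding the resulting stream into a store model gives hit rates under either popularity distribution.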
>
> > SSK polling for messages obviously
> > will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are
> > ~ 10x smaller than CHKs). That should reduce a bit in future with some new
> > measures such as RecentlyFailed ... but it will increase as FMS is more
> > widely adopted... So no idea really... I do know that if we spend all our
> > bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs
> > are kept in a separate store from CHKs, this is not likely to change.
>
> I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs
> will affect the way SSKs are cached, but I don't have time to dig into the
> code to find out how they work (and into Frost and FMS to find out what
> kind of traffic patterns they produce).


Sensible imho.
>
> Cheers,
> Michael