Re: [Tech] Distributed file system using routing inspired by…

Author: Matthew Toseland
Date:  
To: tech
CC: Ian Clarke
Subject: Re: [Tech] Distributed file system using routing inspired by Freenet
gpg: Signature made Tue Apr 8 12:00:12 2008 UTC using DSA key ID E43DA450
gpg: Good signature from "Matthew John Toseland <toad@amphibian.dyndns.org>"
On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > http://video.google.com/videoplay?docid=-2372664863607209585
> >
> > He mentions Freenet's use of this technique about 10-15 minutes in,
> > they also use erasure codes, so it seems they are using a few
> > techniques that we also use (unclear about whether we were the direct
> > source of this inspiration).
>
> They use 500% redundancy in their RS codes. Right now we use 150% (including
> the original 100%). Maybe we should increase this? A slight increase to say
> 200% or 250% may give significantly better performance, despite the increased
> overhead...


In fact, I think I can justify a figure of 200% (the original plus 100%, so
128 -> 255 blocks, to fit within the 8-bit fast encoding limit). On average,
in the long term, a block will be stored on 3 nodes. Obviously a lot of
popular data will be stored on more than 3 nodes, but in terms of the
datastore, this is the approximate figure. On an average node with a 1GB
datastore, the 512MB cache has a lifetime of less than a day; data lasts a
lot longer in the store, and on average it is stored on 3 nodes (by design).
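To make the 128 -> 255 arithmetic concrete, here is a minimal Python sketch of
sizing one segment's check blocks. It is not Freenet's splitfile code: the
function names are hypothetical, and the only figure taken from this post is
the 255-block ceiling of the 8-bit fast encoding.

```python
# Hypothetical helpers, not Freenet's actual splitfile code.
MAX_BLOCKS_8BIT = 255  # 8-bit fast encoding limit, as stated in the post


def check_blocks_for(data_blocks: int, ratio: float = 1.0) -> int:
    """Check blocks for one segment: ratio * data blocks, capped at the limit."""
    if not 0 < data_blocks < MAX_BLOCKS_8BIT:
        raise ValueError("segment must have between 1 and 254 data blocks")
    wanted = int(data_blocks * ratio)
    return min(wanted, MAX_BLOCKS_8BIT - data_blocks)


def expansion(data_blocks: int, ratio: float = 1.0) -> float:
    """Total blocks stored per original block (1.0 means no redundancy)."""
    return (data_blocks + check_blocks_for(data_blocks, ratio)) / data_blocks


if __name__ == "__main__":
    k = 128
    m = check_blocks_for(k)             # 127, not 128: 128 + 128 = 256 > 255
    print(k, "->", k + m, "blocks")     # 128 -> 255 blocks
    print(round(expansion(k), 3))       # 1.992, i.e. roughly 200%
    print(round(expansion(k, 0.5), 3))  # 1.5, the current 150%
```

The cap is why "the original plus 100%" lands at 255 rather than 256 blocks.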

We then multiply that by two for splitfile redundancy, to get a total
redundancy of 6. Wuala works well with a factor of 5 redundancy... but that's
entirely due to FEC. They simulated ordinary replication and needed a factor
of 24 to be reliable, versus a factor of 5 with FEC.
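The gap between 24 and 5 is roughly what a simple model predicts. The sketch
below assumes each stored copy or block is online independently with
probability 0.25 (Wuala's assumed uptime). It compares whole-file availability
for a 100-block file under 24x replication against FEC where any 100 of 500
blocks suffice. The independence assumption and the file size are mine; this
is an illustration, not a reproduction of Wuala's simulations.

```python
from math import comb


def replicated_file_availability(p: float, copies: int, blocks: int) -> float:
    """Every block needs at least one of its `copies` replicas online."""
    per_block = 1.0 - (1.0 - p) ** copies
    return per_block ** blocks


def fec_file_availability(p: float, k: int, n: int) -> float:
    """At least k of the n encoded blocks must be online (binomial tail)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    uptime = 0.25  # Wuala's assumed uptime; independence is an assumption
    blocks = 100   # arbitrary illustrative file size
    rep = replicated_file_availability(uptime, 24, blocks)
    fec = fec_file_availability(uptime, blocks, 5 * blocks)
    print(f"24x replication: {rep:.3f}")
    print(f"5x FEC:          {fec:.3f}")
```

Under this model the 5x FEC file comes out more available than the 24x
replicated one, while storing about a fifth as much data.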

So maybe what we need is less network-level redundancy and more FEC-level
redundancy, i.e. redundancy in the data itself? IMHO we can't reduce the
network-level redundancy much below the current store-in-3-nodes, because we
use Freenet for things other than splitfiles: Frost posts, the top-level
block, ... The top-level block is a special case. It will usually be
fetchable, because anyone trying to fetch the splitfile will fetch it even if
they give up afterwards, and even if they just followed a link in fproxy, got
a size warning and changed their mind...

Wuala's simulations assume 25% uptime, and they don't give nodes extra
storage unless they have at least 17% uptime. Can we implement something
similar? We would have to ignore low-uptime nodes when determining whether we
are a sink for a key. The problem is that we would have to reliably tell
whether nodes are low uptime... On opennet, there is enough connection churn
that we're unlikely to have been connected to a node for the many days needed
to measure this. We could reduce the connection churn, but at the cost of
reduced connectivity: when a node disconnects, we give it a few minutes to
reconnect, and then we move on. A full-blown reputation system like Wuala's
would be a lot of work and a lot of debugging...
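Here is a rough sketch of what ignoring low-uptime nodes in the sink decision
could look like. Everything in it is hypothetical: the `Peer` record, the
7-day observation window and the helper names are illustrative, not Freenet's
actual code. Only the 17% threshold comes from Wuala. Peers we have not
watched long enough still count, since we cannot judge them, and that is
exactly where opennet churn bites: most peers never build up enough history
to be excluded.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_UPTIME = 0.17              # Wuala's threshold for granting extra storage
MIN_OBSERVED_HOURS = 7 * 24.0  # hypothetical: history needed to judge a peer


@dataclass
class Peer:
    location: float        # position on the [0, 1) keyspace circle
    observed_hours: float  # how long we have been tracking this peer
    online_hours: float    # hours it was connected during that time


def circular_distance(a: float, b: float) -> float:
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def counts_for_sink(peer: Peer) -> bool:
    """Ignore a peer only if we have measured it as low uptime."""
    if peer.observed_hours < MIN_OBSERVED_HOURS:
        return True  # too new to judge (the opennet churn problem), so count it
    return peer.online_hours / peer.observed_hours >= MIN_UPTIME


def is_sink(my_location: float, key_location: float, peers: list[Peer]) -> bool:
    """We are a sink if no counted peer is closer to the key than we are."""
    mine = circular_distance(my_location, key_location)
    return all(
        circular_distance(p.location, key_location) >= mine
        for p in peers
        if counts_for_sink(p)
    )
```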
>
> Also, they discourage low uptime nodes by not giving them any extra storage.
> I'm not sure exactly what we can do about this, but it's a problem we need to
> deal with.
>
> We should also think about randomising locations less frequently. It can take
> a while to recover, and the current code randomises roughly every 13 to 22
> hours. It may be useful to increase this significantly? Unfortunately this
> parameter is very dependent on the network size and so on; it's not really
> something we can get a good value for from simulations... I suggest we
> increase it by, say, a factor of 4, and if we get major location distribution
> issues, we can reduce it again.


This may be important.
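If the interval is a tunable multiple of the current one, backing off is a
one-line change. A minimal sketch, assuming (hypothetically) that the delay is
drawn uniformly between the two bounds. The post only says "roughly every 13
to 22 hours", and the function name is illustrative.

```python
import random

# Bounds quoted in the post (hours); the factor of 4 is the proposal.
CURRENT_MIN_HOURS, CURRENT_MAX_HOURS = 13.0, 22.0
SCALE = 4.0


def next_randomise_delay(rng: random.Random, scale: float = SCALE) -> float:
    """Hypothetical: hours until the next location randomisation."""
    # Uniform spread is an assumption, not the actual distribution.
    return rng.uniform(CURRENT_MIN_HOURS * scale, CURRENT_MAX_HOURS * scale)
```

With `scale=4.0` the delay falls between 52 and 88 hours. If location
distribution problems appear, lowering `scale` restores the old behaviour
without touching the rest of the logic.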
> >
> > Ian.