Re: [Tech] Distributed file system using routing inspired by…

Author: Jano
Date:  
To: tech
Subject: Re: [Tech] Distributed file system using routing inspired by Freenet
Matthew Toseland wrote:

> On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
>> 2008/4/8 Matthew Toseland
>> <toad@???>:
>> > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
>> > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
>> > > > http://video.google.com/videoplay?docid=-2372664863607209585
>> > > >
>> > > > He mentions Freenet's use of this technique about 10-15 minutes in,
>> > > > they also use erasure codes, so it seems they are using a few
>> > > > techniques that we also use (unclear about whether we were the direct
>> > > > source of this inspiration).
>> > >
>> > > They use 500% redundancy in their RS codes. Right now we use 150% (including
>> > > the original 100%). Maybe we should increase this? A slight increase to say
>> > > 200% or 250% may give significantly better performance, despite the increased
>> > > overhead...
>> >
>> > In fact, I think I can justify a figure of 200% (the original plus 100%, so
>> > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On average,
>> > in the long term, a block will be stored on 3 nodes. Obviously a lot of
>> > popular data will be stored on more nodes than 3, but in terms of the
>> > datastore, this is the approximate figure. On an average node with a 1GB
>> > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts a
>> > lot longer in the store, and on average data is stored on 3 nodes (by
>> > design).
>>
>> I think the downloader would "heal" a broken file by re-inserting the
>> missing FEC blocks, right?
>>
>> If that is the case, I think we can use 300% (or higher) redundancy,
>> but only insert a random portion of them. When a downloader downloads
>> this file, he inserts (some other) random FEC blocks for this file.
>> Under this scheme, the inserter doesn't have to pay a high bandwidth
>> overhead cost, while increasing the redundancy.
>
> I'm not worried about inserters paying a high bandwidth cost actually. Right
> now inserts are a lot faster than requests. What I'm worried about is if we
> have too much redundancy, our overhead in terms of data storage will be
> rather high, and that reduces the amount of data that is fetchable.


FWIW, my store is 50% full (of 8GB) after several weeks of mostly 24/7
uptime, whereas the cache fills up pretty fast.

Plus, reinsertions are requested quite often in Frost shortly after an
announcement. Could this mean that stores aren't currently being fully
exploited?
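
To put rough numbers on the redundancy trade-off discussed in the quoted
messages, here is a minimal sketch. It is not Freenet code. It assumes a
128-block segment (from the "128 -> 255" figure quoted above), that any 128 of
the n inserted blocks are enough to decode, and that each block is fetchable
independently with the same probability p. The independence assumption, the
value p = 0.6, and both function names are made up for illustration.

```python
from math import comb


def unique_data_fraction(redundancy: float) -> float:
    """Share of stored blocks that carry original data.

    `redundancy` is total blocks / original blocks, so the "150%"
    in the thread is 1.5 and "200%" is 2.0.
    """
    if redundancy < 1.0:
        raise ValueError("redundancy must be at least 1.0")
    return 1.0 / redundancy


def segment_success_probability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n blocks can be fetched.

    Assumes (hypothetically) that every block is fetchable
    independently with probability p, i.e. a binomial tail sum.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    k = 128   # data blocks per segment, as in the quoted message
    p = 0.6   # assumed per-block availability, purely illustrative
    # 150% -> 128 data + 64 check blocks; 200% -> 255 blocks in total
    for label, n in (("150%", 192), ("200%", 255)):
        print(
            f"{label}: unique data share {unique_data_fraction(n / k):.2f}, "
            f"segment fetchable with probability "
            f"{segment_success_probability(k, n, p):.4f}"
        )
```

Under these assumptions, more check blocks raise the chance that a segment
decodes, while the share of the store holding original data shrinks by the same
factor. That is the tension between the two quoted positions. Real block
availability is neither uniform nor independent, so the numbers only show the
direction of the effect.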