simple virtual file system in Erlang?

Marco Molteni
Hello colleagues,

I would like to start from the presentation "Build an FTP Server with Ranch in 30 Minutes" [1] and actually build an FTP server, so at some point I need to hit the filesystem.

Since this is still meant as a presentation (as opposed to a production FTP server), I don't want to touch the real filesystem of the host; I want to use the simplest possible virtual filesystem. It could be backed by DETS or Mnesia, for example, or it could live only in memory.

On the other hand, it must still behave like a filesystem: it must be hierarchical, so a direct mapping to a key/value store would not be enough. In the spirit of simplicity, I don't need any concept of read/write permissions. I simply need a sort of graph with two node types: inner nodes are directories, and leaf nodes are files or empty directories.

I am thinking of using DETS with a very simple intermediate layer that gives the impression of a graph (each non-leaf node being a directory) and maps it onto the DETS key/value API.

Any suggestions?

thanks
marco

[1] https://ninenines.eu/articles/ranch-ftp/
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: simple virtual file system in Erlang?

Jesper Louis Andersen-2
The simplest way is probably to map a path to its data, where the path is a list of components. That is, the UNIX file ls in /usr/bin would be represented as ["usr", "bin", "ls"].
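
A minimal sketch of this path-to-data mapping, assuming an in-memory map as the store. The module name `vfs_paths` and its functions are hypothetical, chosen only for illustration, and `string:lexemes/2` requires OTP 20 or later.

```erlang
-module(vfs_paths).
-export([split/1, store/3, fetch/2]).

%% Turn a textual path such as "/usr/bin/ls" into ["usr", "bin", "ls"].
split(Path) ->
    string:lexemes(Path, "/").

%% Store Data under a textual path; the map key is the component list.
store(Path, Data, Store) ->
    maps:put(split(Path), Data, Store).

%% Look a path up again; returns {ok, Data} or error.
fetch(Path, Store) ->
    maps:find(split(Path), Store).
```

Because the key is the component list rather than the raw string, "/usr/bin/ls" and "usr/bin/ls" resolve to the same entry.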

But you could also go the way of Plan 9's Venti. Map something like ["usr", "bin", "ls"] to a SHA-3 checksum of the data currently residing in the file, and keep the data in a store suited to content addressing. Venti also extends this by storing a tree of 64 kilobyte blocks which, taken together, can be regarded as the data. Thus, you have two components: the data store, and the path mapping service, which maps a path to the underlying data.
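
The two components can be sketched as two maps, one from path to digest and one from digest to data. This is only an illustration under stated assumptions: the module `vfs_cas` is hypothetical, whole files are stored as single blobs (no Venti-style block tree), and SHA-256 stands in for SHA-3 because `crypto:hash(sha256, ...)` is available on every OTP release; `sha3_256` could be substituted where the local crypto build supports it.

```erlang
-module(vfs_cas).
-export([new/0, write/3, read/2]).

%% State: a path index (component list -> digest)
%% and a blob store (digest -> data).
new() ->
    #{paths => #{}, blobs => #{}}.

%% Hash the data, store the blob under its digest,
%% and point the path at that digest.
write(Path, Data, #{paths := Paths, blobs := Blobs} = Fs) when is_binary(Data) ->
    Digest = crypto:hash(sha256, Data),
    Fs#{paths := Paths#{Path => Digest}, blobs := Blobs#{Digest => Data}}.

%% Two-level lookup: path -> digest -> data.
read(Path, #{paths := Paths, blobs := Blobs}) ->
    case maps:find(Path, Paths) of
        {ok, Digest} -> maps:find(Digest, Blobs);
        error -> error
    end.
```

Writing the same contents under two different paths leaves a single entry in the blob store, since both paths map to the same digest.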

This idea has the advantage of being practically useful in many situations besides a toy example :)

Re: simple virtual file system in Erlang?

Joe Armstrong-2
Yes - mapping the path to a content-addressable store address would be
great. It would also be great in a wider context.

A fun extension would be to add an HTTP interface. Something like

   POST <blob>

to store <blob>, then later

   GET /blob/sha256/<SHA>

to recover the blob. <SHA> is the SHA256 checksum of the data (the
path contains the type of the checksum, so you might say
GET /blob/md5/<MD5CHECKSUM>).

Then add a DHT, then run it on every web server on the planet,
then store all data forever.

The nice thing about this is that it's self-securing: a person-in-the-middle
cannot change the data without it being detected (since you can check
the SHA of the data when you get it back).
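
The self-securing check can be sketched as follows: the client recomputes the checksum named in the path and compares it with the hex digest in the path. The module `blob_check`, the exact path layout, and the lowercase hex encoding are assumptions made for illustration; the message above does not specify them.

```erlang
-module(blob_check).
-export([blob_path/2, verify/2]).

%% Build the path under which a blob would be fetched,
%% e.g. "/blob/sha256/<hex digest>".
blob_path(Algo, Data) when Algo =:= sha256; Algo =:= md5 ->
    "/blob/" ++ atom_to_list(Algo) ++ "/" ++ hex(crypto:hash(Algo, Data)).

%% Check that Data really is the blob the path names.
verify(Path, Data) ->
    case string:lexemes(Path, "/") of
        ["blob", AlgoStr, Hex] ->
            case algo(AlgoStr) of
                {ok, Algo} ->
                    string:lowercase(Hex) =:= hex(crypto:hash(Algo, Data));
                error ->
                    false
            end;
        _ ->
            false
    end.

%% Only the two checksum types mentioned in the thread are accepted.
algo("sha256") -> {ok, sha256};
algo("md5") -> {ok, md5};
algo(_) -> error.

%% Lowercase hex encoding of a binary, two digits per byte.
hex(Bin) ->
    lists:flatten([io_lib:format("~2.16.0b", [Byte]) || <<Byte>> <= Bin]).
```

A tampered response fails the comparison, so the client can detect it without trusting the transport.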


Such a system has three relatively simple layers

- transport (say HTTP over TCP)
- DHT
- storage

If you ask Jesper nicely he'll tell you all about DHTs :-)

Have fun with the storage layer for this (and no it's NOT a toy project)

/Joe

Re: simple virtual file system in Erlang?

Joe Armstrong-2
Somebody tweeted me a link to upspin.io

https://upspin.io/doc/overview.md

This has a very good way of naming files - well worth reading

/Joe

Re: simple virtual file system in Erlang?

Michael Truog
In reply to this post by Marco Molteni
Hi Marco,

There is an undocumented module in the kernel application called ram_file, which could let you use file functions while operating on data that is only in memory. That would make it easy to switch later from in-memory file operations to real filesystem operations. However advantageous that is, you shouldn't rely on it until it becomes documented, since it could be removed at any time. So far, judging by the closed bug at https://bugs.erlang.org/browse/ERL-36, it appears the ram_file module will be removed.

Best Regards,
Michael

Re: simple virtual file system in Erlang?

Marco Molteni
In reply to this post by Marco Molteni
Hello all,

thanks for the helpful answers, especially the idea of a two-level lookup: path to hash and hash to blob. This would also naturally take care of data deduplication. On the other hand, it would require keeping a reverse index that maps a given checksum to all the paths pointing to it. Storing only full paths, with no notion of intermediate nodes, would also make simple operations such as listing the contents of a directory very time-consuming: I would have to look up _all_ the paths... So the idea is nice, but it requires refinement :-)
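
One possible refinement for the listing problem, sketched under assumptions: keep a row per directory that records the names of its children, so listing a directory is a single lookup rather than a scan over all paths. The module `vfs_tree` and its functions are hypothetical. ETS is used here to keep the sketch in memory; `dets:insert/2` and `dets:lookup/2` have the same shape if a disk-backed table is wanted.

```erlang
-module(vfs_tree).
-export([new/0, mkdir/2, write_file/3, list_dir/2]).

%% Each row is {Path, Node}: Node is {dir, ChildNames} or {file, Data}.
%% Path is a list of components; [] is the root directory.
new() ->
    Tab = ets:new(vfs_tree, [set, public]),
    true = ets:insert(Tab, {[], {dir, []}}),
    Tab.

%% Create an empty directory; refuses to replace an existing entry.
mkdir(Tab, Path) ->
    case ets:member(Tab, Path) of
        true -> {error, eexist};
        false -> add(Tab, Path, {dir, []})
    end.

%% Create or overwrite a file; refuses to replace a directory.
write_file(Tab, Path, Data) ->
    case ets:lookup(Tab, Path) of
        [{_, {dir, _}}] -> {error, eisdir};
        _ -> add(Tab, Path, {file, Data})
    end.

%% Listing reads a single row instead of scanning every stored path.
list_dir(Tab, Path) ->
    case ets:lookup(Tab, Path) of
        [{_, {dir, Names}}] -> {ok, lists:sort(Names)};
        [{_, {file, _}}] -> {error, enotdir};
        [] -> {error, enoent}
    end.

%% Insert Node at Path and record its name in the parent directory.
%% Never called with [], because the root always exists as a directory.
add(Tab, Path, Node) ->
    Parent = lists:droplast(Path),
    Name = lists:last(Path),
    case ets:lookup(Tab, Parent) of
        [{_, {dir, Names}}] ->
            true = ets:insert(Tab, {Parent, {dir, lists:usort([Name | Names])}}),
            true = ets:insert(Tab, {Path, Node}),
            ok;
        [{_, {file, _}}] ->
            {error, enotdir};
        [] ->
            {error, enoent}
    end.
```

The file payload could equally be a content digest instead of the data itself, which would combine this layout with the two-level lookup.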

Now time to think a bit more about it.

marco

Re: simple virtual file system in Erlang?

Joe Armstrong-2
Regarding your comments on performance - I would be extremely cautious about
predicting performance before any code is written and measured. This is the
one area where predictions can be wildly incorrect.

Any operating system worth the name caches large parts of the working set of the
file system in memory, which makes predicting performance very difficult.

Even using a RAM file buffer might not help, since the OS is itself
caching file blocks in RAM - I'd go for the simplest and most beautiful
design and then measure.

Cheers

/Joe

Re: simple virtual file system in Erlang?

Jesper Louis Andersen-2
In reply to this post by Joe Armstrong-2
On Sun, Mar 26, 2017 at 7:31 PM Joe Armstrong <[hidden email]> wrote:
> Somebody tweeted me a link to upspin.io
>
> https://upspin.io/doc/overview.md
>
> This has a very good way of naming files - well worth reading


Upspin does two things right:

1. It uses protobufs for its protocol on top of a simple HTTP layer:

https://github.com/upspin/upspin/blob/master/upspin/proto/upspin.proto

(it is very close to UBF in many ways)

2. It exposes file systems, not final solutions, through Key, Directory and Storage servers.

The latter is especially powerful. It means you can evolve the system as long as it fits inside the rules of a file-system-like object. It abstracts the problem away and provides just a protocol for a client to follow.

The naming scheme in Upspin is indeed brilliant since it allows everyone their own name space.

As a side note: you don't need something like hex.pm if you have something like upspin. Really, all you have to provide is a nice UI layer and the underlying data layer will take care of the rest of the stuff.


Re: simple virtual file system in Erlang?

Lloyd R. Prentice-2
Hi all,

This is flying over my head, but sounds super interesting.

How I'd love to see a tutorial that demonstrates a simple application based on upspin.

Jesper, I know you're hyper busy, but if I ask stupid questions, can you help sketch out such a tutorial?

Many thanks,

LRP

-----Original Message-----
From: "Jesper Louis Andersen" <[hidden email]>
Sent: Monday, March 27, 2017 4:35pm
To: "Joe Armstrong" <[hidden email]>
Cc: "Erlang-Questions Questions" <[hidden email]>
Subject: Re: [erlang-questions] simple virtual file system in Erlang?

On Sun, Mar 26, 2017 at 7:31 PM Joe Armstrong <[hidden email]> wrote:

> Somebody tweeted me a link to upspin.io
>
> https://upspin.io/doc/overview.md
>
> This has a very good way of naming files - well worth reading
>
>
>
Upspin does two things right:

1. It uses protobufs for its protocol on top of a simple HTTP layer:

https://github.com/upspin/upspin/blob/master/upspin/proto/upspin.proto

(it is very close to UBF in many ways)

2. It exposes file systems, not final solutions, through Key, Directory and
Storage servers.

The latter is especially powerful. It means you can evolve the system as
long as it fits inside the rules of a file-system-like object. It abstracts
the problem away and provides just a protocol for a client to follow.

The naming scheme in Upspin is indeed brilliant since it allows everyone
their own name space.

As a side note: you don't need something like hex.pm if you have something
like upspin. Really, all you have to provide is a nice UI layer and the
underlying data layer will take care of the rest of the stuff.


Re: simple virtual file system in Erlang?

Richard A. O'Keefe-2
In reply to this post by Joe Armstrong-2

> On 27/03/2017, at 4:31 AM, Joe Armstrong <[hidden email]> wrote:
> To store <blob>,  then later
>
>   GET /blob/sha256/<SHA>
>
> to recover the blob. <SHA> is the SHA256 checksum of the data (the
> path contains the type of the checksum - so you might say
> GET /blob/md5/<MD5CHECKSUM>

It seems to me that this amounts to using a checksum of the contents
of a file instead of an inode number.

What I don't understand is how I would ever use this.
When I go looking for a file, I want its *contents*,
which I don't know, otherwise I wouldn't need it.
So I *can't* compute its checksum.  With things like
Spotlight, I can use a known *part* of the content to
look for the rest of the content, but again, if I
already knew enough of the content to compute the
checksum, I wouldn't bother looking at the file.

The only way I can think of to use something like this
is to maintain some sort of name-based directory
structure on top, or some sort of IR-like inverted index
based on part of the content.


Re: simple virtual file system in Erlang?

Michael Truog
Yes, it is normally used together with a name-based directory, as in peer-to-peer file sharing. Such links are called "magnet links", as described at https://en.wikipedia.org/wiki/Magnet_URI_scheme.

Re: simple virtual file system in Erlang?

Joe Armstrong-2
In reply to this post by Richard A. O'Keefe-2
You're quite right - you need to know in advance the checksum of the data.

But obtaining the checksum from "somewhere" is orthogonal to the problem
of storing or obtaining the data. Indexing etc. is a completely different problem.

/Joe

Re: simple virtual file system in Erlang?

Robert Raschke
Two papers worth reading when thinking about storage and file systems:

Venti, a content addressable archival storage: http://doc.cat-v.org/plan_9/4th_edition/papers/venti/
Fossil, a filesystem that can use venti for snapshot archives: http://doc.cat-v.org/plan_9/4th_edition/papers/fossil/

