Quantcast

Trie fun

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Trie fun

Mike Oxford
Erlang's MR against a trie seems ideal for parallelizing a word filter.
However, the filter-terms becomes the bottleneck.

In many languages we'd put it into memory and acquire a
pointer/reference to it, spawn the worker threads and pass them the
address of the filter list.
This is efficient as it's static-data and only in memory in one place.

In Erlang we build up the trie and then send it messages, leading to a
bottleneck at the filter-list's message queue.
Okay, so we can spawn multiple filters, and ship words to them, but
every filter has to build up the filter-list from storage (compiled
in, in this case, to minimize overhead.)
Now we have 'm' words going to 'n' filters.  Except for the startup
costs, that doesn't seem too bad.

The Trie is static, prebuild it and store it as a binary "variable",
so it's just blasted into memory instead of actually being built?  Any
downsides? (Theory; I don't know if you
can "image" memory this way in Erlang yet.)

Is there a better way?  Yes, I could do it in C and link it in but I'm
interested in the Erlang way.  :)

Thanks!

-mox
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Trie fun

Gleb Peregud
Maybe creating a module in runtime, where a function returns a
prebuild Trie as a static term, will be a good solution? This way
processess will be able to share the same Trie in the same area of
memory. AFAIK static terms from modules are not copied from module's
"static term storage" to each process which uses it.

On Mon, Jun 27, 2011 at 20:38, Mike Oxford <[hidden email]> wrote:

> Erlang's MR against a trie seems ideal for parallelizing a word filter.
> However, the filter-terms becomes the bottleneck.
>
> In many languages we'd put it into memory and acquire a
> pointer/reference to it, spawn the worker threads and pass them the
> address of the filter list.
> This is efficient as it's static-data and only in memory in one place.
>
> In Erlang we build up the trie and then send it messages, leading to a
> bottleneck at the filter-list's message queue.
> Okay, so we can spawn multiple filters, and ship words to them, but
> every filter has to build up the filter-list from storage (compiled
> in, in this case, to minimize overhead.)
> Now we have 'm' words going to 'n' filters.  Except for the startup
> costs, that doesn't seem too bad.
>
> The Trie is static, prebuild it and store it as a binary "variable",
> so it's just blasted into memory instead of actually being built?  Any
> downsides? (Theory; I don't know if you
> can "image" memory this way in Erlang yet.)
>
> Is there a better way?  Yes, I could do it in C and link it in but I'm
> interested in the Erlang way.  :)
>
> Thanks!
>
> -mox
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate
star

Re: Trie fun

Michael Truog
In reply to this post by Mike Oxford
If you use https://github.com/okeuday/trie with https://github.com/esl/parse_trans/blob/master/src/ct_expand.erl you could have a static prebuild trie at compile-time.  That should be the best solution for a static trie in Erlang.

On 06/27/2011 11:38 AM, Mike Oxford wrote:

> Erlang's MR against a trie seems ideal for parallelizing a word filter.
> However, the filter-terms becomes the bottleneck.
>
> In many languages we'd put it into memory and acquire a
> pointer/reference to it, spawn the worker threads and pass them the
> address of the filter list.
> This is efficient as it's static-data and only in memory in one place.
>
> In Erlang we build up the trie and then send it messages, leading to a
> bottleneck at the filter-list's message queue.
> Okay, so we can spawn multiple filters, and ship words to them, but
> every filter has to build up the filter-list from storage (compiled
> in, in this case, to minimize overhead.)
> Now we have 'm' words going to 'n' filters.  Except for the startup
> costs, that doesn't seem too bad.
>
> The Trie is static, prebuild it and store it as a binary "variable",
> so it's just blasted into memory instead of actually being built?  Any
> downsides? (Theory; I don't know if you
> can "image" memory this way in Erlang yet.)
>
> Is there a better way?  Yes, I could do it in C and link it in but I'm
> interested in the Erlang way.  :)
>
> Thanks!
>
> -mox
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Loading...