# How to split a single huge server?

## How to split a single huge server?

Hi,

I have a single-process solution for a server that, I feel, could be implemented in a more concurrent fashion. Its internal state is a vector of n numbers. It receives casts without parameters. Every cast is basically a matrix multiplication, and the matrices are known in advance. So if the internal state is v, and the server process receives a cast that has the matrix M associated with it, then the new state will be M*v (M is a matrix, * is matrix multiplication, and v is a column vector representing the previous internal state of the server process).

My problem is that v (and the matrices) can be very large, and my server runs in a single process. It would be great to split it apart and use several smaller processes to calculate the new state. I would like to use a separate process for each number in v. But because of the nature of matrix multiplication, that's not so easy to achieve: in order to calculate a single number of the new state, I need to know all the numbers of the previous state. The previous state could be sent to the processes in advance, but that would require a large message containing all the old values to be sent to every process. I believe that is the wrong idea and there must be a better one.

My question is: how would you crack this problem if efficiency matters?

I have one possible solution in mind, and would like to know your opinion. There could be a main server process and size(v) calculator processes, one for every number in v. The server process handles the casts and holds the whole v vector. Each calculator process holds only one number from v and the proper row of M. When a cast arrives, the server process builds a fun that has the whole v captured in its closure, and this anonymous function gets sent to the calculator processes. Each calculator process then applies the fun to its row of M, which gives the new number that it has to send back to the server.

I'm not sure if sending a huge function with a large body is cheaper than sending a large list of numbers, but I hope there's some optimisation going on in BEAM with funs...

Am I right with all this?

THX
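
For reference, the update described above can be written out per component. This is only a restatement of M*v in index notation; the symbols v' (new state) and the indices i, j are introduced here for illustration and do not appear in the post.

```latex
v'_i \;=\; \sum_{j=1}^{n} M_{ij}\, v_j , \qquad i = 1, \dots, n
```

The calculator for component i needs only row i of M, which is fixed and can be stored once, but it needs every v_j, which changes with each cast. That dependence on the full previous state is the communication cost the post is worried about.
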
## Re: How to split a single huge server?

How large are these matrices? It all sounds like you'd end up wasting resources on inter-process communication and just about everything else. Remember, we've had desktop PCs with instructions (MMX, SSE) that do maybe 8 FLOPS per cycle for about twenty years now, but you're not going to be able to take advantage of this by re-inventing matrix multiplication.

Jon
## Re: How to split a single huge server?

> How large are these matrices?

They can be huge. 2^16 x 2^16 is an average one, but they can be much larger. Does the anonymous function trick help me out?

So basically what you suggest is that I need an external piece of software written in a low-level language that uses low-level SIMD instructions on the bare CPU, or maybe uses the GPU?
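
To put the 2^16 x 2^16 figure in perspective, here is a rough size estimate. It is a sketch under one loud assumption that is not stated in the thread: every number is stored as an 8-byte float. The helper `footprint` and the choice of 8 row-block workers are hypothetical, and Python is used only because it is easy to run; nothing here is Erlang-specific.

```python
# Back-of-the-envelope sizes for an n x n matrix and a length-n state vector.
# ASSUMPTION (not from the thread): every number is an 8-byte float.
BYTES_PER_NUMBER = 8

def footprint(n, workers):
    """Return (matrix_bytes, vector_bytes, broadcast_bytes_per_cast)."""
    matrix_bytes = n * n * BYTES_PER_NUMBER
    vector_bytes = n * BYTES_PER_NUMBER
    # Every worker needs the full previous state on every cast.
    broadcast_bytes = workers * vector_bytes
    return matrix_bytes, vector_bytes, broadcast_bytes

n = 2 ** 16
KIB = 2 ** 10
GIB = 2 ** 30

# One worker per element of v, as proposed in the original post.
matrix_bytes, vector_bytes, per_element = footprint(n, workers=n)
# A handful of workers, each owning a block of rows (hypothetical count).
_, _, per_block = footprint(n, workers=8)

print(matrix_bytes / GIB)  # one dense matrix, in GiB
print(vector_bytes / KIB)  # the state vector, in KiB
print(per_element / GIB)   # data broadcast per cast, one worker per element, in GiB
print(per_block / KIB)     # data broadcast per cast, 8 row-block workers, in KiB
```

Under that assumption a single dense matrix is 32 GiB and the vector is 512 KiB. With one worker per element, broadcasting v moves as much data per cast as the matrix itself occupies, whereas a few row-block workers need only a few MiB.
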
## Re: How to split a single huge server?

There are languages such as Python (numpy, numba) or Haskell (repa, accelerate) that have high-level and fast implementations of these operations in libraries, supporting CPU and/or GPU. I'm not aware of comparably popular bindings for Erlang. Most of the ones I've come across don't seem to be maintained, but cl looks like it might be promising for working with the GPU from Erlang: https://github.com/tonyrog/cl

On Tue, Aug 5, 2014 at 7:12 AM, semmit mondo wrote:

> > How large are these matrices?
>
> They can be huge. 2^16 x 2^16 is an average one but can be much larger. Does the anonymous function trick help me out?
>
> So basically what you suggest is that I need an external piece of software written in a low level language that uses low level SIMD instructions on the bare CPU or maybe uses GPU.
## Re: How to split a single huge server?

Hello,

I think the original question was how to run a matrix-vector multiplication efficiently in parallel. Using C libraries is one way to handle it. However, we have not seen a description of the actual use case. At least, it was not clear to me whether the question was about pure "math" multiplication or something else.

All in all, here is a very good description of how to run matrix-vector multiplication in parallel. I guess you can adapt the technique to your needs.

http://www.hpcc.unn.ru/mskurs/ENG/DOC/pp07.pdf

- Dmitry
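
One common way to parallelise M*v is a row-block decomposition: each worker owns a contiguous block of rows of M and receives the whole of v, so there are far fewer workers than elements. The sketch below illustrates that idea only. It is not taken from the linked document or from anyone's code in this thread. The names `split_rows`, `multiply_block` and `parallel_matvec` are hypothetical, Python is used purely for illustration, and threads stand in for the worker processes an Erlang version would use.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous (start, stop) blocks."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty blocks when there are more workers than rows
            blocks.append((start, stop))
        start = stop
    return blocks

def multiply_block(rows, v):
    """Compute the slice of M*v that belongs to one worker's rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in rows]

def parallel_matvec(matrix, v, n_workers=4):
    """Row-block M*v: each worker gets its own rows of M plus the full vector v."""
    blocks = split_rows(len(matrix), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(multiply_block, matrix[a:b], v) for a, b in blocks]
        # Concatenate the partial results in block order to rebuild the new state.
        return [x for f in futures for x in f.result()]

M = [[1, 2, 0],
     [0, 1, 0],
     [2, 0, 1]]
v = [1, 2, 3]
print(parallel_matvec(M, v, n_workers=2))  # prints [5, 2, 5]
```

Because the matrices are known in advance, each worker could keep its rows permanently, so only v and the partial results would have to travel on each cast. Whether that beats a single call into an optimised native library depends on the sizes involved, as the earlier replies point out.
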
