Erlang on OpenStack VM with NUMA awareness

Satish Patel
Hi,

I am trying to run an Erlang application on an OpenStack VM and am
getting very poor performance. After testing, I found that something is
going on with NUMA. This is what I observed in my tests.

My OpenStack compute host has 32 cores, so I created a 30-vCPU VM on
it with a NUMA-aware topology. When I run the Erlang application
benchmark on this VM, I get the worst performance. I then created a new
VM with 16 vCPUs (in this case all of the VM's CPUs are pinned to NUMA
node 0), and the benchmark result was great.

Based on the tests above, it is clear that performance is much better
when I keep the VM on a single NUMA node, and that it gets worse when I
spread the VM across multiple NUMA nodes.

The interesting thing is that when I run the same Erlang application on
bare metal, performance is really good, so I am trying to understand
why the same application doesn't perform well on a VM.

Is there any setting in Erlang to make it fit better with NUMA when
running on a virtual machine?

Re: Erlang on OpenStack VM with NUMA awareness

Ameretat Reith


On Fri, Feb 7, 2020, 4:47 PM Satish Patel <[hidden email]> wrote:
> My OpenStack compute host has 32 cores, so I created a 30-vCPU VM on
> it with a NUMA-aware topology. When I run the Erlang application
> benchmark on this VM, I get the worst performance. I then created a new
> VM with 16 vCPUs (in this case all of the VM's CPUs are pinned to NUMA
> node 0), and the benchmark result was great.

I don't know how OpenStack utilizes NUMA, but I had the same experience benchmarking QEMU VMs. The best VMs were the ones with all vCPUs pinned to cores on a single host NUMA node. Next came VMs with half of their vCPUs pinned to a cpuset on one NUMA node and the other half to a cpuset on another node; of course, I set the proper NUMA topology through QEMU arguments. The worst performers were VMs with no pinning or affinity settings, which means QEMU defined two NUMA nodes (matching the host) while the vCPUs kept getting moved between cores on different physical NUMA nodes. I find that expected. My tests did not target anything Erlang-based; they used Redis and some other things.

> The interesting thing is that when I run the same Erlang application on
> bare metal, performance is really good, so I am trying to understand
> why the same application doesn't perform well on a VM.

I think it is because Erlang is smart about NUMA, but in a virtualized environment its knowledge of the NUMA nodes is not reliable without CPU pinning.

> Is there any setting in Erlang to make it fit better with NUMA when
> running on a virtual machine?

I don't believe it's something Erlang could offer. I would just test pinning the VM's vCPUs to different CPU sets based on their NUMA node and setting the proper topology on the VM. Then I would expect Erlang to utilize the NUMA nodes as smartly as it does on bare metal.
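
To check which NUMA layout an OS actually sees (on the host, or inside the guest where Erlang runs), the Linux sysfs files under /sys/devices/system/node can be read directly. The following is a minimal Python sketch, not something posted in this thread: the function names are hypothetical, and it assumes a Linux system that exposes node*/cpulist files.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-7,16-23" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids as seen by the OS this runs on.

    Assumes the standard Linux sysfs layout; returns {} if it is absent.
    """
    nodes = {}
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        cpulist = (node_dir / "cpulist").read_text()
        nodes[int(node_dir.name[len("node"):])] = parse_cpulist(cpulist)
    return nodes


def nodes_spanned(cpus, nodes):
    """Return the NUMA node ids touched by the given CPU ids."""
    wanted = set(cpus)
    return sorted(n for n, members in nodes.items() if wanted & set(members))
```

Calling read_numa_nodes() both on the compute host and inside the VM, and comparing the results, is one way to see whether the guest topology matches the pinning. nodes_spanned(range(30), nodes) then shows whether a 30-CPU set touches more than one node.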

Re: Erlang on OpenStack VM with NUMA awareness

Satish Patel
Thanks,

I did what you suggested. I get the best result with CPU pinning on a single NUMA node (NUMA0), but in that case I’m wasting CPU resources. When I pin CPUs across both NUMA nodes, performance is 50% lower. I have used every available option to get the CPU topology right (thread siblings on the same core, etc.), but Erlang still doesn’t like a dual-NUMA VM.

It feels like Erlang understands where it is running and is trying to adjust or restrict itself somehow. The same Erlang works fine on bare metal but not on a VM with an equal amount of CPU and memory.


Re: Erlang on OpenStack VM with NUMA awareness

Ameretat Reith

> I did what you suggested. I get the best result with CPU pinning on a single NUMA node (NUMA0), but in that case I’m wasting CPU resources. When I pin CPUs across both NUMA nodes, performance is 50% lower. I have used every available option to get the CPU topology right (thread siblings on the same core, etc.), but Erlang still doesn’t like a dual-NUMA VM.

I know libvirt can do CPU pinning, so OpenStack can leverage that, but I have never tested it. Until someone comes up with a better idea, I suggest firing up QEMU instances with two NUMA nodes and two cpusets, then finding the QEMU vCPU threads (pstree is your friend) and pinning them to actual cores according to their NUMA nodes. If you still get poor performance, I suggest testing something non-Erlang-based, like Redis.
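
The pinning step described above can be sketched in Python as follows. This is an illustration under assumptions rather than a tested procedure: the host layout (node 0 = cores 0-15, node 1 = cores 16-31) and the function names are hypothetical, the vCPU thread ids have to be looked up separately (for example with pstree), and os.sched_setaffinity is available on Linux only.

```python
import os


def plan_pinning(guest_nodes, host_nodes):
    """Build a vCPU -> host CPU map that keeps each guest NUMA node
    on the host NUMA node with the same id.

    guest_nodes: {node id: [vCPU ids]}
    host_nodes:  {node id: [host CPU ids]}
    """
    plan = {}
    for node, vcpus in guest_nodes.items():
        host_cpus = host_nodes[node]
        if len(vcpus) > len(host_cpus):
            raise ValueError(f"node {node}: more vCPUs than host CPUs")
        # One vCPU per host CPU, so no two vCPUs share a core.
        for vcpu, cpu in zip(vcpus, host_cpus):
            plan[vcpu] = cpu
    return plan


def apply_pinning(plan, vcpu_tids):
    """Pin each vCPU thread to its planned host CPU.

    vcpu_tids: {vCPU id: Linux thread id of the matching QEMU vCPU thread}
    (hypothetical input; usually needs root on the host).
    """
    for vcpu, cpu in plan.items():
        os.sched_setaffinity(vcpu_tids[vcpu], {cpu})


# Hypothetical layout: 30 vCPUs split over two guest nodes on a 32-core host.
example_plan = plan_pinning(
    {0: list(range(0, 15)), 1: list(range(15, 30))},
    {0: list(range(0, 16)), 1: list(range(16, 32))},
)
```

Real hosts often interleave core numbers between nodes, so the host map should come from the actual topology rather than from this assumed numbering.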

Re: Erlang on OpenStack VM with NUMA awareness

Satish Patel
This is what we did as an experiment: we told the Erlang scheduler to bind to the NUMA0 cores with the +S 16 option, and we got good results.

After that, we told Erlang to bind to the NUMA1 cores, and we again got good results.

But when I tell Erlang to bind to the cores of both NUMA nodes, I get the worst results. It seems Erlang is mishandling NUMA when it tries to run on both NUMA nodes.

We are running version 18.x.
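
One way to repeat this experiment is to restrict the Erlang VM to a chosen set of cores with taskset and set the scheduler count to match with +S. The thread does not say how the cores were selected, so the use of taskset, the core numbering, and the helper names below are assumptions. The Python sketch only builds the command lines and does not run them.

```python
def to_cpulist(cpus):
    """Compress CPU ids into a cpulist string, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def erl_command(cpus, extra_args=()):
    """Build an argv that runs erl on the given cores only,
    with one scheduler thread per allowed core."""
    cpus = sorted(set(cpus))
    return ["taskset", "-c", to_cpulist(cpus),
            "erl", "+S", str(len(cpus)), *extra_args]


# Assumed numbering: node 0 = cores 0-15, node 1 = cores 16-31.
numa0_only = erl_command(range(0, 16))
numa1_only = erl_command(range(16, 32))
both_nodes = erl_command(range(0, 32))
```

Passing ["+sbt", "db"] as extra_args additionally asks the emulator to bind schedulers using its default bind type. Whether that helps inside a VM depends on the topology the guest exposes.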
