locked up system using :ets.match_object

Vans S
I am having some performance trouble in a system that does a few queries on a small ETS table of around 10,000 records.

With around 500 concurrent processes everything is fine; at 1500 I start to notice some small degradation; at around 3000 concurrent processes the schedulers grind to a halt. System CPU usage in top is around 50%, but Erlang scheduler usage (scheduler:utilization) is 100%, capped out on all 40 threads.

I am guessing the schedulers are all waiting on locks on the ETS table. I thought match_object and ETS were quite optimized these days (I am using OTP 22), so I am wondering whether there are synchronization/locking issues that could be addressed. 3000 processes each hitting that table maybe 10 times per second on average does not seem like much: 30k match_objects per second, with ongoing inserts.

Also, would there be a way to debug/pinpoint whether this is the exact issue? So far I have only done A/B testing, turning off parts of the system: when I turned off the part that does the match_objects on the ETS table, the system ran fine and never deadlocked at 100% scheduler usage. It's also hard to profile, as the system is so locked up that the profiler barely runs.

For now it seems the solution is to rework the architecture and add a second ETS table as a cached view, so the match_objects can be replaced with key lookups. The cache would be filled by a single process that pulls from the main table via match_object.
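A minimal sketch of that cached-view idea, not the poster's actual code. The table names (main_tab, cache_tab), the {Key, Group, Value} tuple layout and the module name are all assumptions made for illustration; in a real system refresh/1 would be called periodically by one dedicated process.

```erlang
-module(cache_sketch).
-export([start/0, refresh/1, lookup/1]).

%% Hypothetical table names; main table rows are assumed to be {Key, Group, Value}.
-define(MAIN, main_tab).
-define(CACHE, cache_tab).

%% Create both tables (owned by the calling process in this sketch).
start() ->
    ets:new(?MAIN, [ordered_set, named_table, public]),
    ets:new(?CACHE, [set, named_table, public, {read_concurrency, true}]),
    ok.

%% Meant to be run by a single process: one match_object per group,
%% with the result stored in the cache under the group as key.
refresh(Groups) ->
    lists:foreach(
      fun(G) ->
              Objs = ets:match_object(?MAIN, {'_', G, '_'}),
              ets:insert(?CACHE, {G, Objs})
      end, Groups).

%% What the many reader processes call instead of match_object: a plain key lookup.
lookup(Group) ->
    case ets:lookup(?CACHE, Group) of
        [{Group, Objs}] -> Objs;
        [] -> []
    end.
```

The trade-off is that readers may see data as stale as the refresh interval, in exchange for replacing every table scan on the read path with a single key lookup.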

Led
You didn't specify parameters of your table.

--
Led.

Vans S
Table parameters are ordered_set, with read_concurrency and write_concurrency enabled.
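For reference, a plain ETS table with these parameters might be created roughly as follows. This is only a sketch: the table name demo_tab and the module name are made up, and the real table in this thread may be created differently.

```erlang
-module(tab_opts_sketch).
-export([new/0]).

%% Create an ordered_set table with both concurrency options switched on.
%% demo_tab is a hypothetical name; the table is not a named_table here,
%% so callers use the returned table identifier.
new() ->
    ets:new(demo_tab, [ordered_set, public,
                       {read_concurrency, true},
                       {write_concurrency, true}]).
```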

On Friday, January 17, 2020, 01:10:17 p.m. EST, Led wrote:

You didn't specify parameters of your table.

Sverker Eriksson
On Fri, 2020-01-17 at 20:09 +0200, Led wrote:
You didn't specify parameters of your table.



And what's the frequency of those inserts that you mention?

ets:match_object is a read-only operation and should only cause lock contention with write operations, such as ets:insert.


/Sverker

Vans S

I really want to measure this so I can have some facts; IMO the performance is degrading way too much for such a small workload. Each of these 3000 processes does 1 write to the table every 15 minutes, so about 3.3 writes per second in total (the processes start at different times). The processes call match_object on the table about 30,000 times per second, but in bursts: 10 operations can happen in a single function, which then backs off for a few seconds or more.
On Friday, January 17, 2020, 02:20:05 p.m. EST, Sverker Eriksson wrote:

And what's the frequency of those inserts that you mention?

ets:match_object is a read-only operation and should only cause lock contention with write operations, such as ets:insert.

Sverker Eriksson
Have you tried without read_concurrency?

What does ets:info(T, stats) show after running for a while?



On Fri, 2020-01-17 at 19:27 +0000, Vans S wrote:

I really want to measure this so I can have some facts; IMO the performance is degrading way too much for such a small workload. Each of these 3000 processes does 1 write to the table every 15 minutes, so about 3.3 writes per second in total (the processes start at different times). The processes call match_object on the table about 30,000 times per second, but in bursts: 10 operations can happen in a single function, which then backs off for a few seconds or more.

Vans S
The table is a mnesia table, so ets:info/2 does not seem to work. I narrowed it down, and it does seem to be match_object just costing too much CPU time and perhaps locking the table. I ended up rewriting the table-scanning algorithm (instead of running match_object around 100 * 2000 times, dump the full table once and use the process dictionary to manipulate / filter / organize) and building a cache.

The runtime seems stable now. It would still be interesting to diagnose those locks; does mnesia have something similar to ets:info/2?
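The "dump the table once, then organize locally" approach described above could look roughly like this. It is a sketch under assumptions, not the poster's code: it groups into a map returned to the caller rather than using the process dictionary, and the module name and the choice of grouping by a tuple position are made up for illustration.

```erlang
-module(scan_once_sketch).
-export([group_by/2]).

%% Read the whole table with a single ets:tab2list/1 call, then group the
%% objects in process-local memory by the element at position Pos.
%% All later filtering works on the returned map and never touches the table.
group_by(Tab, Pos) ->
    lists:foldl(
      fun(Obj, Acc) ->
              Key = element(Pos, Obj),
              maps:update_with(Key, fun(Os) -> [Obj | Os] end, [Obj], Acc)
      end, #{}, ets:tab2list(Tab)).
```

One table traversal replaces many match_object calls, at the cost of copying every object onto the calling process's heap, so it only makes sense for small tables like the roughly 10,000-record one discussed here.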

On Friday, January 17, 2020, 03:06:07 p.m. EST, Sverker Eriksson wrote:

Have you tried without read_concurrency?

What does ets:info(T, stats) show after running for a while?

Dan Gudmundsson

mnesia:table_info(..)

But mnesia is implemented with ets tables so ets:info should work just fine :-)
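A small sketch of both calls side by side, assuming a ram_copies table (for which, as far as I know, the backing ETS table carries the same name as the mnesia table; disc_only_copies tables are not ETS-backed). The table name demo_tab and the module name are hypothetical, and the sketch starts mnesia with a RAM-only schema on the local node.

```erlang
-module(mnesia_info_sketch).
-export([demo/0]).

%% Create a small ram_copies table and query it through both APIs.
demo() ->
    ok = mnesia:start(),   % no schema on disc: RAM-only schema on this node
    {atomic, ok} = mnesia:create_table(demo_tab, [{ram_copies, [node()]},
                                                  {type, ordered_set}]),
    %% Default attributes are [key, val], so records are {demo_tab, Key, Val}.
    ok = mnesia:dirty_write({demo_tab, 1, one}),
    {mnesia:table_info(demo_tab, size),   % mnesia's view of the table
     ets:info(demo_tab, size),            % same table, queried via ETS
     ets:info(demo_tab, type)}.
```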

On Sat, Jan 18, 2020 at 8:01 AM Vans S wrote:
The table is a mnesia table, so ets:info/2 does not seem to work. I narrowed it down, and it does seem to be match_object just costing too much CPU time and perhaps locking the table. I ended up rewriting the table-scanning algorithm (instead of running match_object around 100 * 2000 times, dump the full table once and use the process dictionary to manipulate / filter / organize) and building a cache.

The runtime seems stable now. It would still be interesting to diagnose those locks; does mnesia have something similar to ets:info/2?

Vans S
I ended up using microstate accounting (msacc) plus lock counting. ETS showed up as quite low in both. Surprisingly, I ended up having very high contention for the memory allocator; msacc showed aux and allocator using around 15-30% each (not sure whether these two values combine or are separate). Again, to refresh: my problem was 100% scheduler CPU usage (a backed-up run queue) with only 50% system CPU usage.
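For anyone wanting to repeat the microstate accounting part, a minimal sketch using the msacc module from runtime_tools follows. The module name and the sampling wrapper are made up; lock counting is left out because it needs an emulator built with lock-counting support.

```erlang
-module(msacc_sketch).
-export([sample/1]).

%% Collect microstate accounting data for Ms milliseconds, then print a
%% per-thread summary of where the time went.
sample(Ms) ->
    msacc:start(Ms),   % resets counters, collects for Ms milliseconds, then stops
    msacc:print().     % prints the collected statistics, returns ok
```

Calling msacc_sketch:sample(1000) on the loaded node gives a one-second snapshot; states such as aux and (on emulators built with extra microstates) alloc show up as columns in the printed table.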

I ended up bumping the minimum heap size by a large order of magnitude, to the average word count the worker processes were using. That dropped the lock contention on alloc and the scheduler time spent in alloc to near 0; aux also dropped by a large amount.
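One way to raise the minimum heap size for just the worker processes is the min_heap_size option of spawn_opt. The following is a sketch only: the module name and helper are hypothetical, and the thread does not say which mechanism was actually used.

```erlang
-module(heap_sketch).
-export([spawn_worker/2]).

%% Spawn a worker whose heap starts at (at least) MinHeapWords words, so it
%% does not have to grow its heap through repeated garbage collections and
%% allocator calls while it warms up.
spawn_worker(Fun, MinHeapWords) ->
    spawn_opt(Fun, [{min_heap_size, MinHeapWords}]).
```

The size is given in words, not bytes, and the runtime may round it up to the next heap size it supports. A system-wide default can be set with the +hms emulator flag instead, at the cost of every process paying for the larger heap.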

When doing this I noticed that some processes spiked insanely in allocated memory, and I am wondering if there's a way to profile this. The ideal scenario would be line number + number of objects allocated / type of object. A still-good scenario: process pid + objects (so I can inspect what they are). A pretty hard-to-debug scenario: just the memory consumption and type.
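As a starting point for the coarsest of those scenarios (memory consumption per pid), the processes using the most memory can be listed with process_info. This is only a sketch with a made-up module name; it gives totals per process and says nothing about line numbers or object types.

```erlang
-module(mem_top_sketch).
-export([top/1]).

%% Return the N processes using the most memory as {Bytes, Pid} tuples,
%% largest first. Processes that exit while we iterate are skipped, because
%% process_info/2 then returns undefined and the pattern does not match.
top(N) ->
    Mem = [{M, P} || P <- erlang:processes(),
                     {memory, M} <- [erlang:process_info(P, memory)]],
    lists:sublist(lists:reverse(lists:sort(Mem)), N).
```

From a pid found this way, erlang:process_info(Pid, current_stacktrace) or erlang:process_info(Pid, dictionary) can then be used to look at what that process is doing and holding.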

On Saturday, January 18, 2020, 03:15:41 a.m. EST, Dan Gudmundsson wrote:

mnesia:table_info(..)

But mnesia is implemented with ets tables so ets:info should work just fine :-)