Freeze in re:run

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Freeze in re:run

Miroslav Urbanek
Hi,

I'm using Erlang 17.5 and I'm experiencing freezes in re:run that take
tens of seconds. I've isolated the problematic part into the following
code:

----
#!/usr/bin/env escript

log(Term) ->
    {_ , {Hour, Minute, Second}} = calendar:now_to_universal_time(os:timestamp()),
    io:format("~2..0w:~2..0w:~2..0w ~p~n", [Hour, Minute, Second, Term]).

test(Filename) ->
    {ok, IoDevice} = file:open(Filename, [binary, read]),
    {ok, Data} = file:read(IoDevice, 1048576),
    Regexp = "(?i)(?m)(?s)(?U)(?<=\n)--(?|.*^content-disposition:(?:(?!\n\\S).)*filename=\"[^\"]*\\.([^\".]*)\".*|.*^content-type:(?:(?!\n\\S).)*name=\"[^\"]*\\.([^\".]*)\").*\n\n",
    re:run(Data, Regexp, [report_errors]).

main([Filename]) ->
    spawn_link(fun Fun() ->
                       log(heartbeat),
                       timer:sleep(1000),
                       Fun()
               end),
    log(test(Filename)).
----

Processing the complex regexp above takes about 50 seconds for my test
data. Sometimes, the whole Erlang VM freezes for several seconds. The
output then looks like this:

----
$ ./test.erl test.eml
18:27:48 heartbeat
18:27:49 heartbeat
18:27:50 heartbeat
18:27:51 heartbeat
18:27:52 heartbeat
18:27:53 heartbeat
18:27:54 heartbeat
18:27:55 heartbeat
18:27:56 heartbeat
18:27:57 heartbeat
18:27:58 heartbeat
18:27:59 heartbeat
18:28:00 heartbeat
18:28:01 heartbeat
18:28:02 heartbeat
18:28:03 heartbeat
18:28:04 heartbeat
18:28:05 heartbeat
18:28:06 heartbeat
18:28:07 heartbeat
18:28:08 heartbeat
18:28:36 heartbeat
18:28:36 nomatch
----

The VM froze between 18:28:08 and 18:28:36. The manual page re(3)
states that "re:run always give control back to the scheduler of
Erlang processes at intervals that ensures the real time properties of
the Erlang system". I believe this is a bug, because re:run clearly
didn't give control back to the scheduler during the interval above.

Why does it happen? Is there any option to limit the number of time VM
spends in re:run?

The program and the data file can be downloaded here:

http://miroslavurbanek.com/re.tgz

Thanks,
Miro
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Freeze in re:run

PAILLEAU Eric
Hi,

Your regexp is compiled at each call.
Your should compile it in your shell and set it in your code by a define.
It is probably not your problem, but it should increase perfs...
Regards

Le 22 mai 2015 00:42, Miroslav Urbanek <[hidden email]> a écrit :

>
> Hi,
>
> I'm using Erlang 17.5 and I'm experiencing freezes in re:run that take
> tens of seconds. I've isolated the problematic part into the following
> code:
>
> ----
> #!/usr/bin/env escript
>
> log(Term) ->
>     {_ , {Hour, Minute, Second}} = calendar:now_to_universal_time(os:timestamp()),
>     io:format("~2..0w:~2..0w:~2..0w ~p~n", [Hour, Minute, Second, Term]).
>
> test(Filename) ->
>     {ok, IoDevice} = file:open(Filename, [binary, read]),
>     {ok, Data} = file:read(IoDevice, 1048576),
>     Regexp = "(?i)(?m)(?s)(?U)(?<=\n)--(?|.*^content-disposition:(?:(?!\n\\S).)*filename=\"[^\"]*\\.([^\".]*)\".*|.*^content-type:(?:(?!\n\\S).)*name=\"[^\"]*\\.([^\".]*)\").*\n\n",
>     re:run(Data, Regexp, [report_errors]).
>
> main([Filename]) ->
>     spawn_link(fun Fun() ->
>                        log(heartbeat),
>                        timer:sleep(1000),
>                        Fun()
>                end),
>     log(test(Filename)).
> ----
>
> Processing the complex regexp above takes about 50 seconds for my test
> data. Sometimes, the whole Erlang VM freezes for several seconds. The
> output then looks like this:
>
> ----
> $ ./test.erl test.eml
> 18:27:48 heartbeat
> 18:27:49 heartbeat
> 18:27:50 heartbeat
> 18:27:51 heartbeat
> 18:27:52 heartbeat
> 18:27:53 heartbeat
> 18:27:54 heartbeat
> 18:27:55 heartbeat
> 18:27:56 heartbeat
> 18:27:57 heartbeat
> 18:27:58 heartbeat
> 18:27:59 heartbeat
> 18:28:00 heartbeat
> 18:28:01 heartbeat
> 18:28:02 heartbeat
> 18:28:03 heartbeat
> 18:28:04 heartbeat
> 18:28:05 heartbeat
> 18:28:06 heartbeat
> 18:28:07 heartbeat
> 18:28:08 heartbeat
> 18:28:36 heartbeat
> 18:28:36 nomatch
> ----
>
> The VM froze between 18:28:08 and 18:28:36. The manual page re(3)
> states that "re:run always give control back to the scheduler of
> Erlang processes at intervals that ensures the real time properties of
> the Erlang system". I believe this is a bug, because re:run clearly
> didn't give control back to the scheduler during the interval above.
>
> Why does it happen? Is there any option to limit the number of time VM
> spends in re:run?
>
> The program and the data file can be downloaded here:
>
> http://miroslavurbanek.com/re.tgz 
>
> Thanks,
> Miro
> _______________________________________________
> erlang-bugs mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-bugs 
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Freeze in re:run

PAILLEAU Eric
Hello,
Using your code, I got same behavior, ie. 22 secondes then freeze.

Using a pre-compiled regexp , I got freeze only 2/3 secondes after start .

-define(MP,{re_pattern,1,0,0,<<69,82,67,80,73,1,0,0,7,2,0,0,81,8,0,0,255,255,255,255,255,255,255,255,45,0,10,0,1,0,1,0,0,0,56,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,1,13,121,0,8,118,0,1,30,10,114,0,8,30,45,30,45,125,0,133,86,13,26,30,99,30,111,30,110,30,116,30,101,30,110,30,116,30,45,30,100,30,105,30,115,30,112,30,111,30,115,30,105,30,116,30,105,30,111,30,110,30,58,141,125,0,13,120,0,6,30,10,8,114,0,6,13,116,0,13,30,102,30,105,30,108,30,101,30,110,30,97,30,109,30,101,30,61,30,34,73,34,30,46,127,0,39,0,1,107,255,255,255,255,251,191,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,99,114,0,39,30,34,86,13,113,0,109,86,13,26,30,99,30,111,30,110,30,116,30,101,30,110,30,116,30,45,30,116,30,121,30,112,30,101,30,58,141,125,0,13,120,0,6,30,10,8,114,0,6,13,116,0,13,30,110,30,97,30,109,30,101,30,61,30,34,73,34,30,46,127,0,39,0,1,107,255,255,255,255,251,191,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255
,255,255,255,255,255,255,99,114,0,39,30,34,114,0,242,86,13,30,10,30,10,114,1,13,0>>}).

and
re:run(Data, ?MP, [report_errors]).

$ ./test2.erl test.eml
21:53:26 heartbeat
21:53:27 heartbeat
21:53:28 heartbeat
21:54:17 heartbeat
21:54:17 nomatch

Using strace :

_____________________________________________________
[...]
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0
futex(0xb6840088, FUTEX_WAIT_PRIVATE, -1, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
write(7, "\0", 1)                       = 1
select(0, NULL, NULL, NULL, NULL21:59:14 heartbeat
21:59:15 heartbeat
21:59:16 heartbeat
22:00:05 nomatch
22:00:05 heartbeat
  <unfinished ...>
+++ exited with 0 +++
_____________________________________________________

sched_yield() is not called during the 50 secondes processing.

Indeed strange.
Regards



Le 22/05/2015 17:43, Éric Pailleau a écrit :

> Hi,
>
> Your regexp is compiled at each call.
> Your should compile it in your shell and set it in your code by a define.
> It is probably not your problem, but it should increase perfs...
> Regards
>
> Le 22 mai 2015 00:42, Miroslav Urbanek <[hidden email]> a écrit :
>>
>> Hi,
>>
>> I'm using Erlang 17.5 and I'm experiencing freezes in re:run that take
>> tens of seconds. I've isolated the problematic part into the following
>> code:
>>
>> ----
>> #!/usr/bin/env escript
>>
>> log(Term) ->
>>      {_ , {Hour, Minute, Second}} = calendar:now_to_universal_time(os:timestamp()),
>>      io:format("~2..0w:~2..0w:~2..0w ~p~n", [Hour, Minute, Second, Term]).
>>
>> test(Filename) ->
>>      {ok, IoDevice} = file:open(Filename, [binary, read]),
>>      {ok, Data} = file:read(IoDevice, 1048576),
>>      Regexp = "(?i)(?m)(?s)(?U)(?<=\n)--(?|.*^content-disposition:(?:(?!\n\\S).)*filename=\"[^\"]*\\.([^\".]*)\".*|.*^content-type:(?:(?!\n\\S).)*name=\"[^\"]*\\.([^\".]*)\").*\n\n",
>>      re:run(Data, Regexp, [report_errors]).
>>
>> main([Filename]) ->
>>      spawn_link(fun Fun() ->
>>                         log(heartbeat),
>>                         timer:sleep(1000),
>>                         Fun()
>>                 end),
>>      log(test(Filename)).
>> ----
>>
>> Processing the complex regexp above takes about 50 seconds for my test
>> data. Sometimes, the whole Erlang VM freezes for several seconds. The
>> output then looks like this:
>>
>> ----
>> $ ./test.erl test.eml
>> 18:27:48 heartbeat
>> 18:27:49 heartbeat
>> 18:27:50 heartbeat
>> 18:27:51 heartbeat
>> 18:27:52 heartbeat
>> 18:27:53 heartbeat
>> 18:27:54 heartbeat
>> 18:27:55 heartbeat
>> 18:27:56 heartbeat
>> 18:27:57 heartbeat
>> 18:27:58 heartbeat
>> 18:27:59 heartbeat
>> 18:28:00 heartbeat
>> 18:28:01 heartbeat
>> 18:28:02 heartbeat
>> 18:28:03 heartbeat
>> 18:28:04 heartbeat
>> 18:28:05 heartbeat
>> 18:28:06 heartbeat
>> 18:28:07 heartbeat
>> 18:28:08 heartbeat
>> 18:28:36 heartbeat
>> 18:28:36 nomatch
>> ----
>>
>> The VM froze between 18:28:08 and 18:28:36. The manual page re(3)
>> states that "re:run always give control back to the scheduler of
>> Erlang processes at intervals that ensures the real time properties of
>> the Erlang system". I believe this is a bug, because re:run clearly
>> didn't give control back to the scheduler during the interval above.
>>
>> Why does it happen? Is there any option to limit the number of time VM
>> spends in re:run?
>>
>> The program and the data file can be downloaded here:
>>
>> http://miroslavurbanek.com/re.tgz
>>
>> Thanks,
>> Miro
>> _______________________________________________
>> erlang-bugs mailing list
>> [hidden email]
>> http://erlang.org/mailman/listinfo/erlang-bugs
> _______________________________________________
> erlang-bugs mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-bugs
>

_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Freeze in re:run

PAILLEAU Eric
Hi again,

interestingly, when starting observer before spawn_link,
output is not anymore frozen.
I get heartbeat messages every second until nomatch.

I did this several time, and sometime observer is totally unresponsive
(frozen), but hearbeat messages are still coming in the meantime.

Regards.

_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Freeze in re:run

Miroslav Urbanek
Hi Eric,

thanks for your investigation. One thing I tried before is to
uncomment the line

#define ERLANG_DEBUG 1

in erts/emulator/pcre/pcre_exec.c. I don't have the output anymore,
but it clearly showed "heartbeat" message near "Loop limit break
detected". I think PCRE library doesn't properly return control to the
VM, but my understanding of Erlang and PCRE internals is rather
limited to see why.

Miro

On Sat, May 23, 2015 at 12:24:51AM +0200, PAILLEAU Eric wrote:

> Hi again,
>
> interestingly, when starting observer before spawn_link, output is
> not anymore frozen. I get heartbeat messages every second until
> nomatch.
>
> I did this several time, and sometime observer is totally
> unresponsive (frozen), but hearbeat messages are still coming in the
> meantime.
>
> Regards.
>
> _______________________________________________
> erlang-bugs mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-bugs
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs