Segfault in do_allocate_logger_message

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Segfault in do_allocate_logger_message

Vince Foley
Hi folks, I recently encountered a pretty strange segfault, and I was wondering if anyone could provide any insight...

The process is running just fine and then disappears. I grabbed the coredump and opened it up and found this output:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>, sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>, hp=0x7f97ba57f7f0, gleader=128)
    at beam/utils.c:1958
```

There's a little more context in the backtrace...

```
(gdb) backtrace
#0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>, sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>, hp=0x7f97ba57f7f0, gleader=128)
    at beam/utils.c:1958
#1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
#2  send_error_term_to_logger (args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
#3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70, args=140289643116560) at beam/utils.c:2187
#4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72, arg=140289845403658) at beam/erl_bits.c:850
#5  0x00007f97bfb79898 in ?? ()
#6  0x000000000000001e in ?? ()
#7  0x00007f97b5c675a8 in ?? ()
#8  0x00007f97bb25b5d8 in ?? ()
#9  0x00007f97b5c25ad8 in ?? ()
#10 0x00007f97c1f7b282 in ?? ()
#11 0x8450345d9c222796 in ?? ()
#12 0x00007f97bfc47008 in ?? ()
#13 0x00007f97bc8c3338 in ?? ()
#14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30, f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
#15 0x00007f97bba80100 in ?? ()
#16 0x0000000000000000 in ?? ()
```

I do have an error_handler module added to the error_logger. Although there doesn't appear to be any noticable memory growth or message queue backup in the error_logger process before it dies.

I can't quite trigger it myself but it does happen on a regular basis.

Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.6.4 (compiled with OTP 20)
Alpine 3.7

Anyone have any clues?


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Segfault in do_allocate_logger_message

Mikael Pettersson-5
On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <[hidden email]> wrote:

> Hi folks, I recently encountered a pretty strange segfault, and I was
> wondering if anyone could provide any insight...
>
> The process is running just fine and then disappears. I grabbed the coredump
> and opened it up and found this output:
>
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> ```
>
> There's a little more context in the backtrace...
>
> ```
> (gdb) backtrace
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> args=140289643116560) at beam/utils.c:2187

First, you're clearly on a 64-bit machine, I'm going to assume Linux/x86-64.

What I think happened here is that you generated an error term with an output
representation larger than 2GB.  In erts_send_error_term_to_logger() there is an
unchecked narrowing conversion from size_t to int as dsbufp->str_len is passed
to send_error_term_to_logger().  In the stack dump above, you see a large but
negative 32-bit value in the "len" parameters -- that's a common sign
of this kind
of problem.  If you can, please re-run gdb, "up" to the
erts_send_error_term_to_logger()
frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.

Then in do_send_term_to_logger() the negative len is use to compute the size
of the message buffer, which results in sz=280577276613695 which is bogus
(it's about 261 GB).

The actual SIGSEGV probably comes from the new_message_buffer() call on
line 1956 returning NULL for this bogus size.

Fixing this will require code changes in util.c and maybe elsewhere too.

What you can do meanwhile is to try to limit the sizes of terms sent to the
error logger so their output representation is less than 2GB.  (As I type this I
spot another bug in do_send_term_to_logger(), so you'll want to halve that
limit to be less than 1GB.)

/Mikael

> #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
> arg=140289845403658) at beam/erl_bits.c:850
> #5  0x00007f97bfb79898 in ?? ()
> #6  0x000000000000001e in ?? ()
> #7  0x00007f97b5c675a8 in ?? ()
> #8  0x00007f97bb25b5d8 in ?? ()
> #9  0x00007f97b5c25ad8 in ?? ()
> #10 0x00007f97c1f7b282 in ?? ()
> #11 0x8450345d9c222796 in ?? ()
> #12 0x00007f97bfc47008 in ?? ()
> #13 0x00007f97bc8c3338 in ?? ()
> #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
> f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
> #15 0x00007f97bba80100 in ?? ()
> #16 0x0000000000000000 in ?? ()
> ```
>
> I do have an error_handler module added to the error_logger. Although there
> doesn't appear to be any noticable memory growth or message queue backup in
> the error_logger process before it dies.
>
> I can't quite trigger it myself but it does happen on a regular basis.
>
> Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
> [async-threads:10] [hipe] [kernel-poll:false]
> Elixir 1.6.4 (compiled with OTP 20)
> Alpine 3.7
>
> Anyone have any clues?
>
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Segfault in do_allocate_logger_message

Mikael Pettersson-5
On Sat, Apr 21, 2018 at 10:10 AM, Mikael Pettersson
<[hidden email]> wrote:

> On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <[hidden email]> wrote:
>> Hi folks, I recently encountered a pretty strange segfault, and I was
>> wondering if anyone could provide any insight...
>>
>> The process is running just fine and then disappears. I grabbed the coredump
>> and opened it up and found this output:
>>
>> ```
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
>> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> hp=0x7f97ba57f7f0, gleader=128)
>>     at beam/utils.c:1958
>> ```
>>
>> There's a little more context in the backtrace...
>>
>> ```
>> (gdb) backtrace
>> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
>> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> hp=0x7f97ba57f7f0, gleader=128)
>>     at beam/utils.c:1958
>> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
>> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
>> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
>> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
>> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
>> args=140289643116560) at beam/utils.c:2187
>
> First, you're clearly on a 64-bit machine, I'm going to assume Linux/x86-64.
>
> What I think happened here is that you generated an error term with an output
> representation larger than 2GB.  In erts_send_error_term_to_logger() there is an
> unchecked narrowing conversion from size_t to int as dsbufp->str_len is passed
> to send_error_term_to_logger().  In the stack dump above, you see a large but
> negative 32-bit value in the "len" parameters -- that's a common sign
> of this kind
> of problem.  If you can, please re-run gdb, "up" to the
> erts_send_error_term_to_logger()
> frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.
>
> Then in do_send_term_to_logger() the negative len is use to compute the size
> of the message buffer, which results in sz=280577276613695 which is bogus
> (it's about 261 GB).
>
> The actual SIGSEGV probably comes from the new_message_buffer() call on
> line 1956 returning NULL for this bogus size.
>
> Fixing this will require code changes in util.c and maybe elsewhere too.

Proposed fix: https://github.com/erlang/otp/pull/1795

>
> What you can do meanwhile is to try to limit the sizes of terms sent to the
> error logger so their output representation is less than 2GB.  (As I type this I
> spot another bug in do_send_term_to_logger(), so you'll want to halve that
> limit to be less than 1GB.)
>
> /Mikael
>
>> #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
>> arg=140289845403658) at beam/erl_bits.c:850
>> #5  0x00007f97bfb79898 in ?? ()
>> #6  0x000000000000001e in ?? ()
>> #7  0x00007f97b5c675a8 in ?? ()
>> #8  0x00007f97bb25b5d8 in ?? ()
>> #9  0x00007f97b5c25ad8 in ?? ()
>> #10 0x00007f97c1f7b282 in ?? ()
>> #11 0x8450345d9c222796 in ?? ()
>> #12 0x00007f97bfc47008 in ?? ()
>> #13 0x00007f97bc8c3338 in ?? ()
>> #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
>> f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
>> #15 0x00007f97bba80100 in ?? ()
>> #16 0x0000000000000000 in ?? ()
>> ```
>>
>> I do have an error_handler module added to the error_logger. Although there
>> doesn't appear to be any noticable memory growth or message queue backup in
>> the error_logger process before it dies.
>>
>> I can't quite trigger it myself but it does happen on a regular basis.
>>
>> Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
>> [async-threads:10] [hipe] [kernel-poll:false]
>> Elixir 1.6.4 (compiled with OTP 20)
>> Alpine 3.7
>>
>> Anyone have any clues?
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> [hidden email]
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Segfault in do_allocate_logger_message

Vince Foley
In reply to this post by Mikael Pettersson-5
Thanks for taking a look and such a quick PR! Here's the output from the gdb commands:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>, sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>, hp=0x7f97ba57f7f0, gleader=128) at beam/utils.c:1958
1958    *hp = (*bp)->mem;
[Current thread is 1 (LWP 511)]
(gdb) up
#1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
(gdb) up
#2  send_error_term_to_logger (args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
2106    return do_send_term_to_logger(am_error, gleader, buf, len, args);
(gdb) up
#3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70, args=140289643116560) at beam/utils.c:2187
2187    res = send_error_term_to_logger(gleader, dsbufp->str, dsbufp->str_len, args);
(gdb) list
2182
2183 int
2184 erts_send_error_term_to_logger(Eterm gleader, erts_dsprintf_buf_t *dsbufp, Eterm args)
2185 {
2186    int res;
2187    res = send_error_term_to_logger(gleader, dsbufp->str, dsbufp->str_len, args);
2188    destroy_logger_dsbuf(dsbufp);
2189    return res;
2190 }
2191
(gdb) print *dsbufp
$1 = {str = 0x80 <error: Cannot access memory at address 0x80>, str_len = 1506955, size = 401995, grow = 0x30000000160}
```

I'm not sure if that confirms your hypothesis or not...

I did find some huge size numbers in `args` though:


```
#1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886, buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
(gdb) list
2053    args_sz = size_object(args);
2054    sz = len * 2 /* format */ + args_sz
2055 + 3 /*outer 2-tuple*/ + 4 /* middle 3-tuple */ + 4 /*inner 3-tuple */;
2056
2057    /* gleader size is accounted and allocated next */
2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
2059
2060    if(is_nil(gl)) {
2061       /* buf *always* points to a null terminated string */
2062       erts_fprintf(stderr, "(no error logger present) %T: \"%s\" %T\n",
(gdb) print args_sz
$1 = 140289566202896
(gdb) print sz
$2 = 140287633497135
```


Side question.. I am trying to use the etp commands to print out the Erlang terms (https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in)

I keep getting this error:
```
(gdb) etp *bp
Cannot access memory at address 0xb974e0
```

And this kind of thing:
```
(gdb) etp-processes
No processes, since system isn't initialized!
```

Am I doing something wrong, or is this just not possible with my coredump?


On Sat, Apr 21, 2018 at 1:10 AM, Mikael Pettersson <[hidden email]> wrote:
On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <[hidden email]> wrote:
> Hi folks, I recently encountered a pretty strange segfault, and I was
> wondering if anyone could provide any insight...
>
> The process is running just fine and then disappears. I grabbed the coredump
> and opened it up and found this output:
>
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> ```
>
> There's a little more context in the backtrace...
>
> ```
> (gdb) backtrace
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128)
>     at beam/utils.c:1958
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> args=140289643116560) at beam/utils.c:2187

First, you're clearly on a 64-bit machine, I'm going to assume Linux/x86-64.

What I think happened here is that you generated an error term with an output
representation larger than 2GB.  In erts_send_error_term_to_logger() there is an
unchecked narrowing conversion from size_t to int as dsbufp->str_len is passed
to send_error_term_to_logger().  In the stack dump above, you see a large but
negative 32-bit value in the "len" parameters -- that's a common sign
of this kind
of problem.  If you can, please re-run gdb, "up" to the
erts_send_error_term_to_logger()
frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.

Then in do_send_term_to_logger() the negative len is use to compute the size
of the message buffer, which results in sz=280577276613695 which is bogus
(it's about 261 GB).

The actual SIGSEGV probably comes from the new_message_buffer() call on
line 1956 returning NULL for this bogus size.

Fixing this will require code changes in util.c and maybe elsewhere too.

What you can do meanwhile is to try to limit the sizes of terms sent to the
error logger so their output representation is less than 2GB.  (As I type this I
spot another bug in do_send_term_to_logger(), so you'll want to halve that
limit to be less than 1GB.)

/Mikael

> #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
> arg=140289845403658) at beam/erl_bits.c:850
> #5  0x00007f97bfb79898 in ?? ()
> #6  0x000000000000001e in ?? ()
> #7  0x00007f97b5c675a8 in ?? ()
> #8  0x00007f97bb25b5d8 in ?? ()
> #9  0x00007f97b5c25ad8 in ?? ()
> #10 0x00007f97c1f7b282 in ?? ()
> #11 0x8450345d9c222796 in ?? ()
> #12 0x00007f97bfc47008 in ?? ()
> #13 0x00007f97bc8c3338 in ?? ()
> #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
> f_reg_array=0x7f97c666a00a) at x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
> #15 0x00007f97bba80100 in ?? ()
> #16 0x0000000000000000 in ?? ()
> ```
>
> I do have an error_handler module added to the error_logger. Although there
> doesn't appear to be any noticable memory growth or message queue backup in
> the error_logger process before it dies.
>
> I can't quite trigger it myself but it does happen on a regular basis.
>
> Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
> [async-threads:10] [hipe] [kernel-poll:false]
> Elixir 1.6.4 (compiled with OTP 20)
> Alpine 3.7
>
> Anyone have any clues?
>
>
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
>


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Segfault in do_allocate_logger_message

Mikael Pettersson-5
On Sat, Apr 21, 2018 at 10:51 PM, Vince Foley <[hidden email]> wrote:

> Thanks for taking a look and such a quick PR! Here's the output from the gdb
> commands:
>
> ```
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic pointer>,
> sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
> hp=0x7f97ba57f7f0, gleader=128) at beam/utils.c:1958
> 1958    *hp = (*bp)->mem;
> [Current thread is 1 (LWP 511)]
> (gdb) up
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> (gdb) up
> #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
> 2106    return do_send_term_to_logger(am_error, gleader, buf, len, args);
> (gdb) up
> #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
> args=140289643116560) at beam/utils.c:2187
> 2187    res = send_error_term_to_logger(gleader, dsbufp->str,
> dsbufp->str_len, args);
> (gdb) list
> 2182
> 2183 int
> 2184 erts_send_error_term_to_logger(Eterm gleader, erts_dsprintf_buf_t
> *dsbufp, Eterm args)
> 2185 {
> 2186    int res;
> 2187    res = send_error_term_to_logger(gleader, dsbufp->str,
> dsbufp->str_len, args);
> 2188    destroy_logger_dsbuf(dsbufp);
> 2189    return res;
> 2190 }
> 2191
> (gdb) print *dsbufp
> $1 = {str = 0x80 <error: Cannot access memory at address 0x80>, str_len =
> 1506955, size = 401995, grow = 0x30000000160}
> ```
>
> I'm not sure if that confirms your hypothesis or not...

The values look weird, I'm pretty sure that *dsbufp got overwritten at
some point.
(My analysis about potential misbehaviour still holds though, just not confirmed
by your core.)

>
> I did find some huge size numbers in `args` though:

"args" is an Eterm encoding a pointer to a list in the heap, so it's
expected to be large.

>
>
> ```
> #1  do_send_term_to_logger (tag=843, args=140289643116560, len=-966352886,
> buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> (gdb) list
> 2053    args_sz = size_object(args);
> 2054    sz = len * 2 /* format */ + args_sz
> 2055 + 3 /*outer 2-tuple*/ + 4 /* middle 3-tuple */ + 4 /*inner 3-tuple */;
> 2056
> 2057    /* gleader size is accounted and allocated next */
> 2058    gl = do_allocate_logger_message(gleader, &hp, &ohp, &bp, &p, sz);
> 2059
> 2060    if(is_nil(gl)) {
> 2061       /* buf *always* points to a null terminated string */
> 2062       erts_fprintf(stderr, "(no error logger present) %T: \"%s\" %T\n",
> (gdb) print args_sz
> $1 = 140289566202896
> (gdb) print sz
> $2 = 140287633497135
> ```
>
>
> Side question.. I am trying to use the etp commands to print out the Erlang
> terms
> (https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in)
>
> I keep getting this error:
> ```
> (gdb) etp *bp
> Cannot access memory at address 0xb974e0
> ```
>
> And this kind of thing:
> ```
> (gdb) etp-processes
> No processes, since system isn't initialized!
> ```
>
> Am I doing something wrong, or is this just not possible with my coredump?
>
>
> On Sat, Apr 21, 2018 at 1:10 AM, Mikael Pettersson <[hidden email]>
> wrote:
>>
>> On Sat, Apr 21, 2018 at 3:25 AM, Vince Foley <[hidden email]> wrote:
>> > Hi folks, I recently encountered a pretty strange segfault, and I was
>> > wondering if anyone could provide any insight...
>> >
>> > The process is running just fine and then disappears. I grabbed the
>> > coredump
>> > and opened it up and found this output:
>> >
>> > ```
>> > Program terminated with signal SIGSEGV, Segmentation fault.
>> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
>> > pointer>,
>> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> > hp=0x7f97ba57f7f0, gleader=128)
>> >     at beam/utils.c:1958
>> > ```
>> >
>> > There's a little more context in the backtrace...
>> >
>> > ```
>> > (gdb) backtrace
>> > #0  0x00000000004c9511 in do_allocate_logger_message (p=<synthetic
>> > pointer>,
>> > sz=280577276613695, bp=<synthetic pointer>, ohp=<synthetic pointer>,
>> > hp=0x7f97ba57f7f0, gleader=128)
>> >     at beam/utils.c:1958
>> > #1  do_send_term_to_logger (tag=843, args=140289643116560,
>> > len=-966352886,
>> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2058
>> > #2  send_error_term_to_logger (args=140289643116560, len=-966352886,
>> > buf=0x7f97c1f7db72 "", gleader=128) at beam/utils.c:2106
>> > #3  erts_send_error_term_to_logger (gleader=128, dsbufp=0x7f97c1f7db70,
>> > args=140289643116560) at beam/utils.c:2187
>>
>> First, you're clearly on a 64-bit machine, I'm going to assume
>> Linux/x86-64.
>>
>> What I think happened here is that you generated an error term with an
>> output
>> representation larger than 2GB.  In erts_send_error_term_to_logger() there
>> is an
>> unchecked narrowing conversion from size_t to int as dsbufp->str_len is
>> passed
>> to send_error_term_to_logger().  In the stack dump above, you see a large
>> but
>> negative 32-bit value in the "len" parameters -- that's a common sign
>> of this kind
>> of problem.  If you can, please re-run gdb, "up" to the
>> erts_send_error_term_to_logger()
>> frame, and "print *dsbufp"; I suspect dsbufp->str_len will be >=2GB.
>>
>> Then in do_send_term_to_logger() the negative len is use to compute the
>> size
>> of the message buffer, which results in sz=280577276613695 which is bogus
>> (it's about 261 GB).
>>
>> The actual SIGSEGV probably comes from the new_message_buffer() call on
>> line 1956 returning NULL for this bogus size.
>>
>> Fixing this will require code changes in util.c and maybe elsewhere too.
>>
>> What you can do meanwhile is to try to limit the sizes of terms sent to
>> the
>> error logger so their output representation is less than 2GB.  (As I type
>> this I
>> spot another bug in do_send_term_to_logger(), so you'll want to halve that
>> limit to be less than 1GB.)
>>
>> /Mikael
>>
>> > #4  0x000000000059d990 in erts_bs_put_utf8 (EBS=0x7f97c1f7db72,
>> > arg=140289845403658) at beam/erl_bits.c:850
>> > #5  0x00007f97bfb79898 in ?? ()
>> > #6  0x000000000000001e in ?? ()
>> > #7  0x00007f97b5c675a8 in ?? ()
>> > #8  0x00007f97bb25b5d8 in ?? ()
>> > #9  0x00007f97b5c25ad8 in ?? ()
>> > #10 0x00007f97c1f7b282 in ?? ()
>> > #11 0x8450345d9c222796 in ?? ()
>> > #12 0x00007f97bfc47008 in ?? ()
>> > #13 0x00007f97bc8c3338 in ?? ()
>> > #14 0x0000000000449c14 in process_main (x_reg_array=0x7f97b5c66d30,
>> > f_reg_array=0x7f97c666a00a) at
>> > x86_64-pc-linux-musl/opt/smp/beam_hot.h:943
>> > #15 0x00007f97bba80100 in ?? ()
>> > #16 0x0000000000000000 in ?? ()
>> > ```
>> >
>> > I do have an error_handler module added to the error_logger. Although
>> > there
>> > doesn't appear to be any noticable memory growth or message queue backup
>> > in
>> > the error_logger process before it dies.
>> >
>> > I can't quite trigger it myself but it does happen on a regular basis.
>> >
>> > Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:16:16] [ds:16:16:10]
>> > [async-threads:10] [hipe] [kernel-poll:false]
>> > Elixir 1.6.4 (compiled with OTP 20)
>> > Alpine 3.7
>> >
>> > Anyone have any clues?
>> >
>> >
>> > _______________________________________________
>> > erlang-questions mailing list
>> > [hidden email]
>> > http://erlang.org/mailman/listinfo/erlang-questions
>> >
>
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions