Re: gdb questions (was Segfault in do_allocate_logger_message)

Roger Lipscombe-2
On 21 April 2018 at 21:51, Vince Foley <[hidden email]> wrote:

> Side question.. I am trying to use the etp commands to print out the Erlang
> terms
> (https://github.com/erlang/otp/blob/master/erts/etc/unix/etp-commands.in)
>
> I keep getting this error:
> ```
> (gdb) etp *bp
> Cannot access memory at address 0xb974e0
> ```
>
> And this kind of thing:
> ```
> (gdb) etp-processes
> No processes, since system isn't initialized!
> ```
>
> Am I doing something wrong, or is this just not possible with my coredump?

I've found that success with gdb requires:

- An exactly matching beam. I have to keep an appropriate VM around
for debugging core dumps from production.
- An *unstripped* beam loaded into gdb. Deb packages generally have
stripped binaries, which breaks a bunch of debugging scenarios. We
build our Erlang releases with kerl; we have to keep the unstripped
copies around.
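
One quick way to check the second point is the standard `file` utility, which reports "not stripped" or "stripped" for ELF binaries. A minimal Python sketch, assuming `file` is on the PATH; the function names are made up for illustration and are not part of any Erlang tooling:

```python
import subprocess


def is_unstripped(file_output: str) -> bool:
    """Return True if `file` output describes a binary that still has symbols."""
    # `file` prints "not stripped" for binaries that keep their symbol table
    # and plain "stripped" otherwise, so look for the negated form.
    return "not stripped" in file_output


def check_binary(path: str) -> bool:
    """Run `file` on `path` and report whether the binary is unstripped."""
    result = subprocess.run(
        ["file", "--brief", path], capture_output=True, text=True, check=True
    )
    return is_unstripped(result.stdout)
```

Calling `check_binary` on the beam.smp you intend to load into gdb should return True; if it returns False, gdb will probably not be able to resolve the symbols that the etp commands rely on.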
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: gdb questions (was Segfault in do_allocate_logger_message)

Vince Foley
I think you are right here.. I've been experimenting with running gdb with the exact beam file from the dump and getting better results. This is my first time using gdb for anything :)
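
For reference, gdb takes the executable first and the core file second, and `-x` runs a file of gdb commands at startup, which is one way to load the etp commands. A small sketch that only builds the command line; the helper name is hypothetical, and every path passed to it is an assumption about your own setup:

```python
from typing import List, Optional


def gdb_core_command(beam: str, core: str, etp_commands: Optional[str] = None) -> List[str]:
    """Build a gdb command line for inspecting `core` with the matching `beam` binary."""
    cmd = ["gdb"]
    if etp_commands is not None:
        # -x makes gdb execute the commands in the given file at startup.
        cmd += ["-x", etp_commands]
    # gdb expects the executable first, then the core file.
    cmd += [beam, core]
    return cmd
```

The returned list can be passed to `subprocess.run`; the beam binary must be the exact one that produced the core.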

Do you know of any documentation around building "unstripped" erlang releases? This is my next step...

I'm going to post a new analysis soon.



Re: gdb questions (was Segfault in do_allocate_logger_message)

Roger Lipscombe-2
On 24 April 2018 at 14:20, Vince Foley <[hidden email]> wrote:
> Do you know of any documentation around building "unstripped" erlang
> releases? This is my next step...

Don't strip them in the first place :)

We build our Erlang/OTP using kerl, which we then embed into our
release, which is then packaged as a .deb file, using the Debian tools
(debbuild, etc.).

It's the debbuild step that strips the binaries as they're put in the
.deb file. We simply keep the unstripped beam.smp binary from before
this happens. Do the ESL .deb files (assuming that's how you're
installing Erlang) also contain stripped binaries?
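
The "keep the unstripped binary from before packaging" step could be sketched roughly as follows. This is not our actual build script, just an illustration; the function name, the `label` argument (for example a release version) and the directory layout are all assumptions:

```python
import shutil
from pathlib import Path


def archive_unstripped(binary: Path, archive_dir: Path, label: str) -> Path:
    """Copy `binary` into `archive_dir/label/` before packaging strips it."""
    target_dir = archive_dir / label
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / binary.name
    # copy2 preserves timestamps and permissions along with the contents.
    shutil.copy2(binary, target)
    return target
```

Run it on beam.smp before the packaging step, using a label that lets you find the copy again when a core dump from that release turns up.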

Re: gdb questions (was Segfault in do_allocate_logger_message)

Vince Foley
I'm using Docker to build and run my release (my company has tons of docker infrastructure)..

The official docker image looks like it builds erlang from source with `./otp_build autoconf`




Re: gdb questions (was Segfault in do_allocate_logger_message)

Vince Foley
Ok, after a bit of learning how to use gdb, I now have a set of backtraces that point to a different piece of code. Also I have figured out the proximate condition for the issue to exist...

Now that I have the matching beam file to go with the crash dump, I am consistently seeing the crash in this location:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004c7f11 in eq (a=<optimized out>, b=<optimized out>) at beam/utils.c:2346
2346          if (!is_boxed(b) || *boxed_val(b) != *aa)
[Current thread is 1 (Thread 0x7f6721931700 (LWP 515))]
(gdb) backtrace
#0  0x00000000004c7f11 in eq (a=<optimized out>, b=<optimized out>) at beam/utils.c:2346
#1  0x000000000044a15b in process_main (x_reg_array=0x7f672779c5c0, f_reg_array=0x7f672c07fb82) at beam/beam_emu.c:1568
#2  0x00000000004f4b88 in sched_thread_func (vesdp=0x7f6723d47980) at beam/erl_process.c:8906
#3  0x0000000000675945 in thr_wrapper (vtwd=0x7ffdbbd3e4e0) at pthread/ethread.c:118
#4  0x00007f676acfb494 in start_thread (arg=0x7f6721931700) at pthread_create.c:333
#5  0x00007f676a835acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```

So the interesting thing is this: I can completely get rid of the crashes by disabling the feature that I have been working on that uses `seq_trace`

We did a bit of googling and found a very old bug related to `seq_trace` and memory corruption...

OTP-4222 Match spec 'set_seq_token' corrupts heap


I'm wondering if this issue still exists, and we are winding up with corrupted memory which then means that the `eq` check just blows up...


FYI - Sorry I got the beam files mixed up earlier, but also glad that led to a bug fix of its own!!



Re: gdb questions (was Segfault in do_allocate_logger_message)

Lukas Larsson-8


On Tue, Apr 24, 2018 at 10:31 PM, Vince Foley <[hidden email]> wrote:

> I'm wondering if this issue still exists, and we are winding up with
> corrupted memory which then means that the `eq` check just blows up...

That specific issue has been fixed, but that doesn't mean that there aren't more similar ones. seq_trace does not get much usage, so there are bound to be a few bugs in there.

Do you have a smallish testcase that can trigger the issue?
 

