Issue with using enif_binary_to_term

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue with using enif_binary_to_term

Vincent Siliakus-2
Hi all,

I'm writing a NIF library and can't wrap my head around why the following code makes the erlang runtime hang when called from a shell:

  static ERL_NIF_TERM test(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
    ErlNifBinary bin;

    ERL_NIF_TERM list = enif_make_list(env, 0);
    ERL_NIF_TERM in_term = enif_make_uint(env, 42);
    ERL_NIF_TERM out_term1, out_term2, out_term3;

    enif_term_to_binary(env, in_term, &bin);

    enif_binary_to_term(env, bin.data, bin.size, &out_term1, 0);
    list = enif_make_list_cell(env, out_term1, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term2, 0);
    list = enif_make_list_cell(env, out_term2, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term3, 0);
    list = enif_make_list_cell(env, out_term3, list);

    return list;
  }

The multiple calls to enif_binary_to_term somehow seem to corrupt memory in the calling environment, so I'm probably using it incorrectly. Could some kind soul point me to the error? I'm running this code on OTP20 / erts-9.2.

Thanks in advance,
Vincent

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Issue with using enif_binary_to_term

Sverker Eriksson-4

This is a bug in enif_binary_to_term which causes heap corruption when the term
is an immediate (atom, small integer, pid, port, empty list).

This should fix it:

diff --git a/erts/emulator/beam/erl_nif.c b/erts/emulator/beam/erl_nif.c
index e208792..0fbf0eb 100644
--- a/erts/emulator/beam/erl_nif.c
+++ b/erts/emulator/beam/erl_nif.c
@@ -1255,8 +1255,10 @@ size_t enif_binary_to_term(ErlNifEnv *dst_env,
     if (is_non_value(*term)) {
         return 0;
     }
-    erts_factory_close(&factory);
-    cache_env(dst_env);
+    if (size > 0) {
+        erts_factory_close(&factory);
+        cache_env(dst_env);
+    }
 
     ASSERT(bp > data);
     return bp - data;



Your usage looks correct. The only nitpick is to test the return value from
enif_binary_to_term, either to handle broken binary or assert it's correct.

/Sverker


On mån, 2018-05-14 at 22:57 +0200, Vincent Siliakus wrote:

> Hi all,
>
> I'm writing a NIF library and can't wrap my head around why the following code
> makes the erlang runtime hang when called from a shell:
>
>   static ERL_NIF_TERM test(ErlNifEnv* env, int argc, const ERL_NIF_TERM
> argv[]) {
>     ErlNifBinary bin;
>
>     ERL_NIF_TERM list = enif_make_list(env, 0);
>     ERL_NIF_TERM in_term = enif_make_uint(env, 42);
>     ERL_NIF_TERM out_term1, out_term2, out_term3;
>
>     enif_term_to_binary(env, in_term, &bin);
>
>     enif_binary_to_term(env, bin.data, bin.size, &out_term1, 0);
>     list = enif_make_list_cell(env, out_term1, list);
>
>     enif_binary_to_term(env, bin.data, bin.size, &out_term2, 0);
>     list = enif_make_list_cell(env, out_term2, list);
>
>     enif_binary_to_term(env, bin.data, bin.size, &out_term3, 0);
>     list = enif_make_list_cell(env, out_term3, list);
>
>     return list;
>   }
>
> The multiple calls to enif_binary_to_term somehow seem to corrupt memory in
> the calling environment, so I'm probably using it incorrectly. Could some kind
> soul point me to the error? I'm running this code on OTP20 / erts-9.2.
>
> Thanks in advance,
> Vincent
> _______________________________________________
> erlang-questions mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-questions
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Issue with using enif_binary_to_term

Ross Schlaikjer
In reply to this post by Vincent Siliakus-2
Vincent,

I'm not entirely sure yet why this is happening, but I think I have at least figured out at least some of what is going on.

Heap allocation in erts is done very simply (for the most part), in that it is linearly allocated like so (as of otp git tag OTP-19.2.1): 

 140 static ERTS_INLINE Eterm* alloc_heap(ErlNifEnv* env, size_t need)
 141 {
 142     Eterm* hp = env->hp;
 143     env->hp += need;
 144     if (env->hp <= env->hp_end) {
 145     return hp;
 146     }
 147     return alloc_heap_heavy(env, need, hp);
 148 }

When allocating a new cons cell, we take two 64bit words, store car/cdr and return the address of the base pointer + the list tag (0x1).

1726 ERL_NIF_TERM enif_make_list_cell(ErlNifEnv* env, Eterm car, Eterm cdr)
1727 {
1728     Eterm* hp = alloc_heap(env,2);
1729     Eterm ret = make_list(hp);
1730 
1731     ASSERT_IN_ENV(env, car, 0, "head of list cell");
1732     ASSERT_IN_ENV(env, cdr, 0, "tail of list cell");
1733     CAR(hp) = car;
1734     CDR(hp) = cdr;
1735     return ret;
1736 } 

make_list above epands to the macro
#define _unchecked_make_list(x) ((Uint)(x) + TAG_PRIMARY_LIST)
so all it's doing is adding 1 to that ret pointer.

If we take this all to gdb, we can step through your code and see the list modifications occurring (comments inline):

18     ERL_NIF_TERM list = enif_make_list(env, 0);
(gdb) n
19     ERL_NIF_TERM in_term = enif_make_uint(env, 42);
(gdb) p/x list
$1 = 0xfffffffffffffffb  # This is the 'empty list' value
(gdb) n
22     int success = enif_term_to_binary(env, in_term, &bin);
(gdb) n
24     int bcount = enif_binary_to_term(env, bin.data, 3, &out_term1, 0);
(gdb) p/x env->hp  # Check the heap pointer before we call enif_binary_to_term
$4 = 0x7fff79840fe0
(gdb) n
25     list = enif_make_list_cell(env, out_term1, list);
(gdb) p/x env->hp  # Check again after - notice the hp hasn't changed
$5 = 0x7fff79840fe0
(gdb) n  # Make the cons cell
27     bcount = enif_binary_to_term(env, bin.data, 3, &out_term2, 0);
(gdb) p/x list # As we can see, it has the address of the base pointer + the list tag (1)
$6 = 0x7fff79840fe1
(gdb) x/2g list-1  # If we examine it, we see it has the head (our immediate int value) and the tail (empty list)
0x7fff79840fe0: 0x00000000000002af 0xfffffffffffffffb
(gdb) p/x env->hp  # Our heap pointer has gone up by 16 bytes, as expected (2 64bit words)
$7 = 0x7fff79840ff0
(gdb) n  # Now we execute the second enif_binary_to_term call
28     list = enif_make_list_cell(env, out_term2, list);
(gdb) p/x env->hp
$8 = 0x7fff79840fe0  # This is where our problem begins! binary_to_term has _decremented_ the heap pointer!
(gdb) n  # Create our second cons cell
30     bcount = enif_binary_to_term(env, bin.data, 3, &out_term3, 0);
(gdb) p/x list  # We look at the 'new' list pointer value... but it's the same as before!
$9 = 0x7fff79840fe1
(gdb) x/2g list -1
0x7fff79840fe0: 0x00000000000002af 0x00007fff79840fe1  # This list now references itself!

It looks like when we call binary_to_term, the cache_env call is what's messing with our heap pointer -
(gdb) watch env->hp
Hardware watchpoint 2: env->hp
(gdb) n

Thread 14 "1_scheduler" hit Hardware watchpoint 2: env->hp

Old value = (Eterm *) 0x7fff79840ff0
New value = (Eterm *) 0x7fff79840fe0
0x00005555556ce798 in cache_env (env=0x7fffb34bee40) at beam/erl_nif.c:421
warning: Source file is more recent than executable.
421 env->hp = HEAP_TOP(env->proc);
(gdb) bt
#0  0x00005555556ce798 in cache_env (env=0x7fffb34bee40) at beam/erl_nif.c:421
#1  enif_binary_to_term (dst_env=0x7fffb34bee40, data=0x7fffb5b82b68 "\203a*", data_sz=<optimized out>, term=0x7fffb34bed28, 
    opts=(unknown: 0)) at beam/erl_nif.c:1161


So as far as I can tell, something to do with the switchover between HAlloc and alloc_heap (flush_env / cache_env) seems to be losing some bytes off the heap pointer, but _only_ for subsequent calls. Unfortunately I must go back to my day job for now, but will try and debug more later.

Hope this helps.

- Ross S

On Mon, May 14, 2018 at 4:57 PM, Vincent Siliakus <[hidden email]> wrote:
Hi all,

I'm writing a NIF library and can't wrap my head around why the following code makes the erlang runtime hang when called from a shell:

  static ERL_NIF_TERM test(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
    ErlNifBinary bin;

    ERL_NIF_TERM list = enif_make_list(env, 0);
    ERL_NIF_TERM in_term = enif_make_uint(env, 42);
    ERL_NIF_TERM out_term1, out_term2, out_term3;

    enif_term_to_binary(env, in_term, &bin);

    enif_binary_to_term(env, bin.data, bin.size, &out_term1, 0);
    list = enif_make_list_cell(env, out_term1, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term2, 0);
    list = enif_make_list_cell(env, out_term2, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term3, 0);
    list = enif_make_list_cell(env, out_term3, list);

    return list;
  }

The multiple calls to enif_binary_to_term somehow seem to corrupt memory in the calling environment, so I'm probably using it incorrectly. Could some kind soul point me to the error? I'm running this code on OTP20 / erts-9.2.

Thanks in advance,
Vincent

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions



_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Issue with using enif_binary_to_term

Vincent Siliakus-2
In reply to this post by Vincent Siliakus-2
Hi Sverker,

Thanks for the reply including the fix. Will there be an OTP 20 release that will include this patch, or will it only be included in OTP 21? For internal usage it doesn't really matter, but I'm hoping to release the library I'm working on as open source in the near future.

Regarding testing the return value: I'm actually doing this in the code where I stumbled upon this issue, which is reading terms in external term format from an embedded kv store. I just wanted to make the example as simple as possible to highlight the issue.

Best,
Vincent
 
This is a bug in enif_binary_to_term which causes heap corruption when the term
is an immediate (atom, small integer, pid, port, empty list).

This should fix it:

diff --git a/erts/emulator/beam/erl_nif.c b/erts/emulator/beam/erl_nif.c
index e208792..0fbf0eb 100644
--- a/erts/emulator/beam/erl_nif.c
+++ b/erts/emulator/beam/erl_nif.c
@@ -1255,8 +1255,10 @@ size_t enif_binary_to_term(ErlNifEnv *dst_env,
     if (is_non_value(*term)) {
         return 0;
     }
-    erts_factory_close(&factory);
-    cache_env(dst_env);
+    if (size > 0) {
+        erts_factory_close(&factory);
+        cache_env(dst_env);
+    }
 
     ASSERT(bp > data);
     return bp - data;



Your usage looks correct. The only nitpick is to test the return value from
enif_binary_to_term, either to handle broken binary or assert it's correct.

/Sverker


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Issue with using enif_binary_to_term

Vincent Siliakus-2
In reply to this post by Ross Schlaikjer
Hi Ross,

Wow, thanks for the extensive detective work! Once upon a time I was a full time C++ developer, only using Visual Studio and it's debugger. Gdb always looked so arcane in comparison, but this is an excellent reminder that there's no excuse for me to dig into gdb.

In the mean time it's confirmed that my issue is caused by a bug in enif_binary_to_term, but many thanks for you time anyway.

Best,
Vincent

On Tue, May 15, 2018 at 5:01 PM, Ross Schlaikjer <[hidden email]> wrote:
Vincent,

I'm not entirely sure yet why this is happening, but I think I have at least figured out at least some of what is going on.

Heap allocation in erts is done very simply (for the most part), in that it is linearly allocated like so (as of otp git tag OTP-19.2.1): 

 140 static ERTS_INLINE Eterm* alloc_heap(ErlNifEnv* env, size_t need)
 141 {
 142     Eterm* hp = env->hp;
 143     env->hp += need;
 144     if (env->hp <= env->hp_end) {
 145     return hp;
 146     }
 147     return alloc_heap_heavy(env, need, hp);
 148 }

When allocating a new cons cell, we take two 64bit words, store car/cdr and return the address of the base pointer + the list tag (0x1).

1726 ERL_NIF_TERM enif_make_list_cell(ErlNifEnv* env, Eterm car, Eterm cdr)
1727 {
1728     Eterm* hp = alloc_heap(env,2);
1729     Eterm ret = make_list(hp);
1730 
1731     ASSERT_IN_ENV(env, car, 0, "head of list cell");
1732     ASSERT_IN_ENV(env, cdr, 0, "tail of list cell");
1733     CAR(hp) = car;
1734     CDR(hp) = cdr;
1735     return ret;
1736 } 

make_list above epands to the macro
#define _unchecked_make_list(x) ((Uint)(x) + TAG_PRIMARY_LIST)
so all it's doing is adding 1 to that ret pointer.

If we take this all to gdb, we can step through your code and see the list modifications occurring (comments inline):

18     ERL_NIF_TERM list = enif_make_list(env, 0);
(gdb) n
19     ERL_NIF_TERM in_term = enif_make_uint(env, 42);
(gdb) p/x list
$1 = 0xfffffffffffffffb  # This is the 'empty list' value
(gdb) n
22     int success = enif_term_to_binary(env, in_term, &bin);
(gdb) n
24     int bcount = enif_binary_to_term(env, bin.data, 3, &out_term1, 0);
(gdb) p/x env->hp  # Check the heap pointer before we call enif_binary_to_term
$4 = 0x7fff79840fe0
(gdb) n
25     list = enif_make_list_cell(env, out_term1, list);
(gdb) p/x env->hp  # Check again after - notice the hp hasn't changed
$5 = 0x7fff79840fe0
(gdb) n  # Make the cons cell
27     bcount = enif_binary_to_term(env, bin.data, 3, &out_term2, 0);
(gdb) p/x list # As we can see, it has the address of the base pointer + the list tag (1)
$6 = 0x7fff79840fe1
(gdb) x/2g list-1  # If we examine it, we see it has the head (our immediate int value) and the tail (empty list)
0x7fff79840fe0: 0x00000000000002af 0xfffffffffffffffb
(gdb) p/x env->hp  # Our heap pointer has gone up by 16 bytes, as expected (2 64bit words)
$7 = 0x7fff79840ff0
(gdb) n  # Now we execute the second enif_binary_to_term call
28     list = enif_make_list_cell(env, out_term2, list);
(gdb) p/x env->hp
$8 = 0x7fff79840fe0  # This is where our problem begins! binary_to_term has _decremented_ the heap pointer!
(gdb) n  # Create our second cons cell
30     bcount = enif_binary_to_term(env, bin.data, 3, &out_term3, 0);
(gdb) p/x list  # We look at the 'new' list pointer value... but it's the same as before!
$9 = 0x7fff79840fe1
(gdb) x/2g list -1
0x7fff79840fe0: 0x00000000000002af 0x00007fff79840fe1  # This list now references itself!

It looks like when we call binary_to_term, the cache_env call is what's messing with our heap pointer -
(gdb) watch env->hp
Hardware watchpoint 2: env->hp
(gdb) n

Thread 14 "1_scheduler" hit Hardware watchpoint 2: env->hp

Old value = (Eterm *) 0x7fff79840ff0
New value = (Eterm *) 0x7fff79840fe0
0x00005555556ce798 in cache_env (env=0x7fffb34bee40) at beam/erl_nif.c:421
warning: Source file is more recent than executable.
421 env->hp = HEAP_TOP(env->proc);
(gdb) bt
#0  0x00005555556ce798 in cache_env (env=0x7fffb34bee40) at beam/erl_nif.c:421
#1  enif_binary_to_term (dst_env=0x7fffb34bee40, data=0x7fffb5b82b68 "\203a*", data_sz=<optimized out>, term=0x7fffb34bed28, 
    opts=(unknown: 0)) at beam/erl_nif.c:1161


So as far as I can tell, something to do with the switchover between HAlloc and alloc_heap (flush_env / cache_env) seems to be losing some bytes off the heap pointer, but _only_ for subsequent calls. Unfortunately I must go back to my day job for now, but will try and debug more later.

Hope this helps.

- Ross S

On Mon, May 14, 2018 at 4:57 PM, Vincent Siliakus <[hidden email]> wrote:
Hi all,

I'm writing a NIF library and can't wrap my head around why the following code makes the erlang runtime hang when called from a shell:

  static ERL_NIF_TERM test(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) {
    ErlNifBinary bin;

    ERL_NIF_TERM list = enif_make_list(env, 0);
    ERL_NIF_TERM in_term = enif_make_uint(env, 42);
    ERL_NIF_TERM out_term1, out_term2, out_term3;

    enif_term_to_binary(env, in_term, &bin);

    enif_binary_to_term(env, bin.data, bin.size, &out_term1, 0);
    list = enif_make_list_cell(env, out_term1, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term2, 0);
    list = enif_make_list_cell(env, out_term2, list);

    enif_binary_to_term(env, bin.data, bin.size, &out_term3, 0);
    list = enif_make_list_cell(env, out_term3, list);

    return list;
  }

The multiple calls to enif_binary_to_term somehow seem to corrupt memory in the calling environment, so I'm probably using it incorrectly. Could some kind soul point me to the error? I'm running this code on OTP20 / erts-9.2.

Thanks in advance,
Vincent

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions




_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: Issue with using enif_binary_to_term

Sverker Eriksson-4
In reply to this post by Vincent Siliakus-2

On tis, 2018-05-15 at 17:02 +0200, Vincent Siliakus wrote:
> Hi Sverker,
>
> Thanks for the reply including the fix. Will there be an OTP 20 release that
> will include this patch, or will it only be included in OTP 21? For internal
> usage it doesn't really matter, but I'm hoping to release the library I'm
> working on as open source in the near future.
>

There is no OTP-20.4 planned, but I've put the fix in the pipe to be included
in
the next patch release on OTP-20.3.* if (or dare I say when) that happens.


/Sverker
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions