term_to_binary/2 with atom cache and/or pid_info/1

Loïc Hoguin-4
Hello,

Currently the Erlang Term Format has two variants:

 * the full featured format that includes different forms of atom caches

 * the simpler term_to_binary/1 format that does not

This is not a satisfactory state of affairs: sometimes we want to use
term_to_binary/1 for protocols or when exchanging data, but without an
atom cache we can end up sending a lot of 'undefined' atoms in string
form.

  => Should term_to_binary/1 allow setting up an atom cache?
     Perhaps the cache could be maintained as a map to be encoded
     separately by the user. This could also allow predefining
     the most common atoms, which would then never need to be sent
     (for example #{undefined => 1, true => 2, false => 3}).
     Whatever the interface, we should reuse as much of the
     distribution header atom cache code as possible.
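
To illustrate the idea of a user-maintained cache map, here is a minimal sketch. It is not the distribution header atom cache and not a proposed interface: the module name, the function names and the two-variant wire layout are all assumptions made up for this example.

```erlang
-module(atom_cache_sketch).
-export([default_cache/0, encode_atom/2, decode_atom/2]).

%% Hypothetical predefined cache, using the map from the example above.
default_cache() ->
    #{undefined => 1, true => 2, false => 3}.

%% Made-up layout: a cached atom becomes the two bytes <<0, Index>>;
%% any other atom is sent as <<1, Length:16, Utf8Text/binary>>.
encode_atom(Atom, Cache) when is_atom(Atom), is_map(Cache) ->
    case Cache of
        #{Atom := Index} when Index >= 0, Index < 256 ->
            <<0, Index>>;
        _ ->
            Text = atom_to_binary(Atom, utf8),
            <<1, (byte_size(Text)):16, Text/binary>>
    end.

%% The receiver needs the same cache map to resolve an index.
decode_atom(<<0, Index>>, Cache) ->
    [Atom] = [A || {A, I} <- maps:to_list(Cache), I =:= Index],
    Atom;
decode_atom(<<1, Len:16, Text:Len/binary>>, _Cache) ->
    %% Only accept atoms that already exist, so that remote input
    %% cannot grow the atom table.
    binary_to_existing_atom(Text, utf8).
```

With such a cache, 'undefined' costs two bytes on the wire instead of its full text, provided both sides agree on the map beforehand.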

An alternative would be to build our own format loosely based on the
Erlang Term Format. But in that scenario we end up lacking at least
the pid_info/1 and ref_info/1 functions that would allow us to encode
a pid/reference without having to use either term_to_binary/1 or
{pid,ref}_to_list/1. On the receiving side the pid/reference could be
recomposed via a pid_from_info/1 or ref_from_info/1 type of function.

These functions can be useful to have regardless of the answer to the
first question above. For example pid_info/1 is used in Mnesia here:

 https://github.com/erlang/otp/blob/master/lib/mnesia/src/mnesia_locker.erl#L1270

And also in RabbitMQ here, as well as pid_from_info/1:

 https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/pid_recomposition.erl

I've also been writing similar code when experimenting with custom
distribution drivers.
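
For readers who have not seen this kind of code, a rough sketch of the workaround follows. It is not the Mnesia or RabbitMQ implementation. It assumes OTP 23 or later, where term_to_binary/1 encodes pids with the NEW_PID_EXT tag (88); the module name and the map keys are invented for the example.

```erlang
-module(pid_info_sketch).
-export([pid_info/1, pid_from_info/1]).

-define(VERSION, 131).      %% external term format version byte
-define(NEW_PID_EXT, 88).   %% assumed pid tag (OTP 23 and later)

%% Decompose a pid by going through the external term format.
pid_info(Pid) when is_pid(Pid) ->
    <<?VERSION, ?NEW_PID_EXT, Rest/binary>> = term_to_binary(Pid),
    %% Rest is the encoded node atom followed by 12 bytes of fixed
    %% fields: Id:32, Serial:32, Creation:32.
    NodeSize = byte_size(Rest) - 12,
    <<_Node:NodeSize/binary, Id:32, Serial:32, Creation:32>> = Rest,
    #{node => node(Pid), id => Id, serial => Serial, creation => Creation}.

%% Recompose a pid from the map returned by pid_info/1.
pid_from_info(#{node := Node, id := Id,
                serial := Serial, creation := Creation}) ->
    %% Reuse the runtime's own atom encoding for the node name.
    <<?VERSION, NodeBin/binary>> = term_to_binary(Node),
    binary_to_term(<<?VERSION, ?NEW_PID_EXT, NodeBin/binary,
                     Id:32, Serial:32, Creation:32>>).
```

The round trip through term_to_binary/1 and binary_to_term/1 is exactly the detour that dedicated BIFs would make unnecessary.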

  => Should erlang:pid_info/1 and erlang:pid_from_info/1 be added?
     This is the strongest case as there's code in the wild
     already doing this.

  => Should erlang:ref_info/1 and erlang:ref_from_info/1 be added?
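
The same detour works for references. The sketch below assumes OTP 23 or later, where term_to_binary/1 emits NEWER_REFERENCE_EXT (tag 90: a 16-bit word count, the node atom, a 32-bit creation, then the ID words). The module name and map keys are again hypothetical.

```erlang
-module(ref_info_sketch).
-export([ref_info/1, ref_from_info/1]).

-define(VERSION, 131).              %% external term format version byte
-define(NEWER_REFERENCE_EXT, 90).   %% assumed reference tag (OTP 23+)

%% Decompose a reference by going through the external term format.
ref_info(Ref) when is_reference(Ref) ->
    <<?VERSION, ?NEWER_REFERENCE_EXT, Len:16, Rest/binary>> =
        term_to_binary(Ref),
    IdSize = Len * 4,                          %% Len counts 32-bit words
    NodeSize = byte_size(Rest) - 4 - IdSize,   %% what remains is the node atom
    <<_Node:NodeSize/binary, Creation:32, IdBin:IdSize/binary>> = Rest,
    #{node => node(Ref),
      creation => Creation,
      id => [W || <<W:32>> <= IdBin]}.

%% Recompose a reference from the map returned by ref_info/1.
ref_from_info(#{node := Node, creation := Creation, id := Words}) ->
    <<?VERSION, NodeBin/binary>> = term_to_binary(Node),
    IdBin = << <<W:32>> || W <- Words >>,
    binary_to_term(<<?VERSION, ?NEWER_REFERENCE_EXT, (length(Words)):16,
                     NodeBin/binary, Creation:32, IdBin/binary>>).
```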

It's possible that ports and funs may benefit as well, but I have
a hard time figuring out when we would want to use a port that
way, and for funs I believe we already have everything we need
as long as they're not anonymous funs.
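
As a small illustration of the fun case: an external fun can be taken apart with erlang:fun_info/2 and rebuilt with the fun M:F/A syntax. The module and function names below are made up for the example.

```erlang
-module(fun_info_sketch).
-export([to_info/1, from_info/1]).

%% Decompose an external fun such as fun lists:reverse/1.
%% Anonymous funs are rejected by the {type, external} match.
to_info(Fun) when is_function(Fun) ->
    {type, external} = erlang:fun_info(Fun, type),
    {module, M} = erlang:fun_info(Fun, module),
    {name, F} = erlang:fun_info(Fun, name),
    {arity, A} = erlang:fun_info(Fun, arity),
    {M, F, A}.

%% Recompose the fun from its module, name and arity.
from_info({M, F, A}) when is_atom(M), is_atom(F), is_integer(A) ->
    fun M:F/A.
```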

Cheers,

--
Loïc Hoguin

Re: term_to_binary/2 with atom cache and/or pid_info/1

Rickard Green-2
On Mon, Mar 22, 2021 at 12:08 PM Loïc Hoguin <[hidden email]> wrote:
> [...]

It is a bit unfortunate that the "creation" value of the node part is so well hidden since the full identifier of a node is its nodename together with its creation. It would have been nice if the node/1 BIF had returned '{Nodename, Creation}' instead of just 'Nodename', but that is too late to change now. Perhaps a nid/1 BIF?
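
To make the idea concrete, here is a sketch of what a nid/1 could return, written as ordinary Erlang rather than as a BIF. It handles pids only and assumes OTP 23 or later, where an encoded pid (NEW_PID_EXT, tag 88) ends with a 32-bit creation. The module name is hypothetical.

```erlang
-module(nid_sketch).
-export([nid/1]).

%% Return {Nodename, Creation} for the node a pid belongs to.
nid(Pid) when is_pid(Pid) ->
    <<131, 88, Rest/binary>> = term_to_binary(Pid),
    %% The creation is the last four bytes of the encoded pid.
    Skip = byte_size(Rest) - 4,
    <<_:Skip/binary, Creation:32>> = Rest,
    {node(Pid), Creation}.
```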

Currently pids, ports and references are the datatypes that contain node identifiers which also are the types the node/0 BIF can handle.

I think it is reasonable to have functionality for creating such data types from full information, so that alternative protocols won't have to go via the external term format.

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB
Re: term_to_binary/2 with atom cache and/or pid_info/1

Rickard Green-2
On Tue, Mar 23, 2021 at 3:46 PM Rickard Green <[hidden email]> wrote:

> the types the node/0 BIF can handle

should have been: "the types the node/1 BIF can handle"

--
Rickard Green, Erlang/OTP, Ericsson AB