
12

On 10/27/2017 04:04 AM, Richard A. O'Keefe wrote:
> The current thread has left me somewhat confused.
>
> Suppose E1 and E2 are different computations,
> delivering maps M1 and M2 respectively, such
> that M1 =:= M2 is true.
>
> Are we guaranteed that term_to_binary(M1) and
> term_to_binary(M2) are equal binaries?
>
>
No. term_to_binary does not give any such guarantees.
And in the case where M1 and M2 were created by different VM instances,
you can with current implementation get different binaries.
/Sverker
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


Sverker, I understand and agree with the reasons behind the implementation decision.
I can expand a little on discussions I've been involved in:
We have a scenario where we need to serialize objects partly for network comms, partly for storage and partly for signing and cryptographic hasing. We also prefer the serialization to work easily across programming languages.
The trouble is of course the signing/cryptohashing. For this to be stable, the order must be fixed. Obviously, adding a function that makes Erlang maps appear ordered will not suffice, since other language environments would either not expect maps to be ordered at all, or are likely to have a different take on how keys are ordered*.
Better then to stick with the assumption that map key ordering is undefined.
Picking e.g. msgpack encoding as an example, a possible alternative would be to use arrays of 1element maps:
[#{K1 => V1, #{K2 => V2}, ...]
This is easy enough to decode and handle in e.g. JavaScript, and should be stable enough to sign (using the msgpackencoded representation as input to the signing/hash function). The encoding overhead compared to using a single map seems to be about one byte per key.
* For example, the Go implementation of msgpack has a function to make maps appear ordered, but with its own idea of what key types can be sorted.
BR, Ulf W
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 2017年10月27日 金曜日 15:26:22 Sverker Eriksson wrote:
> Yes. All Erlang terms, including maps, have a globally defined,
> implementation independent, total order.
>
> The reason, for this "surprising" order of maps, is the alternative of
> using the normal arithmetic order for keys is much worse.
I am curious, though, why a compound data type was added to the language as a primitive data type. This was the only thing that really bothered me about maps.
Secondclass data type, sure. Ordering unknown. Whatever. Same with all the other compound data types that we make up for ourselves.
Having it as a data primitive introduces inconsistency in the language itself, as we either have to have a tradeoff to amortize enforcement of an arbitrary ordering to make comparisons faster, or force an ordering at the time of comparison (and/or serialization, maybe) at the cost of some computation time in order for things to work. That adds one more odd point of weirdness to the language we *never* once worried about before but now need to remember in edge cases where performance matters. In other words, this creates a new edge case for performance, if I understand thing correctly.
Who before was ever seriously considering using dicts as keys to dicts?
"But lists are compound data types!" Sort of. The ordering of a list very often IS its meaning  thus strings. Not so for maps by their very nature.
Craig
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 2017年10月27日 金曜日 15:42:30 Ulf Wiger wrote:
> We have a scenario where we need to serialize objects partly for network
> comms, partly for storage and partly for signing and cryptographic hasing.
> We also prefer the serialization to work easily across programming
> languages.
>
> The trouble is of course the signing/cryptohashing. For this to be stable,
> the order must be fixed. Obviously, adding a function that makes Erlang
> maps appear ordered will not suffice, since other language environments
> would either not expect maps to be ordered at all, or are likely to have a
> different take on how keys are ordered*.
Interesting. I have been dealing with this exact case myself for quite
some time (before the advent of maps, actually). The tradeoff we had to
make was exactly as you describe: Universal ordered representation of
pretty much any K/V sort of data type (especially across languages) only
works as an ordered list, so the external, canonical representation of
the data must be defined as an ordered list (sorted some specific way).
Occasionally this also means internal lists must be themselves sorted.
That is one critical part of the definition of any serialization
procedure for any data that needs to be verifiable via signature.
Either that or build an ASN.1 DER representation for everything; which
isn't actually so bad, but would be way better if there were tutorials
on just the DDL part of ASN.1 for the cool kids to brush up on...
Having maps inside an Erlang program (and dicts in Python and blats in
Frozz and so on) is quite nice, but it can never be relied on for things
like canonical serialization. Imagine if every SQL query return had to
suddenly be ordered!
Those internal representations can only ever be immediate conveniences,
never canonical data representations. I think this is easy to forget
because most of the time we never even have to worry about what a
"canonical representation" even might be for 99% of the data most of us
ever deal with.
Craig
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 10/27/2017 04:39 PM, Richard
Carlsson wrote:
orddict and gb_trees both use '==' to distinguish keys, which makes
it possible to order them with '>'.
How would you order #{1 => x, 1.0 => y} and #{1 => y, 1.0
=> x}
if you can't order 1 and 1.0?
/Sverker
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Fri, Oct 27, 2017 at 09:36:58PM +0200, Richard Carlsson wrote:
> 20171027 17:45 GMT+02:00 Sverker Eriksson < [hidden email]>
> >
> >
> > How would you order #{1 => x, 1.0 => y} and #{1 => y, 1.0 => x}
> > if you can't order 1 and 1.0?
> >
>
> As long as we're talking about the arithmetic term order (<, >, ==), I
> don't see why they would need to be. Look at tuples:
>
> {1, 1.0} < {1.0, 1}.
> false
> {1, 1.0} > {1.0, 1}.
> false
> {1, 1.0} == {1.0, 1}.
> true
> {1, 1.0} =:= {1.0, 1}.
> false
>
> Maps ought to behave analogously, in the arithmetic ordering.
The problem is, the tuples you provide and comparison operators (<, >,
=<, >=, and == (not =:= one)) form a welldefined partial order (total
order, actually); mainly, if A =< B and B =< A, then A == B. Being
a partial order is a very important property of Erlang's type system,
one that was quite explicitly baked in the VM and is used in many
different places.

Stanislaw Klekot
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 10/27, Richard Carlsson wrote:
>(switching the order of the keys in the left tuple), both of which
>would be
>equally "legal" since 1 == 1.0. But this switches the order of x and y, so
>that the comparisons of the maps would give different results. One way to
>avoid this situation would be to say that you can't have two keys that
>compare equal with == in the same map, just as for an orddict.
This would, unfortunately, break pattern matching:
M = #{1 => 1},
case M of
#{1.0 := _} > % it is safe to add 1.0 if we match
M#{1.0 := 2}; % does this crush a value and change the key?
_ >
other
end.
The fourth line here is problematic. Either (1) maps have a special
magical pattern matching case where 1.0 and 1 compare equal (which
happens nowhere else) and decide whether to replace the key or keep it
the same, or (2) you keep current pattern matching semantics and you
can't use existing pattern matching to enforce the constraints above.
(1) is particularly nasty for cases such as:
X = 1.0,
case {#{1 => 1}, 1} of
{#{X := _}, X} > true = X =:= X; % works
{#{X := _}, Y} > false = X =/= Y % crashes
end
Which one should match? In the first clause, the map would match fine,
even though X =/= X! We just broke a lot of language here. To preserve
safe pattern matching, you should probably not be able to make 1 be
equal to 1.0 in a map through pattern insertions (M#{K := NewVal}).
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


But again, Jesper, just about everyone relies on the fact that maps follow the general principle that there is a welldefined term comparison order. Otherwise, maps would be highly unsuitable to use in keys, and generally treacherous to use as a replacement for records. Following the Principle of Least Surprise, it's a darned good thing that Erlang doesn't randomize the key order in its maps.
I largely regard a total term order as a language mistake. The right solution, obviously, is to have several equalities, where the programmer can define what they mean by equality in a certain part of the program. Equality is way too important in programming for you to be left with a single one!
That said, there is a welldefined order of maps currently, but it is not the logical one you might expect (which is by ordering the keys of the map). Rather the order is defined on
* What the key hashes to * What the internal HAMT structure looks like right now
It is a total order even! So you can use maps as keys in a balanced search tree for instance.
However, your point does seem to touch on a couple of important things that should be considered for future inclusion:
* We may want to have an "ordered map" in the language. These are selfbalancing binary trees. They are more costly in lookup time and they take up more memory space, but they have the "natural ordering" of keys which means they are welldefined in their traversal.
* Your "sext" library exists to plug yet another hole in the language, namely that binary_to_term have certain freedoms with certain data structures and this leads to situations where you cannot rely on the binary output for, e.g., cryptographic applications.
In short, one has to weigh different implementation details when building data structures. If you want to have it all, your efficiency eventually has to give.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 28/10/17 2:26 AM, Sverker Eriksson wrote:
> Yes. All Erlang terms, including maps, have a globally defined,
> implementation independent, total order.
>
> The reason, for this "surprising" order of maps, is the alternative of
> using the normal arithmetic order for keys is much worse.
>
> If we want 1 and 1.0 to be different keys (which we do as we use
> matching (=:=) to lookup keys)
> then we need to order 1 and 1.0, and normal arithmetic term ordering
> does not (1 == 1.0).
The fundamental mistake here is confusing *arithmetic* ordering
with *term* ordering. In *term* ordering, if X and Y are
behaviourally distinguishable then either X < Y or X > Y should
be true. Since integers and floats are behaviourally
distinguishable,
1> X = 1 bsl 53.
9007199254740992
2> Y = float(X).
9007199254740992.0
3> X == Y.
true
4> X+1 == Y+1.
false
it follows that either X < Y or Y > X should be true in
*term* (but not *arithmetic*) order.
Arithmetic order in Erlang has some seriously weird issues, like
you can find numbers X Y such that X  Y is 0.0 but X == Y is false.
>
>
> It has been discussed (more and less serious) to expose this
> "mapkeyorder" with operators like :<, :>, =:<, >:=.
Surely we deserve *some* sane ordering in Erlang?
Prolog had two sets of comparison operators: term ordering and
arithmetic ordering. There were *reasons* for that.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On 28/10/17 2:34 AM, Sverker Eriksson wrote:
[term_to_binary/1 is not a pure function; it
depends on the representation of its argument,
not just its value].
OUCH.
For term_to_binary/2, of course the result
depends on the Options argument, but I take
it now that even being explicit about the
Options is not enough. Can we have a
'canonical' option?
>
> And in the case where M1 and M2 were created by different VM instances,
> you can with current implementation get different binaries.
I can live with different *versions* of the VM using different
versions of the binary term format, but two instances of the *same*
VM turning mathematically identical terms into different binaries
is, well, did Nyarlathotep, the Crawling Chaos, have a hand in the
design? In all seriousness, *OUCH*. Please mention this in LARGE
red letters in the documentation for term_to_binary; I don't see it
at the moment, but it puts limits on what you can reasonably do
with termsasbinaries.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


A clear statement of lack of guarantee makes sense.
A 'canonical' option could make sense.
That would require N*log(N) sorting to get map keys in canonical order
(deja vu).
About VM instance and version; I don't see the big divide there.
Would it not be treacherous to base an application design
on term_to_binary being a pure function among VM instances
as long as you don't upgrade them.
/Sverker
On 10/30/2017 01:16 AM, Richard A. O'Keefe wrote:
>
>
> On 28/10/17 2:34 AM, Sverker Eriksson wrote:
> [term_to_binary/1 is not a pure function; it
> depends on the representation of its argument,
> not just its value].
>
> OUCH.
>
> For term_to_binary/2, of course the result
> depends on the Options argument, but I take
> it now that even being explicit about the
> Options is not enough. Can we have a
> 'canonical' option?
>
>>
>> And in the case where M1 and M2 were created by different VM instances,
>> you can with current implementation get different binaries.
>
> I can live with different *versions* of the VM using different
> versions of the binary term format, but two instances of the *same*
> VM turning mathematically identical terms into different binaries
> is, well, did Nyarlathotep, the Crawling Chaos, have a hand in the
> design? In all seriousness, *OUCH*. Please mention this in LARGE
> red letters in the documentation for term_to_binary; I don't see it
> at the moment, but it puts limits on what you can reasonably do
> with termsasbinaries.
>
>
>
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Sat, Oct 28, 2017 at 11:09 PM Richard Carlsson < [hidden email]> wrote: As we quoted from the reference manual, the < ordering on maps is actually implementation independent and future proof (while still being a total order). The sacrifice is that to compare two maps with <, their keys must be mapped into canonical order (with integers before floats as discussed). This is clearly more costly than just taking whatever order the current underlying hash+HAMT produces, but worth it since it preserves those nice properties.
Oh, then I was wrong!
Thanks for the correction.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


In reply to this post by Richard A. O'Keefe2
> On 30 Oct 2017, at 01:16, Richard A. O'Keefe < [hidden email]> wrote:
>
> For term_to_binary/2, of course the result
> depends on the Options argument, but I take
> it now that even being explicit about the
> Options is not enough. Can we have a
> 'canonical' option?
>
+1 . That would be an awesome option :)
 benoit
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


to continue the discussion i just commited a simple library that does this:
Hopefully such thing will be optimised soon :)
 benoit
On 30 Oct 2017, at 01:16, Richard A. O'Keefe <[hidden email]> wrote:
For term_to_binary/2, of course the result depends on the Options argument, but I take it now that even being explicit about the Options is not enough. Can we have a 'canonical' option?
+1 . That would be an awesome option :)  benoit
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions

12
