Bug in gb_trees ? Integer key not found.

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in gb_trees ? Integer key not found.

PAILLEAU Eric
> T=gb_trees:from_orddict([{a1,10},{a2,15},{a3,5},{a4,7}]).
{4,{a3,5,{a2,15,{a1,10,nil,nil},nil},{a4,7,nil,nil}}}

> gb_trees:lookup(a1,T).
{value,10}

> f().
ok

>  T=gb_trees:from_orddict([{10,a1},{15,a2},{5,a3},{7,a4}]).
{4,{5,a3,{15,a2,{10,a1,nil,nil},nil},{7,a4,nil,nil}}}

> gb_trees:lookup(10,T).
none

While {value, a1} would be expected.

In documentation :  Key = Val = Term() .
No restriction for key is documented : so a lack of documentation or a
bug ...

Regards.


_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Ulf Wiger

On 9 May 2011, at 22:01, PAILLEAU Eric wrote:

>> T=gb_trees:from_orddict([{10,a1},{15,a2},{5,a3},{7,a4}]).
> {4,{5,a3,{15,a2,{10,a1,nil,nil},nil},{7,a4,nil,nil}}}
>
>> gb_trees:lookup(10,T).
> none
>

The argument is not an orddict.

The reason for the function is speed, obviously, and verifying that the list is in fact ordered would defeat the speed advantage. Thus, it is up to the user to ensure that they actually pass an orddict. This isn't expressly written in the manual...

BR,
Ulf W

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com



_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

PAILLEAU Eric
Hello,
this make sens. I was only looking that keys and values can be any terms
while the function name was self explicit ...

Thanks Ulf.

>
> The argument is not an orddict.
>
> The reason for the function is speed, obviously, and verifying that the list is in fact ordered would defeat the speed advantage. Thus, it is up to the user to ensure that they actually pass an orddict. This isn't expressly written in the manual...
>
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Matthias Lang
In reply to this post by Ulf Wiger
On Monday, May 09, Ulf Wiger wrote:
>
> On 9 May 2011, at 22:01, PAILLEAU Eric wrote:
>
> >> T=gb_trees:from_orddict([{10,a1},{15,a2},{5,a3},{7,a4}]).
> > {4,{5,a3,{15,a2,{10,a1,nil,nil},nil},{7,a4,nil,nil}}}
> >
> >> gb_trees:lookup(10,T).
> > none

> The argument is not an orddict.

The above was the important part of your mail, but this made me wonder:

> The reason for the function is speed, obviously, and verifying that
> the list is in fact ordered would defeat the speed advantage.

Checking that the input is ordered should be cheap since you have to
traverse the whole input anyway. Here's gb_trees:from_orddict:

  from_orddict(L) ->
    S = length(L),
    {S, balance_list(L, S)}.

replacing the length() call with one which also checks input ordering
is easy, so I did that. I measured a 5% slowdown on long (1M)
lists. I'd live with that, but maybe not everyone would.

I then took a look at "who uses gb_trees:from_orddict anyway?". In
OTP, it's mostly the beam compiler itself. This code pops up a few
times:

  gb_trees:from_orddict(lists:sort(L))

It feels like there's a function missing in gb_trees, one which inserts a
list of tuples which aren't sorted (but do have unique keys).

Matt
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Robert Virding-2
In reply to this post by PAILLEAU Eric
Dict/orddict have a from_list/1 which takes a list of Key-Value tuples in any order. They also accept non-unique keys taking the value of the last key in the list.

Robert

----- "Matthias Lang" <[hidden email]> wrote:

> On Monday, May 09, Ulf Wiger wrote:
> >
> > On 9 May 2011, at 22:01, PAILLEAU Eric wrote:
> >
> > >> T=gb_trees:from_orddict([{10,a1},{15,a2},{5,a3},{7,a4}]).
> > > {4,{5,a3,{15,a2,{10,a1,nil,nil},nil},{7,a4,nil,nil}}}
> > >
> > >> gb_trees:lookup(10,T).
> > > none
>
> > The argument is not an orddict.
>
> The above was the important part of your mail, but this made me
> wonder:
>
> > The reason for the function is speed, obviously, and verifying that
> > the list is in fact ordered would defeat the speed advantage.
>
> Checking that the input is ordered should be cheap since you have to
> traverse the whole input anyway. Here's gb_trees:from_orddict:
>
>   from_orddict(L) ->
>     S = length(L),
>     {S, balance_list(L, S)}.
>
> replacing the length() call with one which also checks input ordering
> is easy, so I did that. I measured a 5% slowdown on long (1M)
> lists. I'd live with that, but maybe not everyone would.
>
> I then took a look at "who uses gb_trees:from_orddict anyway?". In
> OTP, it's mostly the beam compiler itself. This code pops up a few
> times:
>
>   gb_trees:from_orddict(lists:sort(L))
>
> It feels like there's a function missing in gb_trees, one which
> inserts a
> list of tuples which aren't sorted (but do have unique keys).
>
> Matt
> _______________________________________________
> erlang-bugs mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-bugs
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Kostis Sagonas-2
In reply to this post by Matthias Lang
Matthias Lang wrote:

> On Monday, May 09, Ulf Wiger wrote:
>> On 9 May 2011, at 22:01, PAILLEAU Eric wrote:
>>
>>>> T=gb_trees:from_orddict([{10,a1},{15,a2},{5,a3},{7,a4}]).
>>> {4,{5,a3,{15,a2,{10,a1,nil,nil},nil},{7,a4,nil,nil}}}
>>>
>>>> gb_trees:lookup(10,T).
>>> none
>
>> The argument is not an orddict.

For the record, let me point out that this is not the first time this
question arises.

> The above was the important part of your mail, but this made me wonder:
>
>> The reason for the function is speed, obviously, and verifying that
>> the list is in fact ordered would defeat the speed advantage.
>
> Checking that the input is ordered should be cheap since you have to
> traverse the whole input anyway. Here's gb_trees:from_orddict:
>
>   from_orddict(L) ->
>     S = length(L),
>     {S, balance_list(L, S)}.
>
> replacing the length() call with one which also checks input ordering
> is easy, so I did that. I measured a 5% slowdown on long (1M)
> lists. I'd live with that, but maybe not everyone would.

I strongly support adding something like the above to OTP! The benefits
of doing so far outweigh the drawbacks. Those that cannot live with this
overhead, should better switch to some other language, in my opinion. I
have real trouble seeing how this change would make any measurable
difference in some application. (On the other hand, it might prevent
some accidental hair loss for some developers...)


Even better yet, why not make orddict() opaque so that dialyzer can
detect constructions of "fake ordicts" (i.e. orddicts that are not
produced by calling the functions of the orddict module) ?


> I then took a look at "who uses gb_trees:from_orddict anyway?". In
> OTP, it's mostly the beam compiler itself. This code pops up a few
> times:
>
>   gb_trees:from_orddict(lists:sort(L))

Hmmm...

I would have expected the above to read:

    gb_trees:from_orddict(orddict:from_list(L))

instead, which seems to me more kosher. (*)

<aside>
Note that the two calls might have different semantics actually:

    1> orddict:from_list([{2,a},{1,b},{2,a},{3,c}]).
    [{1,b},{2,a},{3,c}]
    2> lists:sort([{2,a},{1,b},{2,a},{3,c}]).
    [{1,b},{2,a},{2,a},{3,c}]
</aside>

Is this call to lists:sort/1 correct/intentional?  Only the original
developer can answer such questions, if he happens to still recall
whether this was done just to save a millisecond or for some other
reason...  God only knows how many possible subtle bugs may be hidden in
such (or similar) code.


> It feels like there's a function missing in gb_trees, one which inserts a
> list of tuples which aren't sorted (but do have unique keys).

Right again.

Kostis


(*) Even stranger is code in beam_clean.erl which reads:

        Lmap = gb_trees:from_orddict(ordsets:from_list(Lmap0)),

     and

        gb_trees:from_orddict(ordsets:from_list(Acc)).
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Ulf Wiger
In reply to this post by Matthias Lang

On 9 May 2011, at 23:13, Matthias Lang wrote:

>> The argument is not an orddict.
>
> The above was the important part of your mail, but this made me wonder:
>
>> The reason for the function is speed, obviously, and verifying that
>> the list is in fact ordered would defeat the speed advantage.

Yeah, I know. I thought the same right after I posted the mail. I figured someone would call me on it. :)

BR,
Ulf

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com



_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Anthony Ramine
In reply to this post by Kostis Sagonas-2
Le 9 mai 2011 à 23:56, Kostis Sagonas a écrit :

> <aside>
> Note that the two calls might have different semantics actually:
>
>   1> orddict:from_list([{2,a},{1,b},{2,a},{3,c}]).
>   [{1,b},{2,a},{3,c}]
>   2> lists:sort([{2,a},{1,b},{2,a},{3,c}]).
>   [{1,b},{2,a},{2,a},{3,c}]
> </aside>
>
> Is this call to lists:sort/1 correct/intentional?  Only the original developer can answer such questions, if he happens to still recall whether this was done just to save a millisecond or for some other reason...  God only knows how many possible subtle bugs may be hidden in such (or similar) code.

lists:ukeysort/2 removes duplicates.

--
Anthony Ramine
Dev:Extend
http://dev-extend.eu




_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Kostis Sagonas-2
Anthony Ramine wrote:

> Le 9 mai 2011 à 23:56, Kostis Sagonas a écrit :
>
>> <aside>
>> Note that the two calls might have different semantics actually:
>>
>>   1> orddict:from_list([{2,a},{1,b},{2,a},{3,c}]).
>>   [{1,b},{2,a},{3,c}]
>>   2> lists:sort([{2,a},{1,b},{2,a},{3,c}]).
>>   [{1,b},{2,a},{2,a},{3,c}]
>> </aside>
>>
>> Is this call to lists:sort/1 correct/intentional?  Only the original developer can answer such questions, if he happens to still recall whether this was done just to save a millisecond or for some other reason...  God only knows how many possible subtle bugs may be hidden in such (or similar) code.
>
> lists:ukeysort/2 removes duplicates.

Right, I know. So does lists:usort/1.

And your point is what exactly?

Kostis
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Anthony Ramine
Le 10 mai 2011 à 00:30, Kostis Sagonas a écrit :

> Anthony Ramine wrote:
>> Le 9 mai 2011 à 23:56, Kostis Sagonas a écrit :
>>> <aside>
>>> Note that the two calls might have different semantics actually:
>>>
>>>  1> orddict:from_list([{2,a},{1,b},{2,a},{3,c}]).
>>>  [{1,b},{2,a},{3,c}]
>>>  2> lists:sort([{2,a},{1,b},{2,a},{3,c}]).
>>>  [{1,b},{2,a},{2,a},{3,c}]
>>> </aside>
>>>
>>> Is this call to lists:sort/1 correct/intentional?  Only the original developer can answer such questions, if he happens to still recall whether this was done just to save a millisecond or for some other reason...  God only knows how many possible subtle bugs may be hidden in such (or similar) code.
>> lists:ukeysort/2 removes duplicates.
>
> Right, I know. So does lists:usort/1.
>
> And your point is what exactly?
>
> Kostis

Sounds like I forgot to type the most important part of the mail. Shouldn't this sort of nasty trick where the rationale tend to be undocumented and forgotten really fast be avoided in the crucial parts of OTP?

--
Anthony Ramine
Dev:Extend
http://dev-extend.eu




_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Robert Virding-2
Wouldn't the easiest and cleanest solution to this problem be just add gb_trees:from_list? I have no idea why there isn't one.

I don't know if orddict could be an opaque type as the point of it is that the representation is defined to be an ordered list.

Robert

----- "Anthony Ramine" <[hidden email]> wrote:

> Le 10 mai 2011 à 00:30, Kostis Sagonas a écrit :
>
> > Anthony Ramine wrote:
> >> Le 9 mai 2011 à 23:56, Kostis Sagonas a écrit :
> >>> <aside>
> >>> Note that the two calls might have different semantics actually:
> >>>
> >>>  1> orddict:from_list([{2,a},{1,b},{2,a},{3,c}]).
> >>>  [{1,b},{2,a},{3,c}]
> >>>  2> lists:sort([{2,a},{1,b},{2,a},{3,c}]).
> >>>  [{1,b},{2,a},{2,a},{3,c}]
> >>> </aside>
> >>>
> >>> Is this call to lists:sort/1 correct/intentional?  Only the
> original developer can answer such questions, if he happens to still
> recall whether this was done just to save a millisecond or for some
> other reason...  God only knows how many possible subtle bugs may be
> hidden in such (or similar) code.
> >> lists:ukeysort/2 removes duplicates.
> >
> > Right, I know. So does lists:usort/1.
> >
> > And your point is what exactly?
> >
> > Kostis
>
> Sounds like I forgot to type the most important part of the mail.
> Shouldn't this sort of nasty trick where the rationale tend to be
> undocumented and forgotten really fast be avoided in the crucial parts
> of OTP?
>
> --
> Anthony Ramine
> Dev:Extend
> http://dev-extend.eu
>
>
>
>
> _______________________________________________
> erlang-bugs mailing list
> [hidden email]
> http://erlang.org/mailman/listinfo/erlang-bugs
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

PAILLEAU Eric
Le 11/05/2011 03:42, Robert Virding a écrit :
> Wouldn't the easiest and cleanest solution to this problem be just add gb_trees:from_list? I have no idea why there isn't one.
>
> I don't know if orddict could be an opaque type as the point of it is that the representation is defined to be an ordered list.
>
> Robert
>
hello,
I didn't figure out that my "newbie mistake" will generate all these
answers.
The puzzling thing in my mistake is that the function looks to return a
valid balanced tree, even if lookup doesn't work properly...

As 'ordered list' is not a type, input is not tested and no exception is
raised. I do understand.

I agree with Robert : a gb_trees:from_list would be necessary because it
is unlikely to have ordered list from scratch in real life.
It is probable that any call to the function uses 'sort' in the line
above or directly in the input, like do the beam compiler
:gb_trees:from_orddict(lists:sort(L)) . So the overhead is either done
in the module code or in the user code anyway ...

In addition to this, there is 'from_list' functions in queue, ordsets,
array, orddict, dict, sets and gb_sets modules, but not in gb_trees ...
Why not :>) ?

Regards.





_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Bug in gb_trees ? Integer key not found.

Matthias Lang
In reply to this post by PAILLEAU Eric
Hi,

sorry about the missing threading headers.

Eric wrote:

 > In addition to this, there is 'from_list' functions in queue, ordsets,
 > array, orddict, dict, sets and gb_sets modules, but not in gb_trees ...
 > Why not :>) ?

Completely agree with your observation that more consistency would be
nice. My 'favourite' example is deciding the function and argument
order when finding out if something's in a structure:

      lists:member(Element, Structure)
        ets:member(Structure, Element)
     string:chr(Structure, Element)
  [ord]dict:is_key(Element, Structure)
       sets:is_element(Element, Structure)
   gb_trees:is_defined(Element, Structure)

None of the those present when this happened have chosen to comment on
the 'why', here are my guesses:

Why did it start? A "just do it!" spirit when Erlang was born, along with
different people having different ideas about how to do things.

Why is it still that way? Backwards compatibility has always been a
high priority in Erlang because it's been used in real products almost
from the start. Cleaning up has usually taken a back seat. But changes
like adding 'from_list' to gb_trees should be uncontroversial.

---

Aside: just spotted a bug in the 'sets' manpage. It says:

   | This module provides exactly the same interface as the module
   | ordsets but with a defined representation.

The bit about "defined representation" looks like a cut-and-paste
error from ordsets. (I looked at R14B0, erlang.org is down for me)

Matt
_______________________________________________
erlang-bugs mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-bugs
Reply | Threaded
Open this post in threaded view
|

Bug in gb_trees ? Integer key not found.

Bengt Kleberg-4
Greetings,

I think it is worth mentioning the attempt made by Ricard C. "Erlang
standard library packages". The comments say:
Basically following Jonas B. et al., "proposal 15"

So some work has been done on this subject.


bengt

On Thu, 2011-05-12 at 18:36 +0200, Matthias Lang wrote:

> Hi,
>
> sorry about the missing threading headers.
>
> Eric wrote:
>
>  > In addition to this, there is 'from_list' functions in queue, ordsets,
>  > array, orddict, dict, sets and gb_sets modules, but not in gb_trees ...
>  > Why not :>) ?
>
> Completely agree with your observation that more consistency would be
> nice. My 'favourite' example is deciding the function and argument
> order when finding out if something's in a structure:
>
>       lists:member(Element, Structure)
>         ets:member(Structure, Element)
>      string:chr(Structure, Element)
>   [ord]dict:is_key(Element, Structure)
>        sets:is_element(Element, Structure)
>    gb_trees:is_defined(Element, Structure)
>
> None of the those present when this happened have chosen to comment on
> the 'why', here are my guesses:
>
> Why did it start? A "just do it!" spirit when Erlang was born, along with
> different people having different ideas about how to do things.
>
> Why is it still that way? Backwards compatibility has always been a
> high priority in Erlang because it's been used in real products almost
> from the start. Cleaning up has usually taken a back seat. But changes
> like adding 'from_list' to gb_trees should be uncontroversial.
>
> ---
>
> Aside: just spotted a bug in the 'sets' manpage. It says:
>
>    | This module provides exactly the same interface as the module
>    | ordsets but with a defined representation.
>
> The bit about "defined representation" looks like a cut-and-paste
> error from ordsets. (I looked at R14B0, erlang.org is down for me)
>
> Matt
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs
> http://erlang.org/mailman/listinfo/erlang-bugs