RNG algorithm choice for uniform natural numbers

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

RNG algorithm choice for uniform natural numbers

Krzysztof Jurewicz
I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:

• Xoroshiro116+. According to the linked page at http://xoshiro.di.unimi.it/ , it is suitable only for floating point numbers, so it doesn’t fit. Moreover, the page actually describes an algorithm with 64-bit precision, while rand implements a variant with 56-bit precision. It looks like the latter is implemented only in Erlang, so there is little practical interoperability between languages.
• Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .
• Xorshift116+. The comment about 58-bit precision being rare applies (according to https://github.com/jj1bdx/emprng/ , “the original exsplus was Xorshift128+”). Xorshift128+ fails the BigCrush test, so I presume that xorshift116+ fails it too.

So… is there any particular reason for the rand module to not implement xoshiro256**, which is more widespread than 58-bit precision algorithms and, according to its creators, “has excellent (sub-ns) speed, a state space (256 bits) that is large enough for any parallel application, and it passes all tests we are aware of”?
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Raimo Niskanen-2
On Thu, Aug 02, 2018 at 02:19:17PM +0200, Krzysztof Jurewicz wrote:
> I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:
>
> • Xoroshiro116+. According to the linked page at http://xoshiro.di.unimi.it/ , it is suitable only for floating point numbers, so it doesn’t fit. Moreover, the page actually describes an algorithm with 64-bit precision, while rand implements a variant with 56-bit precision. It looks like the latter is implemented only in Erlang, so there is little practical interoperability between languages.
> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .

It is a known fact that the lowest bits of Xors... generators with * and +
scramblers are of lower quality than the rest.  If he had used the high 32
bits I predict that it would have passed.

The author of Xoroshiro, etc, thinks that this is a small problem.

> • Xorshift116+. The comment about 58-bit precision being rare applies (according to https://github.com/jj1bdx/emprng/ , “the original exsplus was Xorshift128+”). Xorshift128+ fails the BigCrush test, so I presume that xorshift116+ fails it too.

I would also guess so.

>
> So… is there any particular reason for the rand module to not implement xoshiro256**, which is more widespread than 58-bit precision algorithms and, according to its creators, “has excellent (sub-ns) speed, a state space (256 bits) that is large enough for any parallel application, and it passes all tests we are aware of”?

Any 64-bit word algorithm runs about 2..3 (or more) times slower
than a 58-bit output word one, on 64-bit Erlang.  That's why.

Implementation of a new algorithm should be fairly easy (mostly
cut-and-paste).  Unfortunately the bignum operations that 64-bit
calculations provoke hits performance a lot.

We do not have any generator that uses the new ** scrambler that makes
all tests pass BigCrush (with the lowest bits), yet, but...

I have a pull request for Xoroshiro928** that is a reworked Xoroshiro1024**
with 58-bit word size, exactly because of speed reasons.  For the next
step I was thinking about rewriting Xoroshiro256** to 58-bit, but the
resulting algorighm would not be much faster than Xoroshiro928**,
it would only have smaller state space, hopefully.  So I am not certain
that it would be worth implementing.

    https://github.com/erlang/otp/pull/1857

I have never thought about implementing the new 64-bit word size generators
since nobody has asked for inter-platform compatibility before, and you still
have to be careful with bit compatibility.  It is only when you request
the generator word size range that you get the raw numbers.  And for
floating point if you are lucky that both platforms use the same bits
(low/high/middle).

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Raimo Niskanen-2
On Thu, Aug 02, 2018 at 06:10:24PM +0200, Raimo Niskanen wrote:
> On Thu, Aug 02, 2018 at 02:19:17PM +0200, Krzysztof Jurewicz wrote:
> > I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:
> >
: :
>
> I have a pull request for Xoroshiro928** that is a reworked Xoroshiro1024**
> with 58-bit word size, exactly because of speed reasons.  For the next
> step I was thinking about rewriting Xoroshiro256** to 58-bit, but the
> resulting algorighm would not be much faster than Xoroshiro928**,
> it would only have smaller state space, hopefully.  So I am not certain
> that it would be worth implementing.
>
>     https://github.com/erlang/otp/pull/1857

To get you platform interoperability I could publish the C code I used as
reference for writing Xoroshiro928** in Erlang.
Yes there is a C implementation of it!


>
> I have never thought about implementing the new 64-bit word size generators
> since nobody has asked for inter-platform compatibility before, and you still
> have to be careful with bit compatibility.  It is only when you request
> the generator word size range that you get the raw numbers.  And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Richard O'Keefe
In reply to this post by Krzysztof Jurewicz
a "uniform sequence of natural numbers".  There are infinitely many natural numbers.  The generators you mention
produce integers of fixed size.  What exactly is it you want?   Do you want something like

On 3 August 2018 at 00:19, Krzysztof Jurewicz <[hidden email]> wrote:
I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:

• Xoroshiro116+. According to the linked page at http://xoshiro.di.unimi.it/ , it is suitable only for floating point numbers, so it doesn’t fit. Moreover, the page actually describes an algorithm with 64-bit precision, while rand implements a variant with 56-bit precision. It looks like the latter is implemented only in Erlang, so there is little practical interoperability between languages.
• Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .
• Xorshift116+. The comment about 58-bit precision being rare applies (according to https://github.com/jj1bdx/emprng/ , “the original exsplus was Xorshift128+”). Xorshift128+ fails the BigCrush test, so I presume that xorshift116+ fails it too.

So… is there any particular reason for the rand module to not implement xoshiro256**, which is more widespread than 58-bit precision algorithms and, according to its creators, “has excellent (sub-ns) speed, a state space (256 bits) that is large enough for any parallel application, and it passes all tests we are aware of”?
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Krzysztof Jurewicz
In reply to this post by Raimo Niskanen-2
Raimo Niskanen writes:

>> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/the-xorshift1024-random-number-generator-fails-bigcrush/ .
>
> It is a known fact that the lowest bits of Xors... generators with * and +
> scramblers are of lower quality than the rest.  If he had used the high 32
> bits I predict that it would have passed.
>
> The author of Xoroshiro, etc, thinks that this is a small problem.

What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested). Therefore if the test failure won’t cause a noticeable bias, then it indeed shouldn’t be a problem.

-type score() :: ({Votes :: non_neg_integer(), Id :: any()}).

-spec apportion(list(score()), pos_integer(), pos_integer()) -> list({Id :: any(), Seats :: non_neg_integer()}).
apportion(Scores, TotalSeats, VotesSum) ->
    apportion(Scores, TotalSeats, VotesSum, []).

apportion([], 0, 0, Acc) ->
    Acc;
apportion([{Votes, Id}|ScoresTail], TotalSeats, VotesSum, Acc) ->
    WholeSeats = Votes * TotalSeats div VotesSum,
    DrawnSeats =
        case rand:uniform(VotesSum) =< Votes * TotalSeats rem VotesSum of
            true ->
                1;
            false ->
                0
        end,
    Seats = WholeSeats + DrawnSeats,
    apportion(ScoresTail, TotalSeats - Seats, VotesSum - Votes, [{Id, Seats}|Acc]).

> Any 64-bit word algorithm runs about 2..3 (or more) times slower
> than a 58-bit output word one, on 64-bit Erlang.  That's why.

Sounds plausible. Though such differences likely won’t matter in my case, it doesn’t rule out choosing a 58-bit algorithm. The inter-platform compatibility issue is hypothetical for now.

> I have never thought about implementing the new 64-bit word size generators
> since nobody has asked for inter-platform compatibility before, and you still
> have to be careful with bit compatibility.  It is only when you request
> the generator word size range that you get the raw numbers.  And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).

If I understand correctly, you basically mean that different RNG libraries can convert the same entropy bits to numbers differently.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Krzysztof Jurewicz
In reply to this post by Richard O'Keefe
Richard O'Keefe writes:

> a "uniform sequence of natural numbers".  There are infinitely many natural
> numbers.  The generators you mention
> produce integers of fixed size.  What exactly is it you want?   Do you want
> something like
> https://gmplib.org/manual/Integer-Random-Numbers.html ?

Yes, bad wording on my side. It will be a sequence of invocations of rand:uniform_s/2 (on potentially differing integers).
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Krzysztof Jurewicz
In reply to this post by Krzysztof Jurewicz
> What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).

Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA-256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA-256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).

-type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
-type result() :: {Id :: any(), Seats :: non_neg_integer()}.

-spec apportion(list(score()), pos_integer(), pos_integer(), binary()) -> list(result()).
apportion(Scores, TotalSeats, VotesSum, State) ->
    {WholeSeatsMap, RemainderScores, RemainingSeats, _} =
        lists:foldl(
          fun (
            {Votes, Id},
            {WholeSeatsAcc,
             RemainderScoresAcc,
             RemSeatsAcc,
             StateAcc}) ->
                  Seats = Votes * TotalSeats div VotesSum,
                  <<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),
                  RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
                  {maps:put(Id, Seats, WholeSeatsAcc),
                   [{RemainderScore, Id}|RemainderScoresAcc],
                   RemSeatsAcc - Seats,
                   NewState}
          end,
          {#{},
           [],
           TotalSeats,
           State},
          Scores),
    DrawnIds =
        %% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
        [Id || {_, Id} <- lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
    ResultMap =
        lists:foldl(
          fun (Id, ResultMapAcc) ->
                  maps:update_with(
                    Id,
                    fun (OldSeats) -> OldSeats + 1 end,
                    1,
                    ResultMapAcc)
          end,
          WholeSeatsMap,
          DrawnIds),
    lists:sort([{Id, Seats} || {Id, Seats} <- maps:to_list(ResultMap), Seats > 0]).
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Raimo Niskanen-2
On Thu, Aug 09, 2018 at 03:27:51PM +0200, Krzysztof Jurewicz wrote:

> > What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).
>
> Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA-256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA-256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).
>
> -type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
> -type result() :: {Id :: any(), Seats :: non_neg_integer()}.
>
> -spec apportion(list(score()), pos_integer(), pos_integer(), binary()) -> list(result()).
> apportion(Scores, TotalSeats, VotesSum, State) ->
>     {WholeSeatsMap, RemainderScores, RemainingSeats, _} =
>         lists:foldl(
>           fun (
>             {Votes, Id},
>             {WholeSeatsAcc,
>              RemainderScoresAcc,
>              RemSeatsAcc,
>              StateAcc}) ->
>                   Seats = Votes * TotalSeats div VotesSum,
>                   <<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),

Have you tried to change line above to
                    RandInt = rand:uniform(1 bsl 32),
and not add 1 to RandInt below?

To see if the problem you had was about the random number generator or in
the surrounding code...

>                   RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
>                   {maps:put(Id, Seats, WholeSeatsAcc),
>                    [{RemainderScore, Id}|RemainderScoresAcc],
>                    RemSeatsAcc - Seats,
>                    NewState}
>           end,
>           {#{},
>            [],
>            TotalSeats,
>            State},
>           Scores),
>     DrawnIds =
>         %% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
>         [Id || {_, Id} <- lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
>     ResultMap =
>         lists:foldl(
>           fun (Id, ResultMapAcc) ->
>                   maps:update_with(
>                     Id,
>                     fun (OldSeats) -> OldSeats + 1 end,
>                     1,
>                     ResultMapAcc)
>           end,
>           WholeSeatsMap,
>           DrawnIds),
>     lists:sort([{Id, Seats} || {Id, Seats} <- maps:to_list(ResultMap), Seats > 0]).

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Raimo Niskanen-11
In reply to this post by Krzysztof Jurewicz
On Thu, Aug 09, 2018 at 03:27:51PM +0200, Krzysztof Jurewicz wrote:

> > What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).
>
> Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA-256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA-256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).
>
> -type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
> -type result() :: {Id :: any(), Seats :: non_neg_integer()}.
>
> -spec apportion(list(score()), pos_integer(), pos_integer(), binary()) -> list(result()).
> apportion(Scores, TotalSeats, VotesSum, State) ->
>     {WholeSeatsMap, RemainderScores, RemainingSeats, _} =
>         lists:foldl(
>           fun (
>             {Votes, Id},
>             {WholeSeatsAcc,
>              RemainderScoresAcc,
>              RemSeatsAcc,
>              StateAcc}) ->
>                   Seats = Votes * TotalSeats div VotesSum,
>                   <<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),

Have you tried to change line above to
                    RandInt = rand:uniform(1 bsl 32),
and not add 1 to RandInt below?
To see if the problem you had was about the random number generator or in
the surrounding code...

     

>                   RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
>                   {maps:put(Id, Seats, WholeSeatsAcc),
>                    [{RemainderScore, Id}|RemainderScoresAcc],
>                    RemSeatsAcc - Seats,
>                    NewState}
>           end,
>           {#{},
>            [],
>            TotalSeats,
>            State},
>           Scores),
>     DrawnIds =
>         %% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
>         [Id || {_, Id} <- lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
>     ResultMap =
>         lists:foldl(
>           fun (Id, ResultMapAcc) ->
>                   maps:update_with(
>                     Id,
>                     fun (OldSeats) -> OldSeats + 1 end,
>                     1,
>                     ResultMapAcc)
>           end,
>           WholeSeatsMap,
>           DrawnIds),
>     lists:sort([{Id, Seats} || {Id, Seats} <- maps:to_list(ResultMap), Seats > 0]).

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Reply | Threaded
Open this post in threaded view
|

Re: RNG algorithm choice for uniform natural numbers

Krzysztof Jurewicz
In reply to this post by Raimo Niskanen-2
> Have you tried to change line above to
>                     RandInt = rand:uniform(1 bsl 32),
> and not add 1 to RandInt below?
>
> To see if the problem you had was about the random number generator or in
> the surrounding code...

Let’s assume that we have 3 seats and the following voting results (in order):

A — 1 vote;
B — 1 vote;
C — 100 votes.

Following the simple algorithm, A has 1⁄102 chance of getting a seat. Let’s assume that A was lucky. Then we have 2 remaining seats and the following results remaining:

B — 1 vote;
C — 100 votes.

B has then 1⁄101 chance of getting a seat. Let’s assume that it was lucky. Then we have one remaining seat which goes to C. Final results:

A — 1 seat;
B — 1 seat;
C — 1 seat.

But according to quota property, C should get either 2 or 3 seats. This flaw is inherent to the simple apportionment algorithm and independent of the random number generator.
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions