

I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:
• Xoroshiro116+. According to the linked page at http://xoshiro.di.unimi.it/ , it is suitable only for floating point numbers, so it doesn’t fit. Moreover, the page actually describes an algorithm with 64bit precision, while rand implements a variant with 56bit precision. It looks like the latter is implemented only in Erlang, so there is little practical interoperability between languages.
• Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/thexorshift1024randomnumbergeneratorfailsbigcrush/ .
• Xorshift116+. The comment about 58bit precision being rare applies (according to https://github.com/jj1bdx/emprng/ , “the original exsplus was Xorshift128+”). Xorshift128+ fails the BigCrush test, so I presume that xorshift116+ fails it too.
So… is there any particular reason for the rand module to not implement xoshiro256**, which is more widespread than 58bit precision algorithms and, according to its creators, “has excellent (subns) speed, a state space (256 bits) that is large enough for any parallel application, and it passes all tests we are aware of”?
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Thu, Aug 02, 2018 at 02:19:17PM +0200, Krzysztof Jurewicz wrote:
> I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:
>
> • Xoroshiro116+. According to the linked page at http://xoshiro.di.unimi.it/ , it is suitable only for floating point numbers, so it doesn’t fit. Moreover, the page actually describes an algorithm with 64bit precision, while rand implements a variant with 56bit precision. It looks like the latter is implemented only in Erlang, so there is little practical interoperability between languages.
> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/thexorshift1024randomnumbergeneratorfailsbigcrush/ .
It is a known fact that the lowest bits of Xors... generators with * and +
scramblers are of lower quality than the rest. If he had used the high 32
bits I predict that it would have passed.
The author of Xoroshiro, etc, thinks that this is a small problem.
> • Xorshift116+. The comment about 58bit precision being rare applies (according to https://github.com/jj1bdx/emprng/ , “the original exsplus was Xorshift128+”). Xorshift128+ fails the BigCrush test, so I presume that xorshift116+ fails it too.
I would also guess so.
>
> So… is there any particular reason for the rand module to not implement xoshiro256**, which is more widespread than 58bit precision algorithms and, according to its creators, “has excellent (subns) speed, a state space (256 bits) that is large enough for any parallel application, and it passes all tests we are aware of”?
Any 64bit word algorithm runs about 2..3 (or more) times slower
than a 58bit output word one, on 64bit Erlang. That's why.
Implementation of a new algorithm should be fairly easy (mostly
cutandpaste). Unfortunately the bignum operations that 64bit
calculations provoke hits performance a lot.
We do not have any generator that uses the new ** scrambler that makes
all tests pass BigCrush (with the lowest bits), yet, but...
I have a pull request for Xoroshiro928** that is a reworked Xoroshiro1024**
with 58bit word size, exactly because of speed reasons. For the next
step I was thinking about rewriting Xoroshiro256** to 58bit, but the
resulting algorighm would not be much faster than Xoroshiro928**,
it would only have smaller state space, hopefully. So I am not certain
that it would be worth implementing.
https://github.com/erlang/otp/pull/1857I have never thought about implementing the new 64bit word size generators
since nobody has asked for interplatform compatibility before, and you still
have to be careful with bit compatibility. It is only when you request
the generator word size range that you get the raw numbers. And for
floating point if you are lucky that both platforms use the same bits
(low/high/middle).

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Thu, Aug 02, 2018 at 06:10:24PM +0200, Raimo Niskanen wrote:
> On Thu, Aug 02, 2018 at 02:19:17PM +0200, Krzysztof Jurewicz wrote:
> > I’m looking for an algorithm to perform a transparent, reproducible generation of an uniform sequence of natural numbers. There are three RNG algorithms documented in the rand module:
> >
: :
>
> I have a pull request for Xoroshiro928** that is a reworked Xoroshiro1024**
> with 58bit word size, exactly because of speed reasons. For the next
> step I was thinking about rewriting Xoroshiro256** to 58bit, but the
> resulting algorighm would not be much faster than Xoroshiro928**,
> it would only have smaller state space, hopefully. So I am not certain
> that it would be worth implementing.
>
> https://github.com/erlang/otp/pull/1857To get you platform interoperability I could publish the C code I used as
reference for writing Xoroshiro928** in Erlang.
Yes there is a C implementation of it!
>
> I have never thought about implementing the new 64bit word size generators
> since nobody has asked for interplatform compatibility before, and you still
> have to be careful with bit compatibility. It is only when you request
> the generator word size range that you get the raw numbers. And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).
>
> 
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


a "uniform sequence of natural numbers". There are infinitely many natural numbers. The generators you mention produce integers of fixed size. What exactly is it you want? Do you want something like
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


Raimo Niskanen writes:
>> • Xorshift1024*. It fails the BigCrush test according to https://lemire.me/blog/2017/09/15/thexorshift1024randomnumbergeneratorfailsbigcrush/ .
>
> It is a known fact that the lowest bits of Xors... generators with * and +
> scramblers are of lower quality than the rest. If he had used the high 32
> bits I predict that it would have passed.
>
> The author of Xoroshiro, etc, thinks that this is a small problem.
What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested). Therefore if the test failure won’t cause a noticeable bias, then it indeed shouldn’t be a problem.
type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
spec apportion(list(score()), pos_integer(), pos_integer()) > list({Id :: any(), Seats :: non_neg_integer()}).
apportion(Scores, TotalSeats, VotesSum) >
apportion(Scores, TotalSeats, VotesSum, []).
apportion([], 0, 0, Acc) >
Acc;
apportion([{Votes, Id}ScoresTail], TotalSeats, VotesSum, Acc) >
WholeSeats = Votes * TotalSeats div VotesSum,
DrawnSeats =
case rand:uniform(VotesSum) =< Votes * TotalSeats rem VotesSum of
true >
1;
false >
0
end,
Seats = WholeSeats + DrawnSeats,
apportion(ScoresTail, TotalSeats  Seats, VotesSum  Votes, [{Id, Seats}Acc]).
> Any 64bit word algorithm runs about 2..3 (or more) times slower
> than a 58bit output word one, on 64bit Erlang. That's why.
Sounds plausible. Though such differences likely won’t matter in my case, it doesn’t rule out choosing a 58bit algorithm. The interplatform compatibility issue is hypothetical for now.
> I have never thought about implementing the new 64bit word size generators
> since nobody has asked for interplatform compatibility before, and you still
> have to be careful with bit compatibility. It is only when you request
> the generator word size range that you get the raw numbers. And for
> floating point if you are lucky that both platforms use the same bits
> (low/high/middle).
If I understand correctly, you basically mean that different RNG libraries can convert the same entropy bits to numbers differently.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


> What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).
Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).
type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
type result() :: {Id :: any(), Seats :: non_neg_integer()}.
spec apportion(list(score()), pos_integer(), pos_integer(), binary()) > list(result()).
apportion(Scores, TotalSeats, VotesSum, State) >
{WholeSeatsMap, RemainderScores, RemainingSeats, _} =
lists:foldl(
fun (
{Votes, Id},
{WholeSeatsAcc,
RemainderScoresAcc,
RemSeatsAcc,
StateAcc}) >
Seats = Votes * TotalSeats div VotesSum,
<<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),
RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
{maps:put(Id, Seats, WholeSeatsAcc),
[{RemainderScore, Id}RemainderScoresAcc],
RemSeatsAcc  Seats,
NewState}
end,
{#{},
[],
TotalSeats,
State},
Scores),
DrawnIds =
%% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
[Id  {_, Id} < lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
ResultMap =
lists:foldl(
fun (Id, ResultMapAcc) >
maps:update_with(
Id,
fun (OldSeats) > OldSeats + 1 end,
1,
ResultMapAcc)
end,
WholeSeatsMap,
DrawnIds),
lists:sort([{Id, Seats}  {Id, Seats} < maps:to_list(ResultMap), Seats > 0]).
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Thu, Aug 09, 2018 at 03:27:51PM +0200, Krzysztof Jurewicz wrote:
> > What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).
>
> Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).
>
> type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
> type result() :: {Id :: any(), Seats :: non_neg_integer()}.
>
> spec apportion(list(score()), pos_integer(), pos_integer(), binary()) > list(result()).
> apportion(Scores, TotalSeats, VotesSum, State) >
> {WholeSeatsMap, RemainderScores, RemainingSeats, _} =
> lists:foldl(
> fun (
> {Votes, Id},
> {WholeSeatsAcc,
> RemainderScoresAcc,
> RemSeatsAcc,
> StateAcc}) >
> Seats = Votes * TotalSeats div VotesSum,
> <<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),
Have you tried to change line above to
RandInt = rand:uniform(1 bsl 32),
and not add 1 to RandInt below?
To see if the problem you had was about the random number generator or in
the surrounding code...
> RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
> {maps:put(Id, Seats, WholeSeatsAcc),
> [{RemainderScore, Id}RemainderScoresAcc],
> RemSeatsAcc  Seats,
> NewState}
> end,
> {#{},
> [],
> TotalSeats,
> State},
> Scores),
> DrawnIds =
> %% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
> [Id  {_, Id} < lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
> ResultMap =
> lists:foldl(
> fun (Id, ResultMapAcc) >
> maps:update_with(
> Id,
> fun (OldSeats) > OldSeats + 1 end,
> 1,
> ResultMapAcc)
> end,
> WholeSeatsMap,
> DrawnIds),
> lists:sort([{Id, Seats}  {Id, Seats} < maps:to_list(ResultMap), Seats > 0]).

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


On Thu, Aug 09, 2018 at 03:27:51PM +0200, Krzysztof Jurewicz wrote:
> > What I actually want to implement is a randomized apportionment scheme, as described on https://rangevoting.org/Apportion.html (“Truly unbiased methods”), but more efficient, like in the code below (not tested).
>
> Tests revealed that the simple algorithm does not satisfy the quota property. Therefore I’ve used a more sophisticated one (code below), which however doesn’t require using a specific interval for drawing integers. Instead of the rand module, I use SHA256 to generate random integers (taking first n bits). This is very interoperable between languages, though I don’t know whether SHA256 satisfies all RNG properties desirable in that context. (Also I don’t have a proof that the distribution of apportioned seats is unbiased).
>
> type score() :: ({Votes :: non_neg_integer(), Id :: any()}).
> type result() :: {Id :: any(), Seats :: non_neg_integer()}.
>
> spec apportion(list(score()), pos_integer(), pos_integer(), binary()) > list(result()).
> apportion(Scores, TotalSeats, VotesSum, State) >
> {WholeSeatsMap, RemainderScores, RemainingSeats, _} =
> lists:foldl(
> fun (
> {Votes, Id},
> {WholeSeatsAcc,
> RemainderScoresAcc,
> RemSeatsAcc,
> StateAcc}) >
> Seats = Votes * TotalSeats div VotesSum,
> <<RandInt:32, _/binary>> = NewState = crypto:hash(sha256, StateAcc),
Have you tried to change line above to
RandInt = rand:uniform(1 bsl 32),
and not add 1 to RandInt below?
To see if the problem you had was about the random number generator or in
the surrounding code...
> RemainderScore = (Votes * TotalSeats rem VotesSum) * (RandInt + 1),
> {maps:put(Id, Seats, WholeSeatsAcc),
> [{RemainderScore, Id}RemainderScoresAcc],
> RemSeatsAcc  Seats,
> NewState}
> end,
> {#{},
> [],
> TotalSeats,
> State},
> Scores),
> DrawnIds =
> %% We actually need to sort the list only at first RemainderSeats places, so here is a room for optimization.
> [Id  {_, Id} < lists:sublist(lists:reverse(lists:sort(RemainderScores)), RemainingSeats)],
> ResultMap =
> lists:foldl(
> fun (Id, ResultMapAcc) >
> maps:update_with(
> Id,
> fun (OldSeats) > OldSeats + 1 end,
> 1,
> ResultMapAcc)
> end,
> WholeSeatsMap,
> DrawnIds),
> lists:sort([{Id, Seats}  {Id, Seats} < maps:to_list(ResultMap), Seats > 0]).

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions


> Have you tried to change line above to
> RandInt = rand:uniform(1 bsl 32),
> and not add 1 to RandInt below?
>
> To see if the problem you had was about the random number generator or in
> the surrounding code...
Let’s assume that we have 3 seats and the following voting results (in order):
A — 1 vote;
B — 1 vote;
C — 100 votes.
Following the simple algorithm, A has 1⁄102 chance of getting a seat. Let’s assume that A was lucky. Then we have 2 remaining seats and the following results remaining:
B — 1 vote;
C — 100 votes.
B has then 1⁄101 chance of getting a seat. Let’s assume that it was lucky. Then we have one remaining seat which goes to C. Final results:
A — 1 seat;
B — 1 seat;
C — 1 seat.
But according to quota property, C should get either 2 or 3 seats. This flaw is inherent to the simple apportionment algorithm and independent of the random number generator.
_______________________________________________
erlangquestions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlangquestions

