Why do people say there is modulo bias when using a random number generator?

Question

Anonymous User · Answer

So, rand() is a pseudo-random number generator that returns an integer between 0 and RAND_MAX inclusive, where RAND_MAX is a constant defined in cstdlib (see this article for a general overview of rand()).

Now what happens if you want to generate a random number between, say, 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. When rand() returns 1, 4, 7, or 10, rand()%3 == 1. When rand() returns 2, 5, or 8, rand()%3 == 2. There are 11 equally likely outputs, so the probability of getting a 0 is 4/11 and a 1 is 4/11, but a 2 is only 3/11. The smaller results are favored. When the range n is small compared to RAND_MAX the bias is slight, but as n grows toward RAND_MAX it can skew the distribution noticeably.
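To double-check the counts in the toy example, here is a minimal sketch that walks every value from 0 to 10 and tallies its remainder modulo 3. The function name count_residues_mod3 and the parameter max_value (standing in for RAND_MAX) are made up for illustration.

```cpp
#include <array>

// Tally how often each remainder 0, 1, 2 appears when every value in
// [0, max_value] is reduced modulo 3. max_value plays the role of RAND_MAX
// in the toy example (hypothetical helper, not a library function).
std::array<int, 3> count_residues_mod3(int max_value) {
    std::array<int, 3> counts{};  // zero-initialized
    for (int r = 0; r <= max_value; ++r) {
        ++counts[r % 3];
    }
    return counts;
}
```

Calling count_residues_mod3(10) gives the counts 4, 4, and 3 out of 11 values, matching the probabilities 4/11, 4/11, and 3/11.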

So when does rand()%n return each number from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1, that is, when the RAND_MAX + 1 possible outputs divide evenly by n. In this case, assuming that rand() itself returns each number between 0 and RAND_MAX with equal probability, the residue classes modulo n are also equally likely.
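The condition can be written as a one-line check. This is only a sketch: modulo_is_unbiased and its parameter max_value (standing in for RAND_MAX) are hypothetical names, and it assumes max_value is not negative.

```cpp
// True when reducing a uniform value from [0, max_value] modulo n hits every
// remainder 0..n-1 equally often. Same test as (max_value + 1) % n == 0, but
// written so that max_value + 1 is never computed and cannot overflow.
// Hypothetical helper; assumes max_value >= 0.
bool modulo_is_unbiased(int max_value, int n) {
    return n > 0 && max_value % n == n - 1;
}
```

With the toy value of 10, modulo_is_unbiased(10, 3) is false, while a maximum of 11 (12 possible outputs) would make modulo_is_unbiased(11, 3) true. In real code you would pass RAND_MAX as the first argument.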

So how do we solve this problem? One way is to keep generating random numbers until you get a number in your desired range:

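int x;
// Rejection sampling: redraw until the value falls in [0, n).
// Assumes n > 0; expect about (RAND_MAX + 1) / n draws when n is small.
do {
    x = rand();
} while (x >= n);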
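The loop above can need many draws when n is much smaller than RAND_MAX. A common refinement, which is not part of the answer as written, is to reject only the few leftover values at the top of the range and then take the remainder. The sketch below assumes 0 < n <= RAND_MAX + 1; accepted_range and uniform_below are hypothetical names.

```cpp
#include <cstdlib>

// Number of leading values in [0, max_value] that split evenly into n
// residue classes. Hypothetical helper; max_value stands in for RAND_MAX.
// Assumes 0 < n <= max_value + 1.
unsigned long long accepted_range(unsigned long long max_value, unsigned long long n) {
    const unsigned long long total = max_value + 1;  // count of possible outputs
    return total - total % n;
}

// Draw from [0, n) by discarding the leftover values at the top of rand()'s
// range, so every remainder is backed by the same number of raw values.
int uniform_below(int n) {
    const unsigned long long limit =
        accepted_range(RAND_MAX, static_cast<unsigned long long>(n));
    unsigned long long x;
    do {
        x = static_cast<unsigned long long>(std::rand());
    } while (x >= limit);
    return static_cast<int>(x % n);
}
```

In the toy case with a maximum of 10 and n = 3, accepted_range(10, 3) is 9, so only the values 9 and 10 are redrawn and each remainder is backed by exactly three values. Under the stated assumption, each draw is accepted with probability above one half, so the loop needs fewer than two draws on average.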

Hope that helps everyone!
