# Setting the Seed – Random Number Generation in Stata

Updated: Oct 5, 2022

Random number generation is often used in statistical sampling, simulation, and statistical analysis. The use of randomness helps to prevent bias and, in the case of statistical sampling, guards against systematic error in how observations are selected.

Stata’s default random number generator is actually a pseudorandom number generator, the 64-bit Mersenne Twister. It uses a deterministic algorithm to generate numbers that can pass for random but are not truly random. Most statistical applications of randomness use a pseudorandom number generator; some specific applications, such as cryptography, require a cryptographically secure or true random number generator instead.

Stata's pseudorandom number generator allows you to set the "seed" used to start the RNG process so that the results are perfectly reproducible. Stata also allows you to track the RNG "state", which is the generator's internal position in its sequence at a given point. Even though you set the "seed" at the beginning, the RNG "state" advances every time a command draws random numbers, which is why each subsequent set of draws differs from the last. For this reason, once you've set the "seed", you must either run all the commands in the same order or save the RNG "state" at the points you want to return to in order to maintain reproducibility.
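The relationship between seed and state can be checked directly. The following is a minimal sketch; the seed value 2022 and the variable names u1, u2 and u3 are arbitrary choices for illustration. Resetting the seed restarts the same sequence, while drawing again without a reset continues from the advanced state.

```stata
clear
set obs 3
set seed 2022
generate u1 = runiform()
* resetting the seed restarts the same sequence of draws
set seed 2022
generate u2 = runiform()
* no reset here: the state has advanced, so u3 holds new draws
generate u3 = runiform()
* u1 and u2 come from the same starting state, so they match
assert u1 == u2
list u1 u2 u3
```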

Here I will show you how to set a seed for the number generator, how to record the state of the number generator for reproducibility, and show several examples of the different types of random variables you can create.

## How to Use:

To set the seed:

`set seed number`

To save the RNG state to a macro:

`` local state_macro_name = "`c(rngstate)'" ``

To reset the current RNG state to one you saved in a macro:

`` set rngstate `state_macro_name' ``

Once you have set your seed or RNG state, you can run your RNG commands reproducibly.

## Worked Example 1:

In this example, I am going to set the seed and create two random variables with numbers between 0 and 1. I will also save the RNG state before generating the second variable, which I will use in a later example.

In the command pane:

```
clear
set obs 5
set seed 123456
generate x = runiform()
* save the RNG state as it stands just before y is generated
local mstate = "`c(rngstate)'"
generate y = runiform()
list x y
```

## Worked Example 2:

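In this example, I am going to generate two random variables containing integers between 1 and 100. I am then going to reset the RNG state using the "mstate" macro from the previous example, which will allow me to perfectly reproduce the variable "y" from that example. This relies on the local macro "mstate" still being defined, so run both examples in the same interactive session or in the same do-file; a local macro created in a do-file disappears when that do-file finishes.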

In the command pane:

```
clear
set obs 5
set seed 456789
generate a = runiformint(1,100)
generate b = runiformint(1,100)
* restore the state saved in Worked Example 1
set rngstate `mstate'
generate y = runiform()
list a b y
```

If you compare the "y" variable generated here with the one from the previous example, they are exactly the same. This is because we saved the RNG state immediately before generating "y" in the previous example, and then restored that saved state here before generating "y" again.
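The same save-and-restore check can be done in a single run, which avoids comparing two listings by eye. This is a minimal sketch; the variable name y_again is a hypothetical name introduced here for the comparison.

```stata
clear
set obs 5
set seed 123456
generate x = runiform()
* save the state just before y is drawn
local mstate = "`c(rngstate)'"
generate y = runiform()
* an unrelated draw that moves the state forward
generate a = runiformint(1,100)
* go back to the saved state and draw again
set rngstate `mstate'
generate y_again = runiform()
* the command stops with an error if the two variables differ
assert y == y_again
list y y_again
```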

## Worked Example 3:

In this example, I am going to demonstrate how to create a random date variable and a random date/time variable.

In the command pane:

```
clear
set obs 5
set seed 789456
generate date = runiformint(td(1 jan 2020),today())
format date %td
generate date2 = runiformint(td(1 jan 1900),td(1 jan 1945))
format date2 %td
* datetime values are counted in milliseconds, so store them as double
generate double datetime = runiformint(tc(1 jan 2020 00:00:00.000),now())
format datetime %tc
generate double datetime2 = runiformint(tc(1 jan 1900 00:00:00.000), tc(1 jan 1945 23:59:59.999))
format datetime2 %tc
list date*
```

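The "today()" function used in this example returns the number of days between 1 January 1960 and the day the command is run. The "now()" function returns the number of milliseconds between 1 January 1960 and the date and time the command is run. Millisecond counts are large enough that datetime variables should be created as type double (for example, with "generate double"), because Stata's default float type cannot represent them exactly.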
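One caveat for reproducibility: "today()" and "now()" return a different value each time they are run, so the upper bound passed to "runiformint()" changes and the generated dates can differ between runs even with the same seed. A minimal sketch with fixed bounds, assuming an end date of 5 October 2022 that was chosen only for illustration:

```stata
clear
set obs 5
set seed 789456
* fixed end date (an assumption for illustration) instead of today()
generate date = runiformint(td(1jan2020), td(5oct2022))
format date %td
* fixed end time instead of now(), stored as double
generate double datetime = runiformint(tc(1jan2020 00:00:00), tc(5oct2022 23:59:59))
format datetime %tc
list date datetime
```

With both bounds fixed, rerunning these commands with the same seed should give the same dates regardless of when they are run.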

Here we demonstrated the use of Stata's pseudorandom number generator to create randomly generated variables. You can also use Stata's "set seed" and "set rngstate" commands to make random sampling and random simulation (such as Monte Carlo simulations) perfectly reproducible.
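As a short illustration of reproducible random sampling, the sketch below draws a simple random sample from the auto dataset that ships with Stata. The seed value and the sample size of 10 are arbitrary choices for illustration.

```stata
* load the example dataset installed with Stata
sysuse auto, clear
set seed 20221005
* keep a simple random sample of 10 observations
sample 10, count
list make price
```

Running these four commands again from the top should return the same 10 cars, because the data start in the same order and the seed is the same.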

Stata's RNG functions are used in several chapters of the Stata Press book An Introduction to Time Series Using Stata, Revised Edition. If you are interested in time series and the use of RNG in Stata, I highly recommend this book.