The Recode Command - Data Management in Stata

The recode command in Stata can be used to convert a continuous variable into a categorical variable, and can also be used to condense a categorical variable into fewer categories. By default this command overwrites the original variable, so I recommend always using the generate() option with recode so that your original variable remains intact. Generating new variables rather than overwriting existing ones lets you create several differently recoded versions to compare, and lets you recover easily from mistakes. Here we will look at a few different scenarios where you might use recode to create a new categorical variable.

How to Use:
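As a sketch of the general form, based on Stata's documented recode syntax: varname, newvar, rule1 and rule2 are placeholders, and the concrete example below (recoding foreign into a variable I have called origin_code) is purely illustrative.

```stata
* General form (varname, newvar, rule1, rule2 are placeholders):
*   recode varname (rule1) (rule2) ..., generate(newvar)

* A concrete, runnable instance using the auto dataset.
* origin_code is a hypothetical name chosen for this illustration.
sysuse auto, clear
recode foreign (0 = 1) (1 = 2), generate(origin_code)

* Cross-tabulate old against new to check the recode did what was intended
tabulate foreign origin_code
```

Because generate() is supplied, foreign itself is left untouched and the recoded values go into the new variable.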

In the code above, replace each of "rule1", "rule2", etc. with one of the following rules, where the letters stand for numbers:
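The rule forms summarised in the comments below follow the recode help file, with a, b and c standing for numbers. The rep78 example and the name rep_group are my own illustration and may not match the rules originally listed here.

```stata
* Rule forms accepted by recode (a, b, c stand for numbers):
*   a = b           recode the single value a to b
*   a b = c         recode the values a and b to c
*   a/b = c         recode every value from a through b (inclusive) to c
*   nonmissing = a  recode all other nonmissing values to a
*   missing = a     recode all other missing values to a
*   else = a        recode all other values to a (can also be written * = a)
* The keyword rules (nonmissing, missing, else) must come last.

sysuse auto, clear

* Condense the 5-point rep78 scale into three groups,
* and send missing values to 9 (an arbitrary code for this sketch)
recode rep78 (1 2 = 1) (3 = 2) (4/5 = 3) (missing = 9), generate(rep_group)

tabulate rep78 rep_group, missing
```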

Worked Example 1:

In all of the examples I am going to use the Stata auto example dataset. For this first example I am going to recode the headroom variable. To start with, I am going to have a look at how this variable is distributed to help decide how I am going to recode it. In the command pane:
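A minimal sketch of this first step, assuming the auto dataset is loaded with sysuse and the distribution is inspected with tabulate:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* One-way frequency table of headroom
tabulate headroom
```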

This produces the following output:

This variable appears to increase in increments of 0.5 from 1.5 to 5. I would like to convert this variable so that it increases in increments of 1 instead of 0.5. To do this, I am going to recode the half values down. In the command pane:
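One way to write this recode is sketched below. The new variable name new_headroom is taken from the text; the exact rules are my reconstruction of "recode the half values down".

```stata
sysuse auto, clear

* Map each half value down to the whole number below it;
* whole values (2, 3, 4, 5) match no rule and are copied unchanged
recode headroom (1.5 = 1) (2.5 = 2) (3.5 = 3) (4.5 = 4), generate(new_headroom)

* Check the result
tabulate new_headroom
```

Values not covered by any rule are left as they are, which is why only the half values need to be listed.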

This produces the following output:

As you can see from the tabulate command, I have successfully created the variable new_headroom with my recoded values of headroom.

Worked Example 2:

For this example I am going to recode the variable mpg. My new variable will categorise mileage on a scale from 1 (poor/inefficient) to 5 (excellent/very efficient), in the same vein as the rep78 variable. In the command pane:
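A sketch of what such a recode could look like. The cut-off points and the name mpg_rating are assumptions made for illustration, not necessarily the ones originally used here.

```stata
sysuse auto, clear

* mpg is stored as whole numbers, so these ranges leave no gaps.
* min and max are recode keywords for the smallest and largest observed value.
* Cut-offs below are hypothetical.
recode mpg (min/15 = 1) (16/20 = 2) (21/25 = 3) (26/30 = 4) (31/max = 5), generate(mpg_rating)

* Check the new 1-5 scale
tabulate mpg_rating
```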

This produces the following output:

Our variable has successfully been recoded.

Worked Example 3:

In this example, I am going to create a new categorical variable by recoding price. My new variable will contain three categories, 1 (cheap), 2 (average), and 3 (expensive). I am then going to label these categories. In the command pane:
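A sketch of this approach, assuming the 25th and 75th percentiles from summarize are used as the cut-offs. That choice, and the names price_cat and price_lbl, are my assumptions for illustration.

```stata
sysuse auto, clear

* summarize with the detail option stores percentiles in r()
summarize price, detail

* Copy the stored results into locals before another command overwrites r()
local low  = r(p25)
local high = r(p75)

* Ranges share their end points; recode applies the first rule that matches,
* so a price exactly equal to a cut-off falls into the lower category
recode price (min/`low' = 1) (`low'/`high' = 2) (`high'/max = 3), generate(price_cat)

* Attach value labels to the three categories
label define price_lbl 1 "cheap" 2 "average" 3 "expensive"
label values price_cat price_lbl

tabulate price_cat
```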

This produces the following output:

Our variable has been successfully recoded. I used the results stored by the summarize command to separate the price of cars into three distinct categories.

For a continuous variable that you want to split into a set number of groups, I suggest using the egen function cut() with the group() option in place of recode. The group() option divides the variable into groups containing roughly equal numbers of observations. For example, if I wanted to instead separate the price variable into ten "groups" or "bins", I would use egen with cut() and supply the option group(10).
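A minimal sketch of the egen approach, with price_bin as a hypothetical name for the new variable:

```stata
sysuse auto, clear

* Split price into 10 groups of roughly equal frequency;
* with group(), the new variable is coded 0, 1, ..., 9
egen price_bin = cut(price), group(10)

tabulate price_bin
```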

For more information on recode and other Stata data management commands, I recommend the book Data Management Using Stata: A Practical Handbook (2nd Ed.). It is a comprehensive guide on how to get your raw data ready for analysis.
