The Separate Command - Sorting Variables for Analysis

The separate command splits a variable into several new variables according to the values of another variable or an expression. It does not reorder the observations. Splitting by another variable categorises the data into one group per value of that variable, while splitting by an expression produces two groups: observations where the expression is true and observations where it is false. Each new variable contains the original values for its own group and is missing for all other observations. The name of the original variable is used as the stub for the new variable names.

How to Use:

separate variable, by(x)

Where x is either the name of another variable, or an expression which will be evaluated to true or false.
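As a minimal sketch of the expression form, the commands below split mpg by whether a car costs more than 6000. The threshold of 6000 is a hypothetical value chosen only for illustration, and the example assumes the auto dataset that ships with Stata.

```stata
sysuse auto, clear
* Hypothetical cut-off: 6000 is chosen only for illustration
separate mpg, by(price > 6000)
* mpg0 holds mpg where the expression is false, mpg1 where it is true
describe mpg0 mpg1
```

An expression always yields two new variables, so this form suits a simple either/or comparison.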

Worked Example:

For this example I use the auto dataset, which I load using the command sysuse auto, clear. I want to compare the mileage of the cars in my dataset with their price, to determine whether more expensive cars are more efficient. However, I also want to be able to distinguish between foreign and domestic cars for this analysis, to see how the two categories compare.

In the command pane I type the following:

separate mpg, by(foreign)
scatter mpg0-mpg1 price, ytitle("Miles per Gallon")

This gives me the following output:

Note: The “mpg0-mpg1” part of the scatter command is a variable range covering the new variables that separate generated. The suffixes come from the values of the by variable: “foreign” is coded 0 for domestic cars and 1 for foreign cars, which gives mpg0 and mpg1. If the by variable instead recorded 10 countries coded 1 to 10, the range would be mpg1-mpg10.
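To check where the suffixes come from, the following sketch lists the numeric codes behind foreign and then uses the sequential option of separate, which numbers the new variables from 1 regardless of the underlying codes. It reloads the auto dataset so that it runs on its own.

```stata
sysuse auto, clear
* Show the numeric codes of foreign (0 and 1) without their value labels
tabulate foreign, nolabel
* sequential numbers the new variables 1, 2, ... instead of using the codes
separate mpg, by(foreign) sequential
describe mpg1 mpg2
```

The sequential option can be handy when the by variable has irregular or non-integer codes that would make awkward variable names.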

The scatter plot shows mileage (mpg) against price for all the cars in the dataset. Because the separate command has split mpg by whether a car is foreign or domestic, the two types are plotted as separate series and their spreads can be compared. From this plot you might infer that more expensive cars are actually less fuel efficient than cheaper cars. The spread of points is fairly similar for domestic and foreign cars, taking into account that there are more domestic than foreign cars in the dataset. Foreign cars could also be seen as more fuel efficient than domestic cars, as their points sit a bit higher on the graph.
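The visual impressions from the plot can be checked against some summary figures. This is only a sketch of one possible check, using the standard summarize and correlate commands on the auto dataset, and not a formal test.

```stata
sysuse auto, clear
separate mpg, by(foreign)
* Counts, means and spread of mpg for domestic (mpg0) and foreign (mpg1) cars
summarize mpg0 mpg1
* Overall linear association between mileage and price
correlate mpg price
```

The observation counts reported by summarize show how many cars fall into each group, which is useful context when comparing the two spreads.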

This command can be a useful way of splitting a variable into categories, enabling more meaningful comparisons.