Manage Your Memory in Stata

Updated: Apr 21, 2020

When you load a dataset into Stata, it keeps the data in memory. This makes Stata fast and safe for interactive use. However, as you add variables derived from other variables, the dataset grows, and it still has to fit into the available memory. When working with a growing dataset, you may want to shrink it to reduce the amount of memory it uses. Today I would like to introduce three methods for reducing the size of your datasets in Stata. The dataset I use is the automobile dataset.

NOTE: Please make sure that the dataset file is in the present working directory; otherwise, you will need to specify the full path when using the dataset. To demonstrate, I will specify the full path in every example.

Method 1: The compress command

The compress command reduces the memory your data uses by storing each variable in the smallest storage type that can hold its values. You can give compress a varlist to act on specific variables; without a varlist, it acts on all variables. It works on both string and numeric variables.

NOTE: Stata will not reduce the precision of your data by compressing the data.

Let's pretend that the automobile dataset is extremely large with several thousand variables:
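A minimal sketch of this step, assuming the automobile dataset is the auto.dta file that ships with Stata and that a copy is saved in the hypothetical Windows folder shown (replace it with your own path):

```stata
* Hypothetical path: point this at your own copy of auto.dta
use "C:/Users/yourname/Documents/auto.dta", clear

* Show each variable's storage type before compressing
describe

* Shrink every variable to the smallest storage type that still
* holds its values exactly
compress

* Or compress only selected variables by giving a varlist
compress mpg weight
```

Stata then reports each variable whose storage type changed and the number of bytes saved.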

Please note that I can also specify the full path of my dataset with the code:
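As an illustration (the folder name is hypothetical), a full Windows path can be written with either backslashes or forward slashes:

```stata
* Hypothetical folder: both lines refer to the same file on Windows
use "C:\Users\yourname\Documents\auto.dta", clear
use "C:/Users/yourname/Documents/auto.dta", clear
```

Forward slashes are the safer habit, because Stata treats a backslash placed directly before a macro reference (a left quote or a dollar sign) as an escape character.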

If you are using a Mac, the code is:
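A sketch of the Mac version, again with a hypothetical path that you should replace with your own:

```stata
* Hypothetical Mac path: replace yourname and the folder with your own
use "/Users/yourname/Documents/auto.dta", clear

* Compress all variables in the loaded dataset
compress
```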

NOTE: Memory compression is a concern only when you are dealing with big datasets.

Method 2: The use command

Another way to manage your memory is to load only the necessary variables into Stata. Sometimes a dataset can have thousands of variables and billions of observations, while your analysis needs only a few variables or a range of observations. The use command, in the form use varlist using filename, is an effective way to achieve this. Again, let's pretend that the automobile dataset is extremely large, with several thousand variables, but for our analysis we only want the variables mpg (Mileage) and weight. To load just these two variables, I would type:
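A minimal sketch of a partial load, assuming the automobile dataset is a copy of auto.dta stored at a hypothetical Windows path. The if and in lines are optional extras that show how to restrict observations as well:

```stata
* Load only mpg and weight; the other variables are not loaded
use mpg weight using "C:/Users/yourname/Documents/auto.dta", clear

* Load only the first 20 observations of those two variables
use mpg weight in 1/20 using "C:/Users/yourname/Documents/auto.dta", clear

* Load only the observations that meet a condition
use mpg weight if mpg > 25 using "C:/Users/yourname/Documents/auto.dta", clear
```

Each use line replaces the data in memory, so in practice you would run only the one you need.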

If you are using a Mac, the code is:
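On a Mac only the path changes. A sketch with a hypothetical path:

```stata
* Hypothetical Mac path: replace with the location of your own file
use mpg weight using "/Users/yourname/Documents/auto.dta", clear
```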

Method 3: The describe command

If you are not sure which variables you want to load into memory, you can explore a dataset before choosing them, without loading it. The describe command, combined with using, allows you to do exactly that. I will use the automobile dataset to demonstrate how to use this command. In the output, Stata gives you the storage type for every variable:
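A minimal sketch, assuming a copy of auto.dta at a hypothetical Windows path:

```stata
* Describe the file on disk; the data currently in memory are left untouched
describe using "C:/Users/yourname/Documents/auto.dta"

* Print only the summary information (observations, variables), not the full list
describe using "C:/Users/yourname/Documents/auto.dta", short
```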

If your dataset has thousands of variables, you might need to subset the variable list. For example, in the automobile dataset, if I would like to see just the variables that start with "t", I would type:
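A sketch of that call, using the wildcard t* and a hypothetical Windows path:

```stata
* t* matches every variable whose name begins with t
describe t* using "C:/Users/yourname/Documents/auto.dta"
```

If the automobile dataset is the auto.dta that ships with Stata, this matches trunk and turn.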

If you are using a Mac, the code is:
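Again only the path differs. A sketch with a hypothetical Mac path:

```stata
* Hypothetical Mac path: replace with the location of your own file
describe t* using "/Users/yourname/Documents/auto.dta"
```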
