The estat esize Command - Linear Regression Post-estimation

The estat esize command calculates effect sizes for a linear regression. An effect size measures the strength of the association between variables in the model: a larger effect size means a stronger association, and a smaller effect size means a weaker one. By default the command reports eta-squared estimates, which for the model as a whole are equivalent to R-squared. You can also request epsilon-squared estimates, which are equivalent to adjusted R-squared and are more appropriate for a multiple linear regression.

These estimates can help you judge whether your model gives meaningful results. A statistically significant linear regression model tells you there is an association between your dependent and independent variables, but it does not tell you how strong that association is. A very small effect size for your independent variables tells you that the association between your independent and dependent variables is very weak. A weak association usually means your model is not particularly useful even if it is significant.

It is also important to note that an effect size does not grow simply because the sample is larger. This is not true of significance tests, where a large enough sample can make even a trivial association statistically significant. It is for this reason that some believe effect sizes to be a more useful measure of the practical value of a model.

How to Use:

*Perform linear regression, then

estat esize

OR to display epsilon-squared estimates (e.g. for multiple regressions)

estat esize, epsilon

OR to display omega-squared estimates

estat esize, omega

OR to specify a different confidence level (e.g. 90% instead of the default 95%)

estat esize, level(90)
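
The level() option can also be given together with one of the estimate options. The following is a minimal sketch, assuming the built-in auto dataset and an illustrative model; it is only meant to show how the options are combined:

```stata
* Load the example data and fit an illustrative regression
sysuse auto, clear
regress price mpg rep78

* Epsilon-squared estimates with 90% confidence intervals
estat esize, epsilon level(90)
```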

Worked Example:

In this example I use the auto dataset. I am going to generate a linear regression, and then use estat esize to show the effect sizes with Eta-squared and Epsilon-squared estimates. In the command pane I type the following:

sysuse auto, clear
regress price mpg rep78
estat esize
estat esize, epsilon

This gives the following output in Stata:

Here you can see the Stata output for the regression followed by the effect size estimates. The default for estat esize is to display Eta-squared estimates with 95% confidence intervals, which is our first test. Our second test displays Epsilon-squared estimates by adding the epsilon option to the estat esize command.

The estat esize output shows the Eta-squared or Epsilon-squared influence estimates for the whole model, followed by the partial Eta-squared or partial Epsilon-squared influence estimates for each individual independent variable.
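
One way to check the whole-model estimates against the regression itself is to display the R-squared and adjusted R-squared that regress stores in e(r2) and e(r2_a), and compare them with the model rows of the estat esize output. This is a sketch for comparison only, not part of the original walkthrough:

```stata
sysuse auto, clear
regress price mpg rep78

* R-squared and adjusted R-squared stored by regress
display "R-squared          = " %6.4f e(r2)
display "Adjusted R-squared = " %6.4f e(r2_a)

* Whole-model eta-squared and epsilon-squared for comparison
estat esize
estat esize, epsilon
```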

Our first effect size test with Eta-squared estimates shows that the whole model explains about 25.1% of the total variation observed. Because Eta-squared for the whole model is equivalent to R-squared, this value should match the R-squared value reported in the initial regression. If we interpret the partial estimates as percentages, the mpg variable alone is explaining about 25.09% of the observed variation, and the rep78 variable alone is explaining about 5.4% of the observed variation.

When testing linear regression models the Eta-squared estimates are best used only on simple linear regressions (those with a single independent variable). In a multiple linear regression Eta-squared has the same upward bias as R-squared, so adjusted estimates should be reported instead. This is the purpose of our second test using Epsilon-squared estimates, which is more appropriate for this regression because it is a multiple regression.

Epsilon-squared estimates are equivalent to adjusted-R-squared estimates in a regression model. This means our Epsilon-squared estimate of about 22.83% for the whole model should match the adjusted-R-squared value given with the initial regression. If we interpret the partial estimates as percentages, the mpg variable alone is explaining about 23.96% of the observed variation, and the rep78 variable alone is explaining about 4% of the observed variation.

There are a couple of things to note here. Firstly, the rep78 variable has a fairly small effect size of roughly 5%, far below that of mpg. If we also look at the regression values we see that, even though the whole regression model is statistically significant (p-value = 0.0001), the rep78 variable is not significant at the 5% level (p-value = 0.056). This tells us that the rep78 variable is not statistically significant and that its practical contribution is modest. Even if it is affecting the dependent variable, the effect it is having is small relative to mpg.
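
The link between the rep78 t test and its effect size can be made explicit. For a term with a single coefficient, partial eta-squared can be recovered from the t statistic as t^2 / (t^2 + residual degrees of freedom). The sketch below recomputes it by hand so it can be compared with the estat esize row for rep78; the scalar names are illustrative only:

```stata
sysuse auto, clear
regress price mpg rep78

* t statistic for rep78 and residual degrees of freedom from the regression
scalar t_rep78 = _b[rep78] / _se[rep78]
scalar df_res  = e(df_r)

* Partial eta-squared for a single-coefficient term: t^2 / (t^2 + df)
scalar eta2_rep78 = t_rep78^2 / (t_rep78^2 + df_res)
display "partial eta-squared (rep78) = " %6.4f eta2_rep78

* Compare with the rep78 row reported here
estat esize
```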

The second thing to note is the confidence intervals shown for the eta-squared and partial eta-squared estimates. The lower confidence limit for the rep78 variable is shown as “.” which in Stata usually denotes a missing observation or value. In this case Stata is telling you it cannot give you a lower confidence limit because effect sizes cannot be negative. The variable is either having an effect, or it is not. When the lower limit would fall below zero it is simply not shown, indicating that an effect of 0 (no effect) is within the error margin for that eta-squared estimate. This missing lower limit, in conjunction with a regression p-value that is not statistically significant, tells us that the rep78 variable is not influencing the dependent variable much compared to the other variable in the model.
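
If you want to inspect these limits as numbers rather than read them off the table, the results that estat esize leaves behind can be listed. The sketch below assumes the estimates are stored in a matrix named r(esize), as documented for recent Stata versions; run return list first to confirm what your version stores:

```stata
sysuse auto, clear
regress price mpg rep78
estat esize

* Show everything estat esize stored in r()
return list

* Effect sizes and confidence limits, one row per term
* (assumes the stored matrix is named r(esize))
matrix list r(esize)
```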

IMPORTANT Note: The individual variable terms are reported as partial Eta-squared or partial Epsilon-squared, which are calculated differently from ordinary Eta-squared or Epsilon-squared. These individual variable values can sum to more than 1 or to nowhere near 1, and are not easily interpreted. Although we have interpreted them above as percentages, this can be problematic.

The effect size for individual variables is best used as a guide. In this example, the small effect size for rep78 indicates the rep78 variable is having a small effect compared to the other independent variable in the model.

If I were to replace the rep78 variable with the weight variable, this change drops the mpg partial Eta-squared estimate from 0.2509 to just 0.00463. It is unlikely that the mpg variable is all of a sudden having next to no impact on the dependent variable compared with our first regression. Instead, the relative effect size of mpg has dropped: the model contains no interaction term, but weight and mpg are correlated with each other, so once weight is in the model it accounts for most of the variation that mpg explained before.
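
This comparison can be reproduced with a short sketch like the one below. The correlate step is an addition for illustration, to show how closely mpg and weight move together:

```stata
sysuse auto, clear

* How strongly are the two independent variables related to each other?
correlate mpg weight

* Refit the model with weight in place of rep78 and show the effect sizes
regress price mpg weight
estat esize
```

Note that rep78 has some missing values in the auto dataset, so the two regressions are not fitted on exactly the same observations.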