The estat imtest Command - Linear Regression Post-estimation
Updated: Mar 17, 2020
This command performs an information matrix test on a linear regression model, and then an orthogonal decomposition for heteroskedasticity, skewness, and kurtosis tests (Cameron and Trivedi, 1990). There is also an option to add White’s test (1980) for unrestricted forms of heteroskedasticity. If you add White’s test the results should be similar to the heteroskedasticity test performed as part of the orthogonal decomposition.
Heteroskedasticity, as previously discussed in our estat hettest blog post, occurs when the variance of the residuals is not constant across observations. One of the underlying assumptions of a linear regression model is that of homoskedasticity, so the presence of heteroskedasticity in any form violates this assumption. The test for unrestricted forms of heteroskedasticity in estat imtest is more general than either estat hettest or estat szroeter.
Skewness relates to the distribution curve, where your residuals are skewed if the curve is asymmetrical (the left side does not match the right side of the graph). One of the underlying assumptions of a linear regression is that your residuals have a normal distribution, i.e. the curve is symmetrical.
Kurtosis relates to the tails and peak of the distribution curve. Your residuals have excess kurtosis if the tails are heavier than those of a normal distribution (often accompanied by a sharper peak), or lighter (often accompanied by a flatter peak). A normal distribution has a kurtosis of 3, so the assumption of normally distributed residuals also implies tails of normal weight.
If the residuals of your regression model show heteroskedasticity, skewness or kurtosis to a significant degree, the model violates the underlying assumptions of a linear regression, and inferences drawn from it (standard errors, confidence intervals and p-values) may be unreliable. To check whether a model violates any of these assumptions we use the estat imtest command.
Additional Note: The assumption of normal distribution applies to your residuals only. Your independent and dependent variables can contain both skewness and kurtosis and this will not affect the validity of your model.
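To put numbers on the skewness and kurtosis described above, you can summarise the residuals directly. This is only a minimal sketch: the regression (price on mpg, weight and foreign in the auto dataset) and the variable name ehat are assumptions chosen for illustration, not part of the example later in this post.

```stata
* Load the auto dataset that ships with Stata
sysuse auto, clear

* Hypothetical model, chosen only for illustration
regress price mpg weight foreign

* Store the residuals in a new variable (ehat is a hypothetical name)
predict ehat, residuals

* Detailed summary: reports skewness and kurtosis of the residuals
summarize ehat, detail

* A normal distribution has skewness 0 and kurtosis 3
display "Skewness: " r(skewness)
display "Kurtosis: " r(kurtosis)
```

Values far from 0 for skewness or far from 3 for kurtosis suggest the residuals depart from normality, although summary statistics alone do not tell you whether the departure is statistically significant.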
How to Use:
*Perform linear regression, then
OR to include White's test
OR for large datasets
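The three variants listed above correspond to the syntax sketched here. Both white and preserve are documented options of estat imtest. Pairing preserve with the large-dataset case is my reading of the text: the option temporarily drops data the test does not need and restores it afterwards.

```stata
* Perform a linear regression first, then run the default test
estat imtest

* Include White's test for heteroskedasticity
estat imtest, white

* For large datasets: preserve the data, drop what the test does not
* need, and restore the original data when the test finishes
estat imtest, preserve
```

In each case the command must follow a regress command, because it works on the most recent estimation results.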
In this example I use the auto dataset. I am going to run a linear regression, and then use estat imtest to check whether heteroskedasticity, skewness or kurtosis is present in my regression model. I will use the “white” option with estat imtest so I can also look at White’s test for heteroskedasticity. I would expect White’s test to give the same or a very similar p-value to the Cameron-Trivedi heteroskedasticity test.
In the command pane I type the following:
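A minimal sketch of this step follows. The choice of regressors is an assumption for illustration, so the p-values it produces will not necessarily match the ones discussed below.

```stata
* Load the auto dataset that ships with Stata
sysuse auto, clear

* Hypothetical model: the regressors are an assumption for illustration
regress price mpg weight foreign

* Information matrix test, with White's test added
estat imtest, white
```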
This gives me the following output:
Here you see the output from the Stata regression and then the results of White’s test and the Cameron-Trivedi tests. As expected, the p-value for both heteroskedasticity tests is the same. If we use a significance level of 0.05, none of the tests rejects its null hypothesis, so there is no statistically significant evidence that our regression model violates any of these assumptions. However, the p-value for skewness does come quite close to 0.05.
It is important to note that when you have a very high or very low p-value (e.g. 0.4 or 0.0001) it is easy to conclude whether or not the value is statistically significant. When the p-value sits close to your chosen cutoff, defined here as 0.05, it usually warrants further investigation. In this case a p-value for skewness of 0.0588 is close enough to 0.05 for me to want to graph the distribution curve of my residuals. To do this I first have to create a new variable that contains the residuals, using the predict command.
In the command pane I type the following:
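A sketch of how the residuals could be stored and graphed after the regression. The variable name resid is a hypothetical choice, and kdensity is one reasonable way to draw the distribution curve. A histogram would also work.

```stata
* Store the residuals from the last regression in a new variable
* (the name resid is a hypothetical choice)
predict resid, residuals

* Kernel density of the residuals with a normal curve overlaid
kdensity resid, normal
```

The normal option overlays a normal density with the same mean and standard deviation, which makes any asymmetry easier to spot.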
This gives me the following graph:
Here you can see my residuals are clearly skewed to the left and are asymmetrical. Visually there appears to be skewness in my residuals; however, the information matrix test indicates that it is not statistically significant at the 0.05 level. It is possible there is a problem with one or more of my variables that is causing my skewed residuals. For example, I may have included an independent variable that is not linearly related to my dependent variable.
If I were to continue with this analysis I would use other post-estimation diagnostics to further evaluate my regression model and each of the variables it contains. This may turn up other issues which, if dealt with appropriately, will improve the distribution of my residuals. Ultimately I am not too concerned about non-normal residuals. If this is the only linear regression assumption violated, the regression is still robust enough to be used. It is much more important to eliminate heteroskedasticity and make sure the relationships between variables are linear.
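As a rough sketch of what those further diagnostics might look like, the commands below check for non-linearity and heteroskedasticity after a regression. The model shown is a hypothetical one on the auto dataset, and robust standard errors are only one of several ways to deal with heteroskedasticity.

```stata
* Hypothetical model on the auto dataset, for illustration only
sysuse auto, clear
regress price mpg weight foreign

* Residual-versus-fitted plot: look for fanning or curvature
rvfplot, yline(0)

* Ramsey RESET test for omitted non-linear terms
estat ovtest

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest

* If heteroskedasticity is present, one option is robust standard errors
regress price mpg weight foreign, vce(robust)
```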