An Overview of Linear Regression Post-Estimation Plots
Updated: Feb 25
We have a blog series about the post-estimation commands that can be used with linear regressions. These commands are used to evaluate various aspects of a linear regression model. While these commands are useful, they can sometimes be difficult to interpret, especially if you prefer to evaluate relationships visually. Fortunately, Stata contains a wide range of post-estimation graph commands that allow you to test your model visually as well. These graphs are run after an estimation command, such as a regression. Below I list some post-estimation graphs commonly used with linear regression models and why you would use them.
This is not an exhaustive list; however, it includes all commands listed in the Stata help page as being of ‘special interest’ to linear regression. You can open the help page for regress diagnostic plots with the help regress postestimation plots command.
The rvfplot command plots the residuals against the fitted values of the dependent variable. There should be no pattern to the residuals in this plot. A curved pattern indicates that the relationship between your dependent variable and the independent variables is not linear. That relationship needs to be broadly linear for a linear regression to be the correct model.
Other patterns in this plot usually indicate some form of heteroskedasticity in your regression. For example, if the spread of residuals gets wider as you move from one side of the graph to the other, the residual variance is changing with the fitted values rather than staying relatively uniform (homoskedastic). One of the underlying assumptions of linear regression is that the residuals are homoskedastic, so any heteroskedasticity identified with this plot signals a problem with your model.
You can use this plot in conjunction with the estat hettest, estat szroeter and estat imtest commands to identify heteroskedasticity. The graph matrix command can be used to check linear relationships between variables. You can also run a linear regression with only one independent variable to check if the relationship between two variables is linear.
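As a minimal sketch of the workflow described above, the commands below fit a regression and then draw the residual-versus-fitted plot (rvfplot) alongside the related tests. The auto dataset that ships with Stata and the variables price, mpg, weight and foreign are illustrative assumptions, not part of the original discussion.

```stata
* Load Stata's bundled example data (an illustrative choice)
sysuse auto, clear

* Fit a multiple linear regression
regress price mpg weight foreign

* Residual-versus-fitted plot with a reference line at zero
rvfplot, yline(0)

* Formal heteroskedasticity tests to read alongside the plot
estat hettest
estat szroeter, rhs
estat imtest

* Scatterplot matrix to eyeball pairwise linear relationships
graph matrix price mpg weight, half

* Simple regression with one independent variable as a linearity check
regress price mpg
rvfplot, yline(0)
```

The yline(0) option is only a visual aid: residuals scattered evenly around zero with roughly constant spread are what you hope to see.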
The avplot command produces an added-variable plot, which is used in multiple regressions. It is also sometimes called a partial-regression leverage plot, a partial regression plot, or an adjusted partial residual plot. When you have a simple linear regression (one independent variable), you can graph both variables (dependent and independent) with the regression line all in one graph. This gives a reasonable indication of the relationship between the dependent and independent variables. However, you cannot do this with multiple regressions. The avplot command generates a graph that shows the relationship between the dependent variable and one independent variable after adjusting for all the other independent variables in the model. It gives an indication of the relationship between the two variables net of the others.
The avplot added-variable plot is used after a multiple linear regression. It plots one independent variable against the dependent variable, with both adjusted for the other independent variables. The slope of the line in an added-variable plot equals the regression coefficient for that independent variable, so it indicates the amount of influence the independent variable is having on the dependent variable. These plots are also useful for identifying outliers. Use the avplots command to plot all added-variable plots in one graph image. You can use this plot in conjunction with the dfbeta and predict commands to identify outliers. The predict command is used to look at several different influence statistics. Check out the linked tech tip for more information on how to use these.
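The following sketch shows one way to combine added-variable plots with dfbeta and predict. The dataset and the variable names (including the new variables lev, cook and rstu) are assumptions chosen for illustration.

```stata
* Example data and model (illustrative choices)
sysuse auto, clear
regress price mpg weight foreign

* Added-variable plot for a single independent variable
avplot weight

* All added-variable plots combined in one graph
avplots

* DFBETAs: creates one new variable per independent variable
dfbeta

* Influence statistics available from predict after regress
predict lev, leverage
predict cook, cooksd
predict rstu, rstudent

* List the five observations with the largest Cook's distance
gsort -cook
list make price cook lev rstu in 1/5
```

Sorting on Cook's distance is just one convenient way to shortlist observations; you could sort on leverage or the studentized residuals instead.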
The cprplot command produces a component-plus-residual plot. This plot is used to identify non-linearity in the independent variables of a multiple regression model. The scatter and the slope of the line should look broadly linear in this plot. If they do not, the relationship between the independent variable you are testing and the dependent variable is not linear. All independent variables in a linear regression should be linearly related to the dependent variable. You may wish to add a lowess smoothed line or a median spline. These can assist in detecting non-linearity in a scatter that appears somewhat linear.
You can use this plot in conjunction with estat ovtest to test for non-linearity. To check for non-linearity in a single independent variable using estat ovtest, run a simple linear regression with your dependent variable and the independent variable you are testing. Then run estat ovtest.
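A short sketch of the component-plus-residual plot (cprplot) and the single-variable estat ovtest check follows. The data, the variables and the bands(13) setting are illustrative assumptions.

```stata
* Example data and model (illustrative choices)
sysuse auto, clear
regress price mpg weight foreign

* Component-plus-residual plot for mpg with a lowess smooth
cprplot mpg, lowess

* The same plot with a median spline instead
cprplot mpg, mspline msopts(bands(13))

* Check a single independent variable with estat ovtest:
* run a simple regression first, then the test
regress price mpg
estat ovtest
```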
The acprplot command produces an augmented component-plus-residual plot. A 1986 article by C.L. Mallows suggests that this plot is somewhat better at detecting non-linearity than its non-augmented counterpart above. This graph has all the same options as the normal component-plus-residual plot, so you may find that a lowess or median spline line assists in identifying non-linearity. As with cprplot, you can use this plot in conjunction with estat ovtest to identify non-linearity in your independent variables more easily.
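The augmented component-plus-residual plot (acprplot) is called in the same way. This is a minimal sketch in which the dataset, the variables and the bandwidth value are assumptions for illustration.

```stata
* Example data and model (illustrative choices)
sysuse auto, clear
regress price mpg weight foreign

* Augmented component-plus-residual plot with a lowess smooth;
* bwidth(1) is an illustrative bandwidth, not a recommendation
acprplot mpg, lowess lsopts(bwidth(1))

* Alternative: overlay a median spline
acprplot mpg, mspline msopts(bands(13))
```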
The rvpplot command graphs the residuals against an independent variable. It is a good plot for identifying regression assumption violations. If all linear regression assumptions hold, there should be no pattern to the graph. The interpretation of this plot is similar to that of rvfplot. Any pattern is a problem, but a curved pattern indicates a non-linear relationship, and some other patterns (such as a cone or reverse cone) indicate heteroskedasticity. As with rvfplot, you can use estat hettest, estat szroeter and estat imtest to look for heteroskedasticity, and graph matrix or a simple linear regression is helpful when looking for linear relationships between variables.
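As a rough illustration, the residuals can be plotted against each independent variable in turn with rvpplot. The dataset and variables here are assumed for the example.

```stata
* Example data and model (illustrative choices)
sysuse auto, clear
regress price mpg weight foreign

* Residuals against individual independent variables
rvpplot mpg, yline(0)
rvpplot weight, yline(0)

* Heteroskedasticity test focused on one independent variable
estat hettest mpg
```

Checking one variable at a time helps narrow down which independent variable is associated with a pattern seen in the overall residual plot.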
The lvr2plot command plots leverage against normalized squared residuals (known as an L-R plot). This plot is useful in determining which individual observations have high leverage or large residual values. Leverage is on the Y-axis, and the red horizontal line represents the average leverage across the dataset. Any point above that line has higher-than-average leverage, and points far above it are cause for concern. Normalized squared residuals are along the X-axis, and the red vertical line represents the average normalized squared residual across the dataset. Any point to the right of that line has a larger-than-average residual, and points far to the right are cause for concern.
An observation with high leverage is one whose values of the independent variables are far from those of the other observations, so it has no close neighbours. Because this lonely value has no neighbours, it will naturally pull the regression line towards it, influencing the slope of the line. An observation with a large squared residual is one where the predicted value based on the regression line was a lot higher or a lot lower than the actual observed value. Such observations are potential outliers. You can use this plot in conjunction with the dfbeta and predict commands to identify outliers or overly influential or leveraged values.
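The sketch below draws the L-R plot with lvr2plot and then retrieves the underlying leverage and residual values with predict. The dataset, the label variable make, and the new variable names lev, res and res2 are assumptions made for illustration.

```stata
* Example data and model (illustrative choices)
sysuse auto, clear
regress price mpg weight foreign

* Leverage-versus-squared-residual plot, labelling each point
lvr2plot, mlabel(make)

* Recover leverage and residuals to inspect them numerically
predict lev, leverage
predict res, residuals
generate res2 = res^2
summarize lev res2

* List the five observations with the highest leverage
gsort -lev
list make lev res2 in 1/5
```

Labelling the points makes it easier to match an isolated marker on the graph to a specific observation in the listing.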
These are just a few of Stata’s many post-estimation plots. The type of regression you run will determine which post-estimation plots are most useful to you. For a list of post-estimation plots for your regression, check the post-estimation plot help file for that regression. To access it, type help command postestimation plots, replacing command with the name of your estimation command. The plots may instead be listed under the overall post-estimation help file, which you open with help command postestimation. For example, help regress postestimation plots will bring up all linear regression diagnostic plots, but this will not work for the logistic command. The logistic plots are recorded with all the other postestimation commands under help logistic postestimation.
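For reference, the two help lookups mentioned above would be typed as follows; both lines are taken directly from the text.

```stata
* Diagnostic plots available after regress
help regress postestimation plots

* For logistic, plots are listed with the other postestimation commands
help logistic postestimation
```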