- Laura Whiting

# An Overview of Linear Regression Post-Estimation Plots

Updated: Feb 25

We have a __blog series__ about the post-estimation commands that can be used with linear regressions. These commands are used to evaluate various aspects of a linear regression model. While these commands are useful, they can sometimes be difficult to interpret, especially if you prefer to evaluate relationships visually. Fortunately, Stata contains a wide range of post-estimation graph commands that allow you to test your model visually as well. These graphs are run after an estimation command, such as a regression. Below I list some post-estimation graphs commonly used with linear regression models and why you would use them.

This is not an exhaustive list; however, it includes all the commands listed in the Stata help page as being of ‘special interest’ to linear regression. The help page for **regress** diagnostic plots can be loaded with the **help regress postestimation plots** command.

__rvfplot__

This plots the residuals against the fitted values of the dependent variable. There should be no pattern to the residuals in this plot. If you see a curved pattern, this indicates that the relationship between your dependent and independent variables is not linear. That relationship needs to be broadly linear for a linear regression to be the correct model fit.

Any other pattern you see in this plot indicates some form of heteroskedasticity in your regression. For example, the spread of residuals may get wider as you proceed from one side of the graph to the other. This indicates your residuals are getting larger as the regression line progresses, rather than staying relatively uniform and homoskedastic. One of the underlying assumptions of linear regression is that of homoskedastic residuals, so any heteroskedasticity identified with this plot means your model is problematic.

You can use this plot in conjunction with the __estat hettest__, __estat szroeter__ and __estat imtest__ commands to identify heteroskedasticity. The **graph matrix** command can be used to check linear relationships between variables. You can also run a linear regression with only one independent variable to check if the relationship between two variables is linear.
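As a minimal sketch of the workflow, using Stata's bundled auto dataset (the variables chosen here are purely illustrative):

```stata
* Fit a linear regression, then inspect the residual-versus-fitted plot
sysuse auto, clear
regress price mpg weight
rvfplot, yline(0)    // residuals vs fitted values; look for curves or fanning
estat hettest        // Breusch-Pagan/Cook-Weisberg test for heteroskedasticity
```

The `yline(0)` option simply draws a reference line at zero, which makes any asymmetry or trend in the residuals easier to spot.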

__avplot and avplots__

This is an added-variable plot used in multiple regressions. It is also sometimes called a partial-regression leverage plot, a partial regression plot, or an adjusted partial residual plot. When you have a simple linear regression (one independent variable), you are able to graph both variables (dependent and independent) with the regression line all in one graph. This gives a reasonable indication of the relationship between dependent and independent variables. However, you cannot do this with multiple regressions. The **avplot** command generates a graph that shows the relationship between the dependent variable and one independent variable while holding all other variables constant. It gives an indication of the true relationship between variables in the model.

The **avplot** added-variable plot is used after a multiple linear regression. It will plot one independent variable against the dependent variable. The slope of an added-variable plot indicates the amount of influence an independent variable is having on the dependent variable. These plots are also useful for identifying outliers. Use the **avplots** command to plot all added-variable plots in one graph image. You can use this plot in conjunction with the __dfbeta__ and __predict__ commands to identify outliers. The **predict** command is used to look at several different influence statistics. Check out the linked tech tip for more information on how to use these.
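A short illustrative sequence, again using the bundled auto dataset with arbitrary example variables:

```stata
* Multiple regression, then added-variable plots
sysuse auto, clear
regress price mpg weight foreign
avplot weight    // relationship between price and weight, other variables held constant
avplots          // all added-variable plots combined in one graph image
```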

**cprplot**

This is a component-plus-residual plot. This plot is used to identify non-linearity in the independent variables of a multiple regression model. The scatter and the slope of the line should look broadly linear in this plot. If it is not, then the independent variable you are testing is not linear. All variables in a linear regression should be linear. You may wish to add a *lowess* smoothed line or a *median spline*. These can assist in detecting non-linearity in a scatter that appears somewhat linear.

You can use this plot in conjunction with __estat ovtest__ to test for non-linearity. To check for non-linearity in a single independent variable using **estat ovtest**, run a simple linear regression with your dependent variable and the independent variable you are testing. Then run **estat ovtest**.
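Sketched out with the auto dataset (the variables here are just placeholders for your own):

```stata
* Component-plus-residual plot with a lowess overlay, then a RESET test
sysuse auto, clear
regress price mpg weight
cprplot mpg, lowess    // overlay a lowess smooth to reveal curvature
estat ovtest           // Ramsey RESET test for omitted non-linear terms
```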

**acprplot**

This is an augmented component-plus-residual plot. In a 1986 article, C.L. Mallows suggests that this plot is a bit better at detecting non-linearity than its non-augmented counterpart above. This graph has all the same options as the normal component-plus-residual plot, so you may find a *lowess* or *median spline* line assists in identifying non-linearity. As with **cprplot**, you can use this plot in conjunction with __estat ovtest__ to identify non-linearity in your independent variables more easily.
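For example, with a median-spline overlay (variables again illustrative):

```stata
* Augmented component-plus-residual plot with a median spline
sysuse auto, clear
regress price mpg weight
acprplot mpg, mspline    // median-spline line helps reveal subtle curvature
```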

**rvpplot**

This is a residual-versus-predictor plot, which graphs the residuals against an independent variable. It is a good plot for identifying regression assumption violations. If all linear regression assumptions hold, there should be no pattern to the graph. The interpretation of this plot is similar to the **rvfplot**. Any pattern is a problem, but a curved pattern indicates a non-linear relationship, and some other patterns (such as a cone or reverse cone) indicate heteroskedasticity. As with **rvfplot**, you can use __estat hettest__, __estat szroeter__ and __estat imtest__ to look for heteroskedasticity, and **graph matrix** or a simple linear regression is helpful when looking for linear relationships between variables.
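A minimal example with the auto dataset:

```stata
* Residuals plotted against one predictor
sysuse auto, clear
regress price mpg weight
rvpplot mpg, yline(0)    // residuals vs mpg; patterns suggest assumption violations
```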

**lvr2plot**

This plots leverage against normalized squared residuals (known as an L-R plot). This plot is useful in determining which individual observations have high leverage or large residual values. Leverage is on the Y-axis, and the red horizontal line represents the average leverage across the dataset. Any point above the red line has high leverage, and points far above it are cause for concern. Normalized squared residuals are along the X-axis, and the red vertical line represents the average normalized squared residual across the dataset. Any point to the right of that line has a large residual value, and points far to the right are cause for concern.

An observation with high leverage is one that has no close neighbouring observations. Because this lonely value has no neighbours it will naturally pull the regression line towards it, influencing the slope of the line. An observation with a large squared residual indicates that the predicted value based on the regression line was a lot higher or a lot lower than the actual observed value. This indicates outliers in the data. You can use this plot in conjunction with the __dfbeta__ and __predict__ commands to identify outliers or overly influential or leveraged values.
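A short sketch of this workflow (the stored variable names `lev` and `rstd` are my own labels):

```stata
* L-R plot, then store leverage and standardized residuals for inspection
sysuse auto, clear
regress price mpg weight
lvr2plot, mlabel(make)      // label points so influential observations are identifiable
predict lev, leverage       // hat values (leverage) for each observation
predict rstd, rstandard     // standardized residuals for spotting outliers
```

Listing observations where `lev` or `rstd` is large (for example, leverage above twice the mean) is a common follow-up to the plot.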

These are just a few of Stata’s many post-estimation plots. The type of regression you run will determine which post-estimation plots are most useful to you. For a list of post-estimation plots for your regression, check the post-estimation plot help file for that regression: type *help command postestimation plots*, or the plots may be listed under the overall help file with *help command postestimation*. For example, **help regress postestimation plots** will bring up all linear regression diagnostic plots, but this will not work for the **logistic** command. The **logistic** plots are listed with all the other post-estimation commands under **help logistic postestimation**.