The avplot(s) Command - Linear Regression Post-Estimation Plots
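The avplot command draws an added-variable plot for one independent variable after a multiple regression. It plots the residuals from regressing the dependent variable on the other independent variables against the residuals from regressing the chosen independent variable on those same variables. An added-variable plot is also known as a partial-regression leverage plot, a partial regression plot, or an adjusted partial residual plot.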
When you have a simple linear regression (only one independent variable), you can plot both variables, dependent and independent, together with the regression line in a single graph. This gives a reasonable picture of the relationship between the two.
For a multiple regression this is more complicated. The independent variables are usually correlated with one another, so the apparent relationship between the dependent variable and independent variable A is mixed up with the effect of independent variable B. A plain scatter plot of the dependent variable against A can therefore be misleading. The avplot command instead shows the relationship between the dependent variable and independent variable A after the linear effects of all the other independent variables have been removed from both, which is what "holding all other variables constant" means in a regression. It gives a much clearer indication of the relationship the model is actually estimating.
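To make "removing the effects of the other variables" concrete, here is a minimal Python sketch (not Stata code) of how added-variable plot coordinates can be computed, assuming ordinary least squares with a constant term. It uses NumPy and synthetic data; `added_variable_coords` and `av_slope` are hypothetical helpers, not part of Stata or any library. The final check illustrates the Frisch-Waugh-Lovell result: the slope through the added-variable plot equals that variable's coefficient in the full multiple regression.

```python
import numpy as np


def added_variable_coords(y, X, k):
    """Coordinates (e_x, e_y) of an added-variable plot for column k of X.

    y: (n,) dependent variable; X: (n, p) independent variables, no constant.
    A constant is included alongside the other independent variables.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
    # Residualize y and x_k on the constant plus the other variables.
    e_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    e_x = X[:, k] - Z @ np.linalg.lstsq(Z, X[:, k], rcond=None)[0]
    return e_x, e_y


def av_slope(e_x, e_y):
    # Both residual vectors average zero (Z has a constant),
    # so the least-squares line through them passes through the origin.
    return float(e_x @ e_y / (e_x @ e_x))


# Synthetic data for illustration only (not the auto dataset).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)  # x2 is correlated with x1
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

# Coefficients from the full multiple regression: [constant, x1, x2].
full = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
e_x, e_y = added_variable_coords(y, X, 0)
print(av_slope(e_x, e_y), full[1])  # the two values agree (up to rounding)
```

A scatter plot of `e_y` against `e_x` should give the same kind of graph that avplot draws for that variable.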
The slope of the fitted line in an added-variable plot equals the coefficient that the independent variable receives in the multiple regression, so it shows how much the dependent variable changes per unit change in that independent variable while the other variables are held constant. These plots are also useful for identifying outliers and high-leverage points. The avplot command plots a single independent variable of your choosing. You can also give avplot an independent variable that is not yet in the model, to see whether it would be a good addition. The avplots command draws the added-variable plots for every independent variable in the model and combines them into one graph image.
For this example I am using Stata's auto example dataset. I will run a multiple linear regression of weight on length, displacement, trunk, and turn, and then use the avplots command to look at the relationship between each independent variable and the dependent variable. In the command pane I type the following:
This produces the following composite graph:
From these plots I can see that length has the strongest linear relationship with the dependent variable, weight. Displacement also appears to have a linear relationship with weight, but a weaker one. Neither trunk nor turn appears to have any linear relationship with weight in this model.
Let’s take a closer look at the avplot for displacement, as the linear relationship we see here is questionable. In the command pane:
This generates the same weight vs. displacement plot shown in the composite above, but larger and easier to evaluate visually.
This graph shows some problems. The line of best fit appears to rely heavily on only a few points on the right side of the graph. There are many more points on the left side, but these are more scattered and show no clear pattern. I'd say that a few outlying, high-leverage points are pulling the fitted line too much for the relationship to be considered properly linear.
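One way to check whether a few points are driving the line is to look at how far each point sits from the middle of the horizontal axis: the fitted slope weights each point by its squared horizontal distance from the mean, so far-out points pull hardest. The Python sketch below is an illustration under assumptions: it works directly on already-residualized added-variable coordinates (synthetic here, not the displacement data), and `line_slope` and `partial_leverage` are hypothetical helpers. In Stata itself, influence statistics such as those from the dfbeta command are the more standard tool.

```python
import numpy as np


def line_slope(u, v):
    # Ordinary least-squares slope (with intercept) of v on u.
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (uc @ uc))


def partial_leverage(e_x):
    # Each point's share of the total squared horizontal spread. The fitted
    # slope weights points by exactly this quantity.
    ec = e_x - e_x.mean()
    return ec**2 / (ec @ ec)


# Synthetic AV coordinates: 100 unrelated points plus 3 far-right points
# that lie on a line (illustration only).
rng = np.random.default_rng(1)
e_x = np.concatenate([rng.normal(0, 1, 100), [10.0, 12.0, 14.0]])
e_y = np.concatenate([rng.normal(0, 1, 100), [10.0, 12.0, 14.0]])

h = partial_leverage(e_x)
top = np.argsort(h)[-3:]  # indices of the 3 highest-leverage points
keep = np.setdiff1d(np.arange(len(e_x)), top)

print("slope, all points:      ", round(line_slope(e_x, e_y), 3))
print("slope, top 3 removed:   ", round(line_slope(e_x[keep], e_y[keep]), 3))
print("their share of leverage:", round(float(h[top].sum()), 3))
```

If the slope collapses once a handful of high-leverage points are set aside, the apparent linear relationship is resting on those few points rather than on the bulk of the data.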
From these added-variable plots I conclude that, in this regression, only length has a clear linear relationship with weight.
There is one additional use for the avplot command. It can be used on other variables in your dataset that you did not include in your model, to evaluate whether that variable would be a good addition to your model. Let's see whether the mpg variable would benefit our model. In the command pane:
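The same residualizing idea applies to a variable that is not in the model: residualize both the dependent variable and the candidate (here, mpg) on the variables already in the model, then look at the relationship between the two sets of residuals. The sketch below, again in Python with NumPy and synthetic data, is a hedged illustration; `residualize` and `candidate_av` are hypothetical helpers. Besides the slope, it reports the squared correlation of the residuals (the partial R²), i.e. the share of the remaining variation in the dependent variable that the candidate would explain if added.

```python
import numpy as np


def residualize(v, Z):
    # Residuals from a least-squares regression of v on the columns of Z.
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]


def candidate_av(y, X_model, x_new):
    """Added-variable slope and partial R^2 for a variable not yet in the model."""
    Z = np.column_stack([np.ones(len(y)), X_model])  # constant + current predictors
    e_y = residualize(y, Z)
    e_x = residualize(x_new, Z)
    slope = float(e_x @ e_y / (e_x @ e_x))
    partial_r2 = float((e_x @ e_y) ** 2 / ((e_x @ e_x) * (e_y @ e_y)))
    return slope, partial_r2


# Synthetic data (illustration only): y depends strongly on x1, weakly on x3.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x3 = 0.5 * x1 + rng.normal(size=n)
y = 3.0 * x1 + 0.3 * x3 + rng.normal(size=n)

slope, pr2 = candidate_av(y, x1.reshape(-1, 1), x3)

# Cross-check by refitting with the candidate included.
Z_full = np.column_stack([np.ones(n), x1, x3])
coef_full = np.linalg.lstsq(Z_full, y, rcond=None)[0]
ssr_reduced = float(np.sum(residualize(y, Z_full[:, :2]) ** 2))
ssr_full = float(np.sum(residualize(y, Z_full) ** 2))
print(slope, coef_full[2])                          # equal
print(pr2, (ssr_reduced - ssr_full) / ssr_reduced)  # equal
```

The slope tells you the coefficient the candidate would receive; the partial R² tells you how much unexplained variation it would actually soak up, which is one way to judge whether it is worth adding.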
This generates the following graph:
There appears to be a weak linear relationship here, with a fairly flat fitted line. It is still a better fit than the relationships for trunk or turn, so perhaps mpg would be a good substitute for those two variables.
The avplot and avplots commands are a useful way of visually evaluating the relationships in multiple linear regression models. They can give a better indication of the relationships between variables than the p-values reported by regress alone. A p-value tells you whether a relationship is statistically significant, but it cannot tell you what that relationship looks like. The avplot for the mpg variable above is a good example.
If the mpg variable is added to the regression above and both trunk and turn are removed, the p-value for mpg is 0.031, which is statistically significant at the 5% level. However, the avplot above shows that, although the relationship is statistically significant, its practical effect on the dependent variable weight is small. For example, if mpg consistently affected the weight of a car, but the effect were only 1 ounce of weight per 5 mpg (miles per gallon) change, then mpg would have no practical effect on the weight of cars that weigh between 1700 and 4900 pounds.
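The hypothetical in the previous paragraph can be turned into a quick back-of-the-envelope calculation: convert the slope to pounds per mpg, multiply by a spread of mpg values, and compare the result with the spread of car weights. The mpg spread of 30 used below is an assumed, illustrative value, and `practical_effect` is a hypothetical helper.

```python
OUNCES_PER_POUND = 16


def practical_effect(slope, x_span, y_span):
    """Predicted change in y across x_span, and that change as a share of y_span."""
    change = slope * x_span
    return change, change / y_span


# The hypothetical effect from the text: 1 ounce of weight per 5 mpg.
slope_lb_per_mpg = (1 / OUNCES_PER_POUND) / 5  # 0.0125 lb per mpg
mpg_span = 30              # ASSUMED spread of mpg values, for illustration only
weight_span = 4900 - 1700  # 3200 lb, the weight range quoted in the text

change_lb, share = practical_effect(slope_lb_per_mpg, mpg_span, weight_span)
print(f"{change_lb:.3f} lb, {share:.4%} of the weight range")
# prints: 0.375 lb, 0.0117% of the weight range
```

Under these assumptions, moving across the whole mpg range would shift predicted weight by well under a pound, against a spread of 3200 pounds: statistically detectable, perhaps, but practically negligible.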
Something that is statistically significant is not always practically significant. The avplot and avplots commands can help to illustrate the practical importance of relationships.