The cprplot and acprplot Commands - Linear Regression Post-Estimation Plots

A component-plus-residual plot is used to check whether the relationship between an independent variable and the dependent variable in a multiple regression model is linear, after accounting for the other variables in the model. There is also an augmented version of this plot, which can be somewhat better at detecting non-linearity than the original.

The scatter in these plots should look broadly linear and follow the fitted line. If it does not, the independent variable you are testing does not have a linear relationship with the dependent variable, which conflicts with the assumption in linear regression that each independent variable enters the model linearly. Both plots allow you to add either a lowess smoothed line or a median spline line, which can help in identifying non-linear patterns, especially when the scatter appears somewhat linear.

How to Use:

*Run a multiple linear regression, then

OR to add a lowess smoothed line

OR to add a median spline line
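
The three variants above can be sketched as follows. The names depvar, indepvar1 and indepvar2 are placeholders rather than real variables, and the same options apply to acprplot as to cprplot:

```stata
* Run a multiple linear regression, then plot one independent variable
regress depvar indepvar1 indepvar2
cprplot indepvar1
acprplot indepvar1

* OR to add a lowess smoothed line
cprplot indepvar1, lowess
acprplot indepvar1, lowess

* OR to add a median spline line
cprplot indepvar1, mspline
acprplot indepvar1, mspline
```

Both commands work on the most recent regression, so they need to be run directly after regress (or after restoring its estimates).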

Worked Example:

For this example I use the Stata example dataset auto.dta. I will run a multiple linear regression and then use the cprplot and acprplot commands to look for non-linearity in the independent variables in the model. In the command pane I type the following:
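
A minimal sketch of the commands for this example is shown here. It assumes the model is price regressed on mpg and weight only, which is my reading of the discussion that follows; the graph names given in name() are purely illustrative and are there so each graph stays open rather than replacing the previous one:

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Fit the multiple linear regression (assumed specification)
regress price mpg weight

* mpg: original and augmented plots with a lowess smoothed line
cprplot mpg, lowess name(cpr_mpg, replace)
acprplot mpg, lowess name(acpr_mpg, replace)

* weight: original and augmented plots with a median spline line
cprplot weight, mspline name(cpr_weight, replace)
acprplot weight, mspline name(acpr_weight, replace)
```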

This produces the following four graphs. The first two show the original and augmented component-plus-residual plots for the mpg variable, with the lowess smoothed line to help with visual evaluation. The second two graphs show the original and augmented component-plus-residual plots for the weight variable, with the median spline line to help with visual evaluation.

For the mpg variable, the cprplot (left) indicates there aren’t any major issues with the fit. The lowess smoothed line does not deviate much from the line of best fit. In the acprplot (right) the pattern is a bit more pronounced, with the lowess smoothed line deviating a little more than in the cprplot. Overall the linearity appears sound, but from these graphs we cannot rule out a non-linear relationship. Further investigation with other diagnostic tests, such as estat ovtest, may help to identify any issues.

For the weight variable, the cprplot (left) looks alright, although the clusters of observations that sit nicely along the median spline line are concerning. In the acprplot (right) the median spline line fits the observations a bit better than in the original plot, which raises some concern about non-linearity in the relationship between the independent variable weight and the dependent variable price in this multiple linear regression. Further investigation using other statistical diagnostic tools, such as the estat ovtest command, may provide further indication of an issue with non-linearity here.

To look at non-linearity for a single variable, it can be helpful to run a simple linear regression of your dependent variable on the independent variable in question. In this example, we see more of a problem with the weight variable than the mpg variable, so for further investigation you can run a regression of price on weight. If you do this in Stata and then run estat ovtest (the Ramsey RESET test, which checks for omitted variables using powers of the fitted values), you get a highly statistically significant result. This indicates the model is misspecified, which is consistent with the relationship not being linear.
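
The follow-up check described above can be sketched as:

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Simple linear regression of price on weight
regress price weight

* Ramsey RESET test; a small p-value suggests the linear
* specification is missing something, such as a non-linear term
estat ovtest
```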

It is important not to rely on a single diagnostic test to draw conclusions about the robustness of your regression model. Make sure to use all the diagnostic tests available, both statistical and graphical, to make a properly informed decision about your regression model.
