Getting Started in Stata - Linear Regression

Today we are going to show you how to run a basic linear regression in Stata, using both the drop-down menus and Stata commands. If you are new to Stata, we recommend starting with the menus. Each analysis you run from a menu, such as a linear regression, appears in the Review pane (on the left side of the Stata screen) as the equivalent Stata command. This lets you begin learning the general structure of commands and how to use them. Once you are more familiar with commands, you will find it faster and easier to run your analyses by typing them directly.

I am going to use the example dataset auto.dta, which ships with Stata. I load it with the following command:

sysuse auto.dta

Linear Regression via Stata Menus:

Statistics > Linear models and related > Linear regression

Select “mpg” as the dependent variable and “weight” and “foreign” as the independent variables.

Click OK.

Linear Regression via Commands:

regress mpg weight foreign
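For repeated work, you can save the same steps in a do-file. The following is a minimal sketch that assumes you run it from the Do-file Editor. The `, clear` option, which is not used above, tells `sysuse` to replace any data already in memory. The `describe` line is an optional check added for illustration.

```stata
* Minimal do-file sketch reproducing the menu steps above
sysuse auto, clear            // load the example data, discarding any data in memory
describe mpg weight foreign   // optional check: confirm the three variables exist
regress mpg weight foreign    // the command the Linear regression dialog generates
```

Because the whole analysis lives in one file, rerunning it always starts from the same data.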

Output:

Stata prints the results to the Results pane (in the centre of the Stata window). The output has three parts: two tables at the top and a table of coefficients at the bottom.

The table in the top left is the analysis-of-variance (ANOVA) table from the least squares regression. “SS” is the sum of squares, “df” is degrees of freedom, and “MS” is the mean square (SS divided by df). For this regression, the model accounts for a sum of squares of about 1619, and about 824 is left as unexplained residual. Together they make up the total sum of squares of about 2443.
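You can check the relationships in the ANOVA table against the results that `regress` stores in `e()`. Here `e(mss)` and `e(rss)` are the model and residual sums of squares, and `e(df_m)` and `e(df_r)` are their degrees of freedom. In this sketch, the scalar names `ms_model`, `ms_resid` and `ss_total` are hypothetical names chosen for illustration.

```stata
* Rebuild the MS column of the ANOVA table from stored results
sysuse auto, clear
quietly regress mpg weight foreign

* Hypothetical scalar names, chosen for illustration
scalar ms_model = e(mss) / e(df_m)   // model SS / model df
scalar ms_resid = e(rss) / e(df_r)   // residual SS / residual df
scalar ss_total = e(mss) + e(rss)    // total SS = model SS + residual SS

display "Model MS    = " scalar(ms_model)
display "Residual MS = " scalar(ms_resid)
display "Total SS    = " scalar(ss_total)
* R-squared is the share of the total SS explained by the model
display "R-squared   = " e(mss) / scalar(ss_total)
```

The residual MS is also the square of the Root MSE reported in the top-right table.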

The table in the top right contains summary statistics for the regression. It tells us that 74 observations were used. The F statistic tests the hypothesis that all coefficients apart from the constant are 0 (zero). For this model F = 69.75, with 2 degrees of freedom in the numerator and 71 in the denominator. The probability of observing an F statistic at least this large if that hypothesis were true (Prob > F) is shown as 0.0000. In Stata, this means the p-value is less than 0.00005. The R-squared for this regression is 0.6627, so the model explains about 66% of the variation in mpg. The R-squared adjusted for degrees of freedom is 0.6532. The Root MSE is 3.4071, which is the square root (√) of the residual mean square (the MS value in the Residual row of the top-left table).
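Each header statistic follows from the ANOVA quantities. The sketch below recomputes them and prints them next to the values `regress` stores. `e(F)`, `e(r2)`, `e(r2_a)`, `e(rmse)` and `e(N)` are standard stored results of `regress`, and `Ftail()` returns the upper-tail probability of the F distribution. The scalar names are hypothetical.

```stata
* Recompute the top-right summary statistics from the ANOVA quantities
sysuse auto, clear
quietly regress mpg weight foreign

* Hypothetical scalar names, chosen for illustration
scalar f_stat = (e(mss) / e(df_m)) / (e(rss) / e(df_r))  // model MS / residual MS
scalar p_f    = Ftail(e(df_m), e(df_r), scalar(f_stat))   // Prob > F
scalar r2_adj = 1 - (1 - e(r2)) * (e(N) - 1) / e(df_r)    // adjusted R-squared
scalar rmse   = sqrt(e(rss) / e(df_r))                    // Root MSE

display "F(" e(df_m) ", " e(df_r) ") = " %6.2f scalar(f_stat) "   stored: " %6.2f e(F)
display "Prob > F      = " %10.3e scalar(p_f)
display "Adj R-squared = " %6.4f scalar(r2_adj) "   stored: " %6.4f e(r2_a)
display "Root MSE      = " %6.4f scalar(rmse) "   stored: " %6.4f e(rmse)
```

Displaying Prob > F in `%10.3e` format shows how far below 0.00005 the p-value actually is.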

The bottom table contains the estimated coefficients. The dependent variable is named in the top-left corner of the table. The independent variables are listed below it, followed by the constant or intercept (_cons). For each term, the table gives:

- the coefficient;
- its standard error;
- the t statistic (the coefficient divided by its standard error);
- the p-value computed from that t statistic (P>|t|);
- the lower and upper bounds of the 95% confidence interval.
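The columns of the coefficient table are linked:

- t is the coefficient divided by its standard error.
- The p-value is the two-sided tail probability of that t value, using the residual degrees of freedom.
- The 95% interval is the coefficient plus or minus the critical t value times the standard error.

The sketch below recomputes these quantities for `weight` and `foreign` using `_b[]`, `_se[]`, `ttail()` and `invttail()`. The scalar names, including those built inside the loop, are hypothetical.

```stata
* Recompute t, p and the 95% CI for each slope from _b[] and _se[]
sysuse auto, clear
quietly regress mpg weight foreign

* Critical t value for a two-sided 95% interval (hypothetical scalar name)
scalar tcrit = invttail(e(df_r), 0.025)

foreach v in weight foreign {
    scalar t_`v'  = _b[`v'] / _se[`v']                      // t statistic
    scalar p_`v'  = 2 * ttail(e(df_r), abs(scalar(t_`v')))  // two-sided p-value
    scalar lo_`v' = _b[`v'] - scalar(tcrit) * _se[`v']      // lower 95% bound
    scalar hi_`v' = _b[`v'] + scalar(tcrit) * _se[`v']      // upper 95% bound
    display "`v': t = " %6.2f scalar(t_`v') "   P>|t| = " %5.3f scalar(p_`v')
    display "    95% CI: " %10.0g scalar(lo_`v') " to " %10.0g scalar(hi_`v')
}
```

A 95% interval that contains 0 corresponds to a two-sided p-value above 0.05 for the same coefficient.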

Using a significance level of 0.05, the overall model is statistically significant, because Prob > F is below 0.05. Within the model, however, the weight variable is statistically significant, whereas the foreign variable is not.
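You can reach the same conclusions with Stata's `test` command, which performs Wald tests after `regress`. In this sketch, testing `weight` and `foreign` jointly reproduces the overall F(2, 71) test from the header. Testing `foreign` alone gives an F(1, 71) statistic equal to the square of its t statistic, with the same p-value. The 0.05 cut-off in the `cond()` call is the significance level used above.

```stata
* Wald tests after regress, using the returned r(F) and r(p)
sysuse auto, clear
quietly regress mpg weight foreign

* Joint test that both slopes are zero: matches the header F(2, 71)
test weight foreign
display "Joint test: F = " %6.2f r(F) "   p = " %10.3e r(p)

* Test for foreign alone: F(1, 71) equals the squared t statistic
test foreign
display "foreign: F = " %6.2f r(F) "   p = " %5.3f r(p)
display "Significant at 0.05? " cond(r(p) < 0.05, "yes", "no")
```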