Getting Started in Stata - Testing for Unequal Variances

Today we are going to show you how to test for unequal variances between two samples using both drop-down menus and Stata commands. Some hypothesis tests, such as the two-sample t-test, ask whether the variances of your two groups are equal. If the variances are unequal but you run a hypothesis test that assumes equal variances, you can get an unreliable result. It is therefore recommended that you test for unequal variances before performing such a hypothesis test.

As a new Stata user it is recommended that you start by using the Stata menus to perform your analysis. Each analysis, such as a t-test or variance test, will show up in your Review pane (on the left side of the Stata screen) as the equivalent Stata command. This allows you to begin learning the general structure of commands and how to use them. Once you become more familiar with commands you will find it is faster and easier to perform your analyses using commands.

I am going to use the Stata Example dataset auto.dta. I load this dataset using the following command:

sysuse auto.dta
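Before running the test it can help to confirm what the two variables look like. The lines below are a minimal sketch that assumes nothing beyond the shipped auto dataset; the clear option is only there so the load also works when other data are already in memory.

```stata
* Reload the example data, discarding anything already in memory
sysuse auto, clear

* Storage type and labels for the two variables used in the test
describe mpg foreign

* Number of cars in each group defined by foreign
tabulate foreign
```

The tabulate output should show the same group sizes that the variance test reports later, 52 domestic cars and 22 foreign cars.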

Unequal Variances Test via Stata Menus:

Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Variance-comparison test

Select “Two-sample using groups”, then select “mpg” under Variable name, and select “foreign” under Group variable name. Leave the confidence level as 95, which corresponds to a significance level of 0.05 as the cutoff for the p-value.

Click OK

Unequal Variances Test via Stata Commands:

sdtest mpg, by(foreign)
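The confidence level chosen in the menu can also be set from the command line with the level() option. The sketch below shows the default written out explicitly and then a 99% level for comparison. As far as I am aware, level() only changes the width of the confidence intervals in the table and leaves the reported p-values unchanged, so the cutoff you compare the p-value against is still your own choice.

```stata
sysuse auto, clear

* Same test, with the confidence level written out explicitly (95 is the default)
sdtest mpg, by(foreign) level(95)

* A different level changes the confidence intervals shown in the table
sdtest mpg, by(foreign) level(99)
```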

Output:

Stata prints the following to the Results pane (in the centre of the Stata window):

For this example I have performed a variance ratio test on the mpg of foreign cars versus the mpg of domestic cars. The “foreign” variable is used to separate the cars into two groups; in this case the groups indicate whether each car was made overseas or domestically. If you are using this to determine whether you have unequal variances for a subsequent hypothesis test, make sure you use the same variables here as for your hypothesis test.

Looking at the table, the first column is called “Group” and it allows us to identify individual statistics for each group, as well as combined statistics. The second column is “Obs” which lists the number of observations per group (sample), in this case 52 domestic cars and 22 foreign cars for a combined total of 74 cars. The third column is “Mean” which shows the average mpg. The fourth column is “Std. Err.” which is the standard error of the mean. The fifth column is “Std. Dev.” which is the standard deviation of mpg within the group. Finally, the last two columns show the lower and upper bounds of the 95% confidence interval for the mean of each group.
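If you want to see the per-group figures on their own, the first few columns of the table can be reproduced with tabstat. This is only a sketch for cross-checking, not part of the variance test itself; the statistic names n, mean, semean and sd are tabstat's own keywords.

```stata
sysuse auto, clear

* n = observations, mean = average mpg,
* semean = standard error of the mean, sd = standard deviation
tabstat mpg, by(foreign) statistics(n mean semean sd)
```

tabstat also prints a Total row, which corresponds to the combined row of the variance test table.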

Underneath the table are some summary statistics. The “ratio” indicates how the ratio was calculated, in this case the standard deviation of the domestic sample divided by the standard deviation of the foreign sample. The “f” gives us our F statistic, which is the ratio of the two variances (the square of the standard deviation ratio) and is used to calculate the p-value. The “Ho” shows the null hypothesis, in this case that the ratio will be 1 (equal). Across from the null hypothesis is the degrees of freedom (used with the F statistic to calculate the p-value), which for this test are 51 (domestic) and 21 (foreign). The last line shows three different alternate hypotheses. The middle alternate hypothesis is “ratio != 1” which states that the ratio does not equal 1. This hypothesis has a p-value of 0.0549, which is above our significance level of 0.05, so the ratio is not statistically significantly different from 1.
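To see where these numbers come from, the F statistic and the three p-values can be rebuilt by hand. The sketch below assumes the test is the usual F test on the ratio of two sample variances, with the two-sided p-value taken as twice the smaller tail. The scalar names (sd_dom, f_stat and so on) are my own and are not anything Stata defines.

```stata
sysuse auto, clear

* Standard deviation and degrees of freedom for domestic cars (foreign == 0)
quietly summarize mpg if foreign == 0
scalar sd_dom = r(sd)
scalar df_dom = r(N) - 1

* Same for foreign cars (foreign == 1)
quietly summarize mpg if foreign == 1
scalar sd_for = r(sd)
scalar df_for = r(N) - 1

* F statistic: ratio of the two variances, domestic over foreign
scalar f_stat = (sd_dom / sd_for)^2

* Lower tail, upper tail and two-sided p-values from the F distribution
scalar p_lower = F(df_dom, df_for, f_stat)
scalar p_upper = Ftail(df_dom, df_for, f_stat)
scalar p_two   = 2 * min(p_lower, p_upper)

display "f = " %6.4f f_stat "   degrees of freedom = " df_dom ", " df_for
display "Pr(F < f) = " %6.4f p_lower
display "two-sided = " %6.4f p_two
display "Pr(F > f) = " %6.4f p_upper
```

If the assumptions above hold, the two-sided value should agree with the 0.0549 in the sdtest output. Typing return list straight after sdtest shows the results the command stored itself, which is a quicker way to get the same figures.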

The two alternate hypotheses on the left and right are only of interest if the middle p-value is significant. In this example the middle p-value is not statistically significant, so this is where our analysis stops. We have no evidence of unequal variances, so we can proceed as though the variances are equal for the purpose of any subsequent hypothesis test. However, if the middle p-value were statistically significant, the two one-sided alternate hypotheses would indicate whether the ratio is less than or greater than 1. This allows you to draw some more specific conclusions about your data. For this example, had we got a significant p-value, we could then determine that the ratio is statistically significantly less than (<) 1 (hypothesis on the left) but not statistically significantly greater than (>) 1 (hypothesis on the right). Using the original “ratio” equation, sd(Domestic) divided by sd(Foreign), a ratio below 1 would mean that the variance of foreign cars is statistically significantly higher than the variance of domestic cars. However, in this example that knowledge is irrelevant to our overall conclusion of equal variances.
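The result of the variance test then decides which version of the follow-up hypothesis test to run. As a rough sketch, assuming the follow-up is a two-sample t-test on the same variables and that sdtest stores its two-sided p-value in r(p) (return list after sdtest will confirm this on your version), the choice could be written in a do-file like this. The scalar name p_var is my own.

```stata
sysuse auto, clear

* Run the variance test quietly and keep its two-sided p-value
quietly sdtest mpg, by(foreign)
scalar p_var = r(p)

if scalar(p_var) < 0.05 {
    * Evidence of unequal variances: do not pool the variances
    ttest mpg, by(foreign) unequal
}
else {
    * No evidence of unequal variances: default equal-variance t-test
    ttest mpg, by(foreign)
}
```

With the auto data the p-value of 0.0549 is above 0.05, so this sketch would run the default equal-variance t-test.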