top of page

New and Improved Tables in Stata - An Introduction

Stata recently introduced the collect command, which allows you to consume results generated by Stata commands and mould them into customised tables. Any command that returns results in r() or e() macros can be consumed by collect. A couple of examples are regression models (e.g. linear, logistic, generalized, etc.), summary statistics (using the improved table command), statistical tests (e.g. t-test, chi2, etc.), and much more.


To determine if a command has consumable results, first run your command followed by these commands:

return list
ereturn list

If either of these commands prints output to the results pane, then the initial command you ran has results that can be consumed by collect.


Any command output that is consumed by collect is saved into a collection. By default, the collect command will save your consumed results to a collection called “default”. We will cover collections in more detail in a different post. Once you have consumed results with the collect command, you can then use the collect layout command to set up the overall table structure (rows and columns). When Stata consumes a command output, it automatically assigns both dimension(s) and level(s) to each item. This is what you use to define your table structure.


A dimension is like an overarching group. For example if you run a linear regression, the “coefficient” and the “standard error” are both assigned to the “results” dimension. The variables and the constant from the regression are assigned to the “colname” dimension.


A level is an individual item within the dimension. So for a linear regression, the coefficient (referred to as _r_b) is a level within the “results” dimension. The standard error (referred to as _r_se) is another level within the “results” dimension. The different variable names and the constant (referred to as _cons) are all separate levels within the “colname” dimension.


By separating everything into dimensions and levels Stata allows you to assign different levels or dimensions to different columns, rows, or even multiple tables. This means that your output can be manipulated to appear in every possible combination.


How to Use:

*Run a command, then:

collect get r() e()
collect layout (row_dimension[level(s)]) (column_dimension[level(s)]) (table_dimension[level(s)])

NOTE: You are not required to specify [level(s)], if you only specify the dimension then only the default set of levels will be shown. The default set of levels doesn’t include everything that was collected and is specific to the command you have collected from.


Worked Example 1: A Default Linear Regression

In this example I am going to demonstrate the use of both the collect commands and the Stata menus/Tables Builder to create a simple table using linear regression output. I will use the Stata example dataset auto.dta.



To start I will collect output from a linear regression and structure a simple table.


Tables via Commands:

sysuse auto, clear
regress price mpg
return list
ereturn list
collect get r() e()
collect layout (result) (colname)

The layout command produces the following table output:

Tables via Menus:

After loading the auto dataset with the sysuse auto command, we can run the regression through the Stata menus before collecting the output.


Statistics > Linear models and related > Linear regression



Now that we've run the regression, we can collect the output:


Statistics > Summaries, tables, and tests > Tables and collections > Collect results



Once we've collected the output, we can open the Tables Builder to build our table layout:


Statistics > Summaries, tables, and tests > Tables and collections > Build and style a table



The dimensions and levels can be seen and selected in the top-left corner of the Tables Builder (shown above). In the top right are three squares, labelled rows, columns, and tables. This is where the selections of dimensions and levels are placed. The "plus" buttons to the left of the three squares are used to assign dimensions and/or levels to their chosen positions. In this case, I am only interested in the "result" and "colname" dimensions (the top two in the list). I click on the "result" dimension and click the plus next to rows, then I click on the "colname" dimension and click the plus next to columns.

This changes the view in the Tables Builder to the following:


As you can see a preview of the resulting table is shown in the bottom-right of the Tables Builder (shown above).


There are a few things to note here. For the collect get command, it doesn’t matter what you specify beyond either r() or e(). You can specify a specific r() or e() result (or select one via the menu item) - for example r(table) to collect the matrix table - but this will still cause Stata to collect all the returned results in r() and/or e(). This is why I prefer to specify only r() and/or e() by themselves as this requires less typing. As I didn’t specify any levels for either of my dimensions in the layout, the default set for each dimension was shown. In this case, only the results from the third regression table output are shown, even though the information in the other two tables was also collected.


Worked Example 2: Selecting Levels

You can see the list of levels for each dimension in the Tables Builder. In addition to that, the following list of levels will be collected in the dimension result where appropriate:


_r_b -- coefficients or transformed coefficients reported by command

_r_se -- standard errors of _r_b

_r_z -- test statistics for _r_b

_r_df -- degrees of freedom for _r_b

_r_p -- p-values for _r_b

_r_lb -- lower bounds of confidence intervals for _r_b

_r_ub -- upper bounds of confidence intervals for _r_b

_r_ci -- confidence intervals for _r_b

_r_cri -- credible interval (CrI) of Bayesian estimates

_r_crlb -- lower bound of CrI of Bayesian estimates

_r_crub -- upper bound of CrI of Bayesian estimates


You can refer to this list in Stata by looking at the collect get help file.


I want to refine my previous table to show only the coefficient and p-value of the independent variable mpg. To do this I just need to add levels to the result and colname dimensions.


Tables via Commands:

The only command that changes from our previous example (1) is the last one, which we change to the following:

collect layout (result[_r_b _r_p]) (colname[mpg])

The layout command produces the following table output:

Tables via Menu:

All the steps are the same as in the previous example (1), we just need to add our selected levels. The easiest way to do this is within the three squares (rows, columns, and tables) at the top left of the Tables Builder. The dimensions that we previously selected and moved into these squares have little drop-down arrows to the right of the dimension names. Click the drop-down arrow for the "result" dimension in rows and select "Edit":

A box will appear and on the right of that box you can "tick" individual levels to select them:

Once you click "OK", the selections will show in the appropriate square of the Tables Builder:

We can also make the same selections for the "colname" dimension:



The Tables Builder then appears as follows with the appropriate table preview:


 

Dimensions and levels are the core to building complex tables in Stata 17. For the layout of tables I prefer to use the Tables Builder rather than just the commands. This allows for easier visualisation and manipulation to achieve the table structure I want. Once I have achieved my desired layout I can then use the corresponding layout command to reapply the layout where appropriate.

1,110 views0 comments

Recent Posts

See All
bottom of page