Making tables from Stata is one of the most common coding tasks in applied economics. For most researchers, it is also one on which much time is wasted: questions about better ways of automating the formatting of nice tables from Stata often crop up on social media. Reproducibility in economics also crucially depends on streamlining this process.
Take the example of the reproducibility checks DIME has recently implemented for papers before they are published as working papers. This is a bit of a fancy group: all code is typically hosted on GitHub, all data are registered in the Microdata Library, and RAs get a lot of training on reproducible coding practices. Yet, out of 22 recently reviewed papers, most failures to replicate had to do with the workflow used to export tables.
In this blog, we propose workflows to minimize the pain and increase the gains. We share the framework DIME Analytics has developed to help research teams with the task of coding tables, discuss two distinct stages to the problem, and link to Stata code for getting the job done.
The bad and the ugly
When incorporating tables into papers, it is common practice to copy and paste results from csv and Excel files, or the Stata window, and then format them in Word. Some setups are more manual than others, but the road to sharing results that do not reproduce is short: all you need to do is forget to copy one table, or one line of one table, after updating your data or specification. Additionally, heavily formatting tables after they are exported often makes it harder to confirm that the results exported are the same as the ones shown in the paper. This all needs to stop!
Two stages for coding tables
There are lots of reasons to export tables somewhere other than the Stata results window, but they don't all justify the same approach. You might be exploring regression results with various specifications, and not want to read them one-by-one. You might be preparing a report or paper for submission or publication. Your journal might require tables inline in Word. (Really.) Depending on what you are doing now and what you might need to do in the future, there are some questions that should help you triage before implementing code:
- Do I need this output to be immediately shareable without post-processing?
- Is this output ready for publication, or just for discovery and exploration?
- Do I need to be able to adjust number formatting and rounding later?
- Will I need to adjust table layout and formatting later?
- What will be the required workflow when I re-produce this table?
- What will happen to the table if I alter models, parameters, or other core components?
- Am I likely to alter models, parameters, or other core components?
Different use cases have different answers to these questions, but most projects fall into one of two broad development stages.
- Stage One: you only care about making the information human-readable now, and you are going to use those results to adjust the structure of the table repeatedly. You may adjust the models and parameters, rename the rows and columns, delete or add lines; but you will probably not finalize that output for a while. Therefore, Stage One only requires you to export minimally formatted and annotated tables: just enough to understand what your results are telling you without optimizing for page numbers and aesthetic concerns. You may even be sharing the tables in markdown. This stage should not take long to implement, as formatting to death is usually the hardest part. And really, you don't want to spend a lot of time making your tables look super nice at this stage: you are very very unlikely to use them for the final output.
- Stage Two: At some point you will want to share nice-looking, easy-to-read tables with researchers or policymakers beyond those in your direct research team. When moving to Stage Two, be sure that you really need nicely formatted, reproducible output, ready for publication, that is not going to need many structural adjustments later. You may think that it is fine to just copy the results into Word and format them manually, as this is a one-time operation. But it really never ever is… You may continue receiving data, you may want to make very minor adjustments that don't require reporting new information, the journal you are submitting to may require a reproducibility check, or the referees may propose a better way to present your results. So, if the core structure of a table is set, it is worth the one-time investment of formatting it in Stata code. Once you agree to move to Stage Two, the key thing to keep in mind is that the RA will spend a fair amount of time implementing this, so we recommend only doing it once you have found your core set of results and discussed the best way to present them.
The way you move from Stage One to Stage Two will depend on the output software you plan to use. Here, we will describe how to automate LaTeX tables; we also talk about working in Excel here. LaTeX gives more flexibility with auto-updating tables in reports and presentations; however, there is some fixed cost in learning the formatting language. Excel remains popular (and is preferred by some journals for house styling), but applying formatting in Excel remains a largely manual process, which slows down replication runs.
In all cases, we recommend a simple file structure to help keep organized. Every table has its own file. When tables are in Stage One, these outputs should be named informatively: names like balance-tables.tex are great. During Stage Two, these should change to structural names: names like table-A05_robustness.xlsx are acceptable (note the use of dashes, leading zeros, and underscores to organize semantic content: learn more about naming things). This ensures that you and your code reviewers can always find things and understand how they are connected both to the code and to the final product. Use Git wherever possible to track and store past and alternative specifications, and how they affect your results.
Coding tables from Stata in LaTeX
Last month, the two most-downloaded packages from SSC were estout and outreg2, which are used to export tables. Both can create simple tables in LaTeX, although they will not always look the nicest without formatting. Exporting results to individual .tex files for each table and importing them with \input into a master .tex document is the easiest way to create outputs when you are still making changes to the results. The greatest advantage of all this is that you only need to recompile the master document to see all the new results at once, without any copy-pasting or opening multiple files.
As for actually doing this, the estout package, by Ben Jann, has lots of options. You can get it to do basically anything you want! The default table is pretty simple, and the documentation is huge, but we've prepared a few go-to examples that solve the most common formatting needs for a LaTeX table. The esttab command also allows you to export nicely formatted tables to Word, Excel, csv and HTML, but the options vary from one format to the other.
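As a rough illustration of the kind of call involved (this is not one of the linked go-to examples), the sketch below stores two regressions and exports them to a single .tex file. It assumes estout is installed from SSC; the built-in auto dataset, the chosen variables, and the file name main-regressions.tex are placeholders for illustration only.

```stata
* Sketch only: assumes estout is installed (ssc install estout).
* The dataset, variables, and output file name are illustrative.
sysuse auto, clear
eststo clear

* Store two specifications so they appear as two columns
eststo: regress price mpg
eststo: regress price mpg weight foreign

* Export both columns to one .tex file, with variable labels,
* standard errors in parentheses, and explicit number formats
esttab using "main-regressions.tex", replace          ///
    booktabs label                                    ///
    b(%9.2f) se(%9.2f)                                ///
    star(* 0.10 ** 0.05 *** 0.01)                     ///
    nomtitles                                         ///
    stats(N r2, labels("Observations" "R-squared")    ///
          fmt(%9.0fc %9.3f))
```

Because the booktabs option is used, the master document would need to load the booktabs package before pulling the file in with \input{main-regressions.tex}. Rerunning the do-file and recompiling the master document is then enough to refresh the table.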
If you're trying to create a very specific table format, the easiest way to do it in a replicable manner is to write the complete LaTeX code for the table. This means saving any number that should be displayed as a local, and hardcoding the LaTeX code for the table. But instead of writing the numbers themselves, you just call the locals that were previously saved. filewrite allows you to write the LaTeX code in a do-file, then have Stata write the text file with the table, and save it as a .tex file. You can find an example of how to use it here.
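The idea of saving numbers as locals and calling them inside hardcoded LaTeX can be sketched as follows. This version uses Stata's built-in file open, file write, and file close commands, so it may differ from the linked example; the dataset, variables, file handle, and file name are all illustrative assumptions.

```stata
* Sketch only: dataset, variables, handle, and file name are illustrative.
sysuse auto, clear
regress price mpg weight

* Save every number the table will display as a formatted local
local b_mpg  : display %9.2f  _b[mpg]
local se_mpg : display %9.2f  _se[mpg]
local n_obs  : display %9.0fc e(N)

* Write the hardcoded LaTeX, calling the locals instead of typing numbers
file open  tabfile using "table-custom.tex", write replace
file write tabfile "\begin{tabular}{lc}"             _n
file write tabfile "\hline"                          _n
file write tabfile " & Price \\"                     _n
file write tabfile "\hline"                          _n
file write tabfile "Mileage (mpg) & `b_mpg' \\"      _n
file write tabfile " & (`se_mpg') \\"                _n
file write tabfile "Observations & `n_obs' \\"       _n
file write tabfile "\hline"                          _n
file write tabfile "\end{tabular}"                   _n
file close tabfile
```

One thing to watch for with this approach: Stata treats a backslash placed directly before a local or global macro reference as an escape, so it helps to keep LaTeX backslashes separated from the macros, as in the rows above.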
The two commands above are our go-to solution for exporting tables to LaTeX. However, there are a few other options out there. outreg2 also exports tables to LaTeX format, but we've found it harder to use, and resources for it harder to find, than the commands above. stata-tex is another option for custom tables, but takes some more setting up with Excel and Python. Finally, you can write a whole HTML, Word or PDF document using different options for Stata markdown, entirely within Stata. Discussing these options would take yet another blog post, but you can check out the dynamic documents and texdoc documentation for more information.
Keep it reproducible
Whichever software and packages you decide to use, automating your table creation workflow will likely save you time, as long as you do it at the right moment. It will also greatly reduce the risk of circulating, submitting or publishing manuscripts with out-of-date results. And if we may plant another seed for debate, this, rather than aesthetics, is also the main reason to prefer LaTeX over Word for applied work.