By Luis Eduardo San Martin, Rony Rodriguez-Ramirez, and Mizuhiro Suzuki
More and more, statistical programming code is developed collaboratively across large teams. Add reviewers to this and it’s a party! Just as co-authors fight over the wording of their paper's introduction, people also commonly have different coding conventions. This makes the prospect of reading someone else’s code daunting at best, and may lead to misunderstandings and mistakes. Not to weigh in on the Stata vs. R war, but this is especially true when using Stata: unlike other programming languages, there is no widely accepted style guide, and few economics graduate students are taught best practices for writing code. [Side note: Languages like R solve this problem by having “style guides” like the Google R style guide or the Tidyverse style guide. Coding tools like RStudio have “linters” which can automatically apply these styles to code you write.]
To address this clear shortcoming and attempt to restore peace across R and Stata users, DIME Analytics recently launched a new tool: the Stata linter. The Stata linter helps users better write Stata code by identifying problematic coding practices. Here is how it works.
How does “linting” make code better?
Good code is both correct (produces the intended output) and easily comprehensible to someone who has never seen it before (Jones, Bjärkefur, Cardoso, and Daniels, 2021). The Stata linter helps on the second point. Since Stata does not have a style guide, the first step was for DIME Analytics to propose a Stata Style Guide, setting standard coding conventions for Stata. These were based on protocols developed at DIME as well as analysis of the hundreds of do-files reviewed through our peer code review process and reproducibility checks. In a second step, the Stata linter automates that Style Guide by detecting use of non-standard practices and suggesting corrections. While the tool prefers our suggested coding convention, it is also quite adaptable. It includes two functionalities:
- Detection: identifies coding practices that should be changed to improve code clarity. With the verbose option, it can also display a line-by-line list of where corrections are needed.
- Correction: automatically applies corrections to some of the identified bad coding practices and saves a new do-file with the results. Note that this feature is not guaranteed to correct code without changing its results. We strongly recommend that you verify that the outputs of the do-file do not change after applying it.
How to use the Stata Linter
The Stata linter command runs in Stata but uses Python code in the background. You will need Stata 16 (or higher) and Python to run it, and you will have to make sure that your installations of Stata and Python are integrated and that the Pandas package is installed.
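Before installing, it can help to confirm that Stata can see both Python and Pandas. The following is a minimal sketch using Stata's built-in python commands (available from Stata 16). The executable path in the commented-out line is a hypothetical placeholder, not a value from this post.

```stata
* Show the Python installation Stata is currently linked to, if any
python query

* List the Python installations Stata can find on this machine
python search

* If needed, point Stata to a specific installation
* (hypothetical path; replace it with one listed by -python search-)
* python set exec "/usr/bin/python3", permanently

* Check whether the Pandas package is available to that installation
python which pandas
```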
You can install the Stata linter by running the following command in Stata:
ssc install stata_linter
The basic syntax of the command is the following (you can find the do-file used for this example here). N.B., the name of the package is stata_linter but the name of the command is lint.
lint "${test_dir}/bad.do"
Detection feature
The browser displays a list of bad practices and specifics on whether / how often each was found. Linting the example do-file gives the following output:
-------------------------------------------------------------------------------------
Bad practice                                                 Occurrences
-------------------------------------------------------------------------------------
Hard tabs used instead of soft tabs:                         Yes
One-letter local name in for-loop:                           3
Non-standard indentation in { } code block:                  7
No indentation on line following ///:                        1
Use of . where missing() is appropriate:                     5
Missing whitespaces around operators:                        0
Implicit logic in if-condition:                              1
Delimiter changed:                                           1
Working directory changed:                                   0
Lines too long:                                              4
Global macro reference without { }:                          0
Potential omission of missing values in expression:          1
Backslash detected in potential file path:                   0
Tilde (~) used instead of bang (!) in expression:            5
-------------------------------------------------------------------------------------
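To make a few of these categories concrete, here is an illustrative sketch of the kind of code each one appears to target, next to a clearer alternative. It uses Stata's built-in auto dataset and is meant to be run as a do-file. The pairings are our reading of the category names above, not the linter's exact trigger conditions, which are documented in the command help file.

```stata
sysuse auto, clear

* One-letter local name in for-loop
foreach v of varlist price mpg {
    summarize `v'
}
* Clearer: a descriptive local name
foreach outcome of varlist price mpg {
    summarize `outcome'
}

* Use of . where missing() is appropriate
count if rep78 == .
* Clearer: missing() also catches extended missing values (.a to .z)
count if missing(rep78)

* Tilde (~) used instead of bang (!) in expression
count if ~missing(rep78)
* Clearer: use ! for negation
count if !missing(rep78)

* Potential omission of missing values in expression:
* Stata treats missing values as larger than any number, so this
* also sets the indicator to 1 for cars with a missing rep78
generate high_repair = (rep78 > 3)
* Clearer: handle missing values explicitly
generate high_repair_fixed = (rep78 > 3) if !missing(rep78)

* Delimiter changed
#delimit ;
regress price mpg weight
    foreign;
#delimit cr
* Clearer: break long lines with /// and indent the continuation
regress price mpg weight ///
    foreign
```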
The detection feature can be customized using a variety of options, such as:
1. Show exactly which lines have bad coding practices
lint "test/bad.do", verbose
2. Remove the summary of bad practices
lint "test/bad.do", nosummary
3. Specify the number of whitespaces used for indentation (default: 4):
lint "test/bad.do", indent(2)
4. Specify the maximum number of characters in a line (default: 80):
lint "test/bad.do", linemax(100)
5. Specify the number of whitespaces used instead of hard tabs (default: 4):
lint "test/bad.do", tab_space(3)
6. Export the results of the line-by-line analysis to an Excel file
lint "test/bad.do", excel("test_dir/detect_output.xlsx")
7. Finally, you can also use this command to test all the do-files that are in a folder:
lint "test"
Correction feature
The linter can automate corrections to the identified bad practices. When you choose to do this, you are asked to specify the name of the do-file where the corrections will be saved - this ensures your original do-file is not overwritten. Clever!
lint "test/bad.do" using "test/bad_corrected.do"
You are then asked whether you want each specific bad practice detected to be corrected. For example, the command above displays the following requests for confirmation:
------------------------------------------------------------
Correcting do-file
------------------------------------------------------------
Avoid using [delimit], use three forward slashes (///) instead.
Do you want to correct this? To confirm type Y and hit enter, to abort type N and hit enter. Type BREAK and hit enter to stop the code. See option automatic to not be prompted before creating files.
Other options for correction include:
1. Automatic (Stata corrects the file automatically, without asking you to confirm each correction):
lint "test/bad.do" using "test/bad_corrected.do", automatic
2. Replace the output file if it already exists
lint "test/bad.do" using "test/bad_corrected.do", automatic replace
A list with all the bad practices listed in detection and fixed in correction can be found in the command help file or the Stata linter DIME Wiki page.
Recommended use
We recommend the following workflow:
- Use the detection feature to get an idea of how many bad coding practices the do-file has.
- Decide whether or not to use the correction feature. If only a few bad practices are flagged, they can be corrected manually with the help of the verbose option.
- If there are many bad practices, use the correction feature and verify that the outputs of the do-file have not changed
- Re-apply the detection feature and correct any outstanding issues manually
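The steps above can be sketched as a short do-file. The lint calls reuse the example paths from this post. The verification step is only one possible approach and rests on two assumptions: that the do-file leaves its final dataset in memory, and that a scratch file (here the hypothetical test/output_original.dta) can be saved for comparison.

```stata
* 1. Detect: get a summary of bad practices
lint "test/bad.do"

* 2. If only a few issues are flagged, list the lines and fix them by hand
lint "test/bad.do", verbose

* 3. Otherwise, write a corrected copy of the do-file
lint "test/bad.do" using "test/bad_corrected.do"

* 4. Verify that results are unchanged (assumes each do-file leaves
*    its final dataset in memory; the .dta file name is hypothetical)
do "test/bad.do"
save "test/output_original.dta", replace
do "test/bad_corrected.do"
cf _all using "test/output_original.dta"

* 5. Re-run detection on the corrected file and fix what remains by hand
lint "test/bad_corrected.do", verbose
```

Here cf compares the variables in memory with those in the saved file and returns an error if any values differ. Do-files that export tables or figures instead would need their outputs compared in some other way.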
We hope you find this tool useful and help us improve it through feedback, feature requests, or bug reports here.