"Hello, world"
local my1stlocalmacro
"Hello, world" global my1stglobalmarcro
Automating Tasks
Macros: Local vs. Global
Macros are a kind of temporary holding space akin to “variables” in R, Python, and other general program languages (remember: think of your data set as a spreadsheet, or in R/Python, etc. terms, a “data frame” and its variables as columns). These are also different from “macros” in Excel, Macs, etc. which are a set of step automated via short code scripts. There are 2 flavors of Macros in Stat: LOCALS and GLOBALS.
LOCALS last only as long as your commands are continuously running within a session and have limited (local/private) SCOPE, meaning their values are independent and cannot be accessed at different levels (i.e. within loops) in your code or other DO files you might run in the same session.
GLOBALS persist for your entire Stata session and have global (public) SCOPE, meaning their values are accessible and changeable at any point in your session.
Both types of macros can be used to storre strings or numeric values/expressions:
You define LOCALS with the “local” command, followed by what you want to name it.
The Stata syntax to refer to LOCALS with an open single quote “`” (shift+ ~), the name, and then a close single quote/apostrophe ” ’ ” (`my1stlocalmacro'
)
NOTE: Since quarto doesn’t support Stata code chunks, the closing single apostrophe has been changed to an open single quote for legibility in some cases.
Refer to GLOBALS with the dollars sign “$” followed by the global name (${my1stglobalmacro}
)
You can use macros to store lists of variables that you want to use over and over again:
"age gender"
local myvarlist `myvarlist` sum
Or for filepaths to use/save data and results from/to:
pwd`c(pwd)`
local workingdir "`workingdir'\Stata 301\" global saved_data
#create the directory, use capture in case it already exists...
"${saved_data}"
capture noisily mkdir #save the file there
"${saved_data}data_wide_copy.dta", replace save
In practical terms, I find GLOBALS easier to work with as they persist, and I tend not to do anything too too fancy things like running many PROGRAMS (see below) or DO/ADO (those downloaded user written commands) files within other DO files where scoping would be a bigger issue. However, it is technically bad form to use GLOBALS generally due to potential conflicts. Strictly speaking, good programming form is to use LOCALS rather than GLOBALS unless you have a program that must remember something from one run to the next. Even with DO files, you can actually pass values from one DO file to another run within it with LOCAL arguments (see below).
Looping
We can use forvalues
for looping over numeric values with consistent intervals:
=1(1)10 { #syntax is i=min(interval)max
forvalues i"`i'" n#call the values with `i'
display }
Loop over odd values:
=1(2)10 {
forvalues i"`i'"
display }
Loop over even values:
=2(2)10 {
forvalues i"`i'"
display }
foreach
is used to loop over non-numeric values or numeric values with inconsistent intervals. The syntax is: foreach lname in list
elements of the list are separated by a blank space.
in col_1 col_2 {
foreach k "`k'"
display }
You can combine the “of varlist” syntax with the flexible ways Stata allows you to define a VARLIST in order loop through a group of variables that start, end, or have some other common section in their names. Let’s try only looking at whether age and gender are related to if individuals had specifically a 4th arrest of anytype:
*4 {
foreach var of varlist arr_"`var'"
di `var` age gender
reg }
The forvalues
looping syntax assumes a regular pattern numeric pattern for values of a variable you might want to step through in your loop. However, this is not always the case, and trying to refer to a value that does not exist can lead to errors. We can use the levelsof
syntax to loop through just the valid values of either a numeric or even a string variable:
local(levels)
levelsof arrest_y1,
"`levels'"
display
foreach l of local levels {*4 {
foreach var of varlist arr_"`var' `l'"
di `var' age gender if arrest_y1 == `l`
reg outreg2 using "regressions.doc" , label ctitle("`var'_`l'") append
}
}
If else
If and else statements let you control the flow of your program to execute some commands and/or skip others depending on some condition:
=1(1)39 {
forvalues iif `i`<10 {
"Less than 10"
display
}if `i`>=10 & `i`<20 {
"10-20"
display
}else if `i`>=20 {
"Greater than 20"
display
} }
Programming (functions)
Programs are analogous to defining functions in R, Python etc. and allows you to take one or more pieces of varying input and execute the same set of commands for the varying input. A general rule is that if you’re copying the same code more than twice, you should turn it into a program to automate it for you and avoid errors.
#Defines the name of the program
program define test #Defines the input(s) in the program
args x "`x'" #Tells stata what to do with that input
display #Tells stata this is the end of your program end
To run the program type the name and the input(s)
"Stata is great" test
Suppose you want to make a bunch of histograms:
program define pretty_histogram
args var title color num`var`, percent
histogram title("`title'")
color("`color'") lcolor(black)
width(1)
ylabel(0(25)100,nogrid)
xlabel(0(5)35,nogrid)
xtitle("Number of Arrests")
graphregion(fcolor(white))
plotregion(style(none))
"AppendixA_Figure`num'.pdf", replace
graph export end
You can then call this program with pretty_histogram arr_theft "Arrests for Theft" "207 232 243" 1
Outputting Descriptive Statistics
There are three (3) frequently used ways of outputting results to external files, each with pros and cons
Function | Pros | Cons |
---|---|---|
POSTFILE |
|
|
TABOUT |
|
|
PUTEXCEL |
|
|
POSTFILE
Postfile creates a second temporary data set in memory, separate from the one you have open. You will run your summary statistics on the active data set, store the results in local macro, write the results in the locals to the temporary data set, and, finally, open the temporary data set and export it to an external Excel file.
First create a temporary data set in memory with a temporary name assigned to the local macro “memhold”
#First create a temporary data set in memory with a temporary name assigned to the local macro "memhold"
tempname memhold#In this line we actually create the temporary dataset in memory (but which is not active) in the namespace "memhold" The different variables/columns we want in our temporary data set are listed between "memhold" and using . Using specifies where the data set actually gets saved/stored eventually
`memhold` str20(varname) meanmale meanfemale pvaloftest using "${saved_data}FINRA_02_postfile.dta", replace
postfile #Next we will specify the different variables that we want statistics on in a loop we can iterate through. In this case we want to see if the mean/average value of these various variables differ significanlty by gender (A3)
in A3A r_white r_black r_hisp r_asian r_other reg_ne reg_mw reg_south reg_west {
foreach var #First find the mean value of the varible of interest for males (A3==1)
`var` if A3 == 1
su #Return list displays the statistics stata saved and can be stored in your locals. Store this average termporarily in a local macro called "meanmale"; we will write this vlaue to our temporary data file below in the "post" command line
return list = r(mean)
loc meanmale
#Next, find the mean value of the varible of interest for females (A3==2)
`var` if A3==2
su #store the result to a local macro called "meanfemale"
= r(mean)
loc meanfemale
#Then, we want to see if the average for males and females differs statistically,
#so we will use the mean estimation and the postestimatation test command to see.
#We could also use "ttest" to do this in one line, but "mean" is more flexible and allows us to use weights if we want
`var`, over (A3)
mean `var`]Male = [`var`]Female # Stata 15 syntax. If using Stata 16, use this alternative syntax: test `var'@1.A3 = `var'@2.A3
test [
#Save the 2-sided p-value of this difference of means test to a third local macro
= r(p)
local pvaloftest #Finally, for the variable we are actively looping through, write the mean value for males , mean value for females, and 2-sided p-value of the difference as a new observation (row) in our temporary data file (not the one we have open actively and are analyzing)
`memhold' ("`var`") (`meanmale``) (`meanfemale`) (`pvaloftest`)
post
}#Postclose tells Stata we are done writing to the temporary data file we created and to finalize/save it where we specified in the "postfile" command
`memhold`
postclose
#Then we just have to open up and export that new data file and export the results. First we may want to preserve our active data file, though, so we can restore it quickly to conduct additional analyses. This part is optional, though, since you could just reopen your original data file,or save a working version here and reopen it after the export instead of using preserve/restore
#optional
preserve #Open up your results file data set:
"${saved_data}FINRA_02_postfile.dta", clear
use #Export to Excel:
"${saved_data}FINRA_02_postfile.xls", replace
export excel using #optional
restore
#Take a look at the resulting Excel file for the output--its gives you the information you need,
#but the numbers are not formatted and there are no column headings to tell you what metric is in each column
#Sometimes, the better option is... TABOUT
#(though you would not be able to pull out and write the p-value for the significance test in TABOUT easily)
TABOUT
TABOUT is second way of outputting descriptive statistics that is pretty automated/easy to use
# Download command:
capture ssc install tabout #Can use to show tabulations
"${saved_data}FINRA_02.xls", cells(col) stats(chi2) replace
tabout A3 using
#Or crosstabs (by gender in this case):
foreach characteristic of varlist A4A A6 A5 {`characteristic` A3 using "${saved_data}FINRA_02.xls", cells(col) stats(chi2) append
tabout
}
#Can also use to show summary statistics with the "sum" option
#You need to specify which statistic to show for each variable whose mean, median, etc. you want to show
#Tabout then runs this by the differing variables listed before "using"
"${saved_data}FINRA_02.xls", sum c(mean A3A mean emp_full) f(4) append
tabout A3 A4A using #f(4) indicates four digits after the decimal
#Again, you can use loops and loops within loops to automate and quicken your work...
foreach characteristic of varlist A4A A6 A5 {
foreach outcome of varlist A3A emp_full {`characteristic' using "${saved_data}FINRA_02.xls", sum c(N `outcome` mean`outcome`) f(4) append
tabout }
}
#Can also download and use the OUTSUM command from SSC to export summary statistics, but usually I find tabout along with the sum option to be adequate
PUTEXCEL
PUTEXCEL is another way of outputting results from Stata to external file (the most tedious), but also the most flexible in terms of what to place in each individual cell and how to format it.
#First specify the file that you want to write your output to:
"${saved_data}test.xls", replace
putexcel set #Can write stored results/contents of local macros similar to postfile:
sum A3A= r(mean)
local avg = `avg`
putexcel B2= r(sd)
local stdDev = `stdDev`
putexcel B3 #Can also add labeling more easily than in postfile:
="Age" A2="Average" A3="Standard Deviation" putexcel B1