Export Excel Files

Reminder

Please use all code samples responsibly - these are samples and likely require adjustments to work correctly for your specific needs. Read through the documentation and comments to understand any caveats or limitations of the code and/or data and follow-up with the code author or Code Library admins if you have questions on how to adapt the sample to your specific use case.

Purpose: A data scan to understand scope of Homeownership voucher program. It provides examples of how to use Stata dataframes, reshape, collapse, and export excel to create easy-to-digest excel tables.

Data: Public Data from HUD on the HCV program and the homeownership voucher option within that program.

Author: Dan Teles (June 2022)

**Preliminary Scan**
/* See What PHAs have most Homeownership HCVs
 and most and fewest relative to size of HCV program
 Export summary files using dataframes and export excel 
 Created by Dan Teles 6/23/2022
 Updated by D T 10/19/2022
*/

clear all
set more off
version 16

*******************
** Data Prep
*******************
*import HCV Homeownership Data
import excel "..\Data\Raw\HUD\Homeownership_Enrolllment_(2-21-21).xls", sheet("Homeownership Enrollment") firstrow
*rename columns
rename D HOvouchers2015
rename E HOvouchers2016
rename F HOvouchers2017
rename G HOvouchers2018
rename H HOvouchers2019
rename I HOvouchers2020 
rename J HOvouchers

*save as a temporary file
tempfile HOvouchers
save `HOvouchers', replace

*import HCV data
clear
import delimited "..\Data\Raw\HUD\hudPicture2021_462940.csv", stringcols(6) 

keep name code subsidizedunitsavailable occupied reported

*rename code PHAcode for merge
rename code PHACode

*rename subsidizedavailableavailable to be shorter
rename subsidizedunitsavailable HCVavailable 

*remove =" " formatting from PHACode and PHA name
replace PHACode = subinstr(PHACode,`"""', "", .)
replace PHACode = subinstr(PHACode,`"="', "", .)
replace name = subinstr(name,`"""', "", .)
replace name = subinstr(name,`"="', "", .)

*merge back on HCV Homeownership data
merge 1:1 PHACode using `HOvouchers'

*********
** Data Exploration
*********
* summarize data
sum HOvouchers, detail

*create new variables: percent, increase, and percentage increase
gen pctHO=HOvouchers/HCVavailable
gen HOincrease=HOvouchers-HOvouchers2015
gen pctHOinc=(HOvouchers/HOvouchers2015)-1

*summarize new varaibles
sum pctHO, detail
sum HOincrease, detail
sum pctHOinc, detail 

****First Export: PHAs with the most HOvouchers
* copy main dataset into a new frame
frame copy default MostHOVs 
* load the new frame
frame change MostHOVs
* keep just the columns we care about 
keep PHACode PHAName HCVavailable HOvouchers pctHO
* sort the data by number of HO vouchers
gsort - HOvouchers
* keep the top 25 PHAs
keep if _n<=25
* reorder the columns 
order PHACode PHAName HCVavailable HOvouchers pctHO
*export to excel 
export excel using "..\Output\InitialExploration", sheet("MostHOVs", modify) firstrow(variables) 

****Second Export: PHAs with the most HOvouchers as a percent of total
* copy main dataset into a new frame
frame copy default MostHOVs_aspct 
* load the new frame
frame change MostHOVs_aspct
* keep just the columns we care about 
keep PHACode PHAName HCVavailable HOvouchers pctHO HOincrease
* sort the data by number of percent of vouchers that are for homeownership
gsort - pctHO
* keep the top 25
keep if _n<=25
order PHACode PHAName HCVavailable HOvouchers pctHO
* export to excel 
export excel using "..\Output\InitialExploration", sheet("MostHOVs_aspct", modify) firstrow(variables) 


****Third Export: PHAs with the most HCVs and ZERO HOvouchers
* copy main dataset into a new frame
frame copy default MostHCVs_NoHOVs 
* load the new frame
frame change MostHCVs_NoHOVs
* keep just the columns we care about 
keep PHACode name HCVavailable HOvouchers pctHO HOincrease
* change a column name
rename name PHAname //use PHAname from HCV data instead from HOV data
* sort the data by number HCVs
gsort - HCVavailable
* keep the PHAs with ZERO homeownership vouchers
keep if HOvouchers==0 | HOvouchers==.
* keep the 25 PHAs with the most HCVs (and no HO vouchers)
keep if _n<=25
order PHACode PHAname HCVavailable HOvouchers pctHO
* export to excel 
export excel using "..\Output\InitialExploration", sheet("MostHCVs_NoHOVs", modify) firstrow(variables)

****Fourth Export: PHAs with most growth
* copy main dataset into a new frame
frame copy default Growth
* load the new frame
frame change Growth
* drop if HO increase is undefined
drop if HOincrease==.
* keep just the columns we care about 
keep PHACode PHAName HCVavailable HOvouchers pctHO HOincrease
* sort 
gsort - HOincrease
* keep PHAs with an increase of 10 or more 
keep if HOincrease>10
order PHACode PHAName HCVavailable HOvouchers pctHO
*export to excel 
export excel using "..\Output\InitialExploration", sheet("HOIncrease", modify) firstrow(variables)

****Fifth Export: Overall Summary Stats by year
* copy main dataset into a new frame
frame copy default sum_long
* load the new frame
frame change sum_long 
**create aggreagates by year and graph
rename HOvouchers HOvouchers2021
**reshape the dataset from wide (years are columns) to long (years are rows)
reshape long HOvouchers, i(PHACode) j(year)
*keep fields we care about
keep PHACode PHAName year HOvouchers
*create a variable for new vouchers each year
by PHACode: gen New = HOvouchers-HOvouchers[_n-1] // the n-1 thing pulls from the row above
* create a variable for percent new
gen pctNew=New/HOvouchers[_n-1]
* create a yes/no variable for whether the PHA has any HO vouchers in a given year
gen any=HOvouchers>0 & HOvouchers!=.
* create a yes/no variable for whether the PHA has any new HO vouchers in a given year
gen anyNew=(New>=1 & New!=.)
** Collapse to a summary across all agencies by year
gen PHAs=1 // create a counter for number of agencies
collapse (sum) HOvouchers any anyNew PHAs, by(year)
*reorder and export to excel 
order year PHAs any HOvouchers anyNew
export excel using "..\Output\InitialExploration", sheet("Annual", modify) firstrow(variables)