**Preliminary Scan**
/* See What PHAs have most Homeownership HCVs
and most and fewest relative to size of HCV program
Export summary files using dataframes and export excel 6/23/2022
Created by Dan Teles 10/19/2022
Updated by D T */
clear all
set more off16
version
*******************
** Data Prep
*******************
*import HCV Homeownership Data
"..\Data\Raw\HUD\Homeownership_Enrolllment_(2-21-21).xls", sheet("Homeownership Enrollment") firstrow
import excel *rename columns
rename D HOvouchers2015
rename E HOvouchers2016
rename F HOvouchers2017
rename G HOvouchers2018
rename H HOvouchers2019
rename I HOvouchers2020
rename J HOvouchers
*save as a temporary file
tempfile HOvouchers`HOvouchers', replace
save
*import HCV data
clear
import delimited "..\Data\Raw\HUD\hudPicture2021_462940.csv", stringcols(6)
keep name code subsidizedunitsavailable occupied reported
*rename code PHAcode for merge
rename code PHACode
*rename subsidizedavailableavailable to be shorter
rename subsidizedunitsavailable HCVavailable
*remove =" " formatting from PHACode and PHA name
replace PHACode = subinstr(PHACode,`"""', "", .)
replace PHACode = subinstr(PHACode,`"="', "", .)
replace name = subinstr(name,`"""', "", .)
replace name = subinstr(name,`"="', "", .)
*merge back on HCV Homeownership data
1:1 PHACode using `HOvouchers'
merge
*********
** Data Exploration
*********
* summarize data
sum HOvouchers, detail
*create new variables: percent, increase, and percentage increase
gen pctHO=HOvouchers/HCVavailable
gen HOincrease=HOvouchers-HOvouchers2015
gen pctHOinc=(HOvouchers/HOvouchers2015)-1
*summarize new varaibles
sum pctHO, detail
sum HOincrease, detail
sum pctHOinc, detail
****First Export: PHAs with the most HOvouchers
* copy main dataset into a new frame
frame copy default MostHOVs
* load the new frame
frame change MostHOVs
* keep just the columns we care about
keep PHACode PHAName HCVavailable HOvouchers pctHO
* sort the data by number of HO vouchers
gsort - HOvouchers
* keep the top 25 PHAs
keep if _n<=25
* reorder the columns
order PHACode PHAName HCVavailable HOvouchers pctHO
*export to excel
export excel using "..\Output\InitialExploration", sheet("MostHOVs", modify) firstrow(variables)
****Second Export: PHAs with the most HOvouchers as a percent of total
* copy main dataset into a new frame
frame copy default MostHOVs_aspct
* load the new frame
frame change MostHOVs_aspct
* keep just the columns we care about
keep PHACode PHAName HCVavailable HOvouchers pctHO HOincrease
* sort the data by number of percent of vouchers that are for homeownership
gsort - pctHO
* keep the top 25
keep if _n<=25
order PHACode PHAName HCVavailable HOvouchers pctHO
* export to excel
export excel using "..\Output\InitialExploration", sheet("MostHOVs_aspct", modify) firstrow(variables)
****Third Export: PHAs with the most HCVs and ZERO HOvouchers
* copy main dataset into a new frame
frame copy default MostHCVs_NoHOVs
* load the new frame
frame change MostHCVs_NoHOVs
* keep just the columns we care about
keep PHACode name HCVavailable HOvouchers pctHO HOincrease
* change a column name
rename name PHAname //use PHAname from HCV data instead from HOV data
* sort the data by number HCVs
gsort - HCVavailable
* keep the PHAs with ZERO homeownership vouchers
keep if HOvouchers==0 | HOvouchers==.
* keep the 25 PHAs with the most HCVs (and no HO vouchers)
keep if _n<=25
order PHACode PHAname HCVavailable HOvouchers pctHO
* export to excel
export excel using "..\Output\InitialExploration", sheet("MostHCVs_NoHOVs", modify) firstrow(variables)
****Fourth Export: PHAs with most growth
* copy main dataset into a new frame
frame copy default Growth
* load the new frame
frame change Growth
* drop if HO increase is undefined
drop if HOincrease==.
* keep just the columns we care about
keep PHACode PHAName HCVavailable HOvouchers pctHO HOincrease
* sort
gsort - HOincrease
* keep PHAs with an increase of 10 or more
keep if HOincrease>10
order PHACode PHAName HCVavailable HOvouchers pctHO
*export to excel
export excel using "..\Output\InitialExploration", sheet("HOIncrease", modify) firstrow(variables)
****Fifth Export: Overall Summary Stats by year
* copy main dataset into a new frame
frame copy default sum_long
* load the new frame
frame change sum_long
**create aggreagates by year and graph
rename HOvouchers HOvouchers2021
**reshape the dataset from wide (years are columns) to long (years are rows)
reshape long HOvouchers, i(PHACode) j(year)
*keep fields we care about
keep PHACode PHAName year HOvouchers
*create a variable for new vouchers each year
by PHACode: gen New = HOvouchers-HOvouchers[_n-1] // the n-1 thing pulls from the row above
* create a variable for percent new
gen pctNew=New/HOvouchers[_n-1]
* create a yes/no variable for whether the PHA has any HO vouchers in a given year
gen any=HOvouchers>0 & HOvouchers!=.
* create a yes/no variable for whether the PHA has any new HO vouchers in a given year
gen anyNew=(New>=1 & New!=.)
** Collapse to a summary across all agencies by year
gen PHAs=1 // create a counter for number of agencies
collapse (sum) HOvouchers any anyNew PHAs, by(year)
*reorder and export to excel
order year PHAs any HOvouchers anyNew
export excel using "..\Output\InitialExploration", sheet("Annual", modify) firstrow(variables)
Export Excel Files
Reminder
Please use all code samples responsibly - these are samples and likely require adjustments to work correctly for your specific needs. Read through the documentation and comments to understand any caveats or limitations of the code and/or data and follow-up with the code author or Code Library admins if you have questions on how to adapt the sample to your specific use case.
Purpose: A data scan to understand scope of Homeownership voucher program. It provides examples of how to use Stata dataframes, reshape, collapse, and export excel to create easy-to-digest excel tables.
Data: Public Data from HUD on the HCV program and the homeownership voucher option within that program.
Author: Dan Teles (June 2022)