Contents
- Contents
- Setting up Development Environment
- Functions in this package
- Datatypes in this package
- Typical Julia Workflow
Setting up Development Environment
- Install Julia and GitHub Desktop. GitHub Desktop is not strictly required, but it never hurts to have it!
- Install vscode and follow the basic instructions in https://github.com/ubcecon/tutorials/blob/master/vscode.md
  - In particular, follow https://github.com/ubcecon/tutorials/blob/master/vscode.md#julia, making sure to do the code formatter step,
  - and the git settings in https://github.com/ubcecon/tutorials/blob/master/vscode.md#general-packages-and-setup
- Clone the repo by either:
  - Clicking on Code, then Open in GitHub Desktop.
  - Alternatively, running ] dev https://github.com/HighDimensionalEconLab/VarianceComponentsHDFE.jl in a Julia REPL, which clones it to the .julia/dev folder. If you wanted to, you could then drag that folder back into GitHub Desktop.
- Open it in vscode by right-clicking on the folder it installed to, and then opening a vscode project.
- Open the Julia REPL in vscode (Ctrl-Shift-P, then type Julia REPL or something similar to find it).
- Type ] instantiate to install all of the packages. Get coffee.
- In the REPL, run ] test and it should run the full unit test suite.
Functions in this package
Main Function
VarianceComponentsHDFE.leave_out_KSS — Method
leave_out_KSS(y, first_id, second_id; controls, do_lincom, Z_lincom, lincom_labels, settings)
Returns a tuple with the observation numbers of the original dataset that belong to the leave-out connected set as described in Kline, Saggio, and Sølvsten. It also provides the corresponding outcome and identifiers in this connected set.
Arguments
- y: outcome vector
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
- controls: covariates that will be partialled out from the outcome before KSS is performed.
- do_lincom: boolean that indicates whether to run inference.
- Z_lincom: matrix of covariates to be used in lincom inference.
- lincom_labels: vector of labels of the columns of Z_lincom.
- settings: settings based on VCHDFESettings
- controls: at this version only controls=nothing is supported.
Auxiliary Functions
VarianceComponentsHDFE.find_connected_set — Method
find_connected_set(y, first_idvar, second_idvar, settings)
Returns a tuple of the observations belonging to the largest connected set, with the corresponding identifiers and outcomes. This requires the data to be sorted by first identifier and time period (e.g. by worker id and year). This is also the set where we can run AKM models with the original data.
Arguments
- y: outcome (e.g. log wage)
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
- settings: settings based on data type VCHDFESettings. Please see the reference provided below.
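To make the idea of a largest connected set concrete, here is a minimal sketch in plain Julia. It is not the package's implementation: the names find_root and sketch_largest_connected_set are hypothetical, and measuring "largest" by the number of observations is an assumption. The sketch treats first and second identifiers as the two sides of a bipartite graph, joins them with a union-find structure (each observation is an edge), and returns the observation numbers in the biggest component.

```julia
# Hypothetical helper: representative of node i, with path halving.
function find_root(parent::Vector{Int}, i::Int)
    while parent[i] != i
        parent[i] = parent[parent[i]]
        i = parent[i]
    end
    return i
end

# Illustrative sketch (not the package's code): observation numbers in the
# largest connected component of the first_id/second_id bipartite graph.
function sketch_largest_connected_set(first_id::AbstractVector, second_id::AbstractVector)
    # Relabel both identifiers as consecutive integers 1..N and 1..J.
    f = Int.(indexin(first_id, unique(first_id)))
    s = Int.(indexin(second_id, unique(second_id)))
    N, J = maximum(f), maximum(s)

    # Nodes 1..N are first ids; nodes N+1..N+J are second ids.
    parent = collect(1:(N + J))
    for (fi, si) in zip(f, s)
        a, b = find_root(parent, fi), find_root(parent, N + si)
        if a != b
            parent[a] = b          # merge the two components
        end
    end

    # Component label of every observation, then observations per label.
    labels = [find_root(parent, fi) for fi in f]
    counts = Dict{Int,Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end

    # Pick the label with the most observations.
    best, best_count = 0, 0
    for (l, c) in counts
        if c > best_count
            best, best_count = l, c
        end
    end
    return findall(==(best), labels)
end

# Workers 1 and 2 are linked through firm 2; workers 3 and 4 share only firm 4.
first_id  = [1, 1, 2, 2, 3, 4]
second_id = [1, 2, 2, 3, 4, 4]
obs = sketch_largest_connected_set(first_id, second_id)
```

In this toy data the first four observations form the larger component, so obs holds their row numbers. The leave-out connected set used by KSS is stricter than this plain connected set and is not covered by the sketch.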
VarianceComponentsHDFE.get_leave_one_out_set — Method
get_leave_one_out_set(y, first_id, second_id, settings, controls)
Returns a tuple with the observation numbers of the original dataset that belong to the leave-out connected set as described in Kline, Saggio, and Sølvsten. It also provides the corresponding outcome and identifiers in this connected set.
Arguments
- y: outcome vector
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
- settings: settings based on VCHDFESettings
- controls: at this version only controls=nothing is supported.
VarianceComponentsHDFE.leave_out_estimation — Method
leave_out_estimation(y, first_id, second_id, controls, settings)
Returns the bias-corrected components, the vector of coefficients, the corresponding fixed effects for every observation, and the diagonal matrices containing the Pii and Bii.
Arguments
- y: outcome vector
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
- settings: settings based on VCHDFESettings
- controls: matrix of control variables. At this version it doesn't work properly for very large datasets.
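As a rough illustration of what the Pii are, the sketch below computes statistical leverages P_ii = x_i' (X'X)^{-1} x_i on a tiny dense design and checks the standard leave-one-out shortcut against a brute-force refit. Everything here is an assumption for illustration: the toy X and y, the variable names, and the use of dense inverses. The package itself targets large sparse designs and uses the leverage algorithms listed under Datatypes.

```julia
using LinearAlgebra

# Toy dense design matrix (intercept plus one regressor) and outcome.
X = [1.0 0.0;
     1.0 1.0;
     1.0 2.0;
     1.0 4.0]
y = [1.0, 2.0, 2.5, 5.0]

# OLS coefficients and leverages P_ii = x_i' (X'X)^{-1} x_i.
Sxx_inv = inv(X' * X)
β = Sxx_inv * (X' * y)
Pii = [dot(X[i, :], Sxx_inv * X[i, :]) for i in 1:size(X, 1)]

# Leave-one-out residuals via the shortcut (y_i - x_i'β) / (1 - P_ii),
# which avoids refitting the model once per observation.
loo_resid = (y .- X * β) ./ (1 .- Pii)

# Observation-level variance estimate built from the leave-one-out residual.
sigma_i = y .* loo_resid

# Brute-force check for observation 1: refit without it and predict y_1.
keep = 2:size(X, 1)
β_without_1 = X[keep, :] \ y[keep]
brute_resid_1 = y[1] - dot(X[1, :], β_without_1)
```

The shortcut and the brute-force refit agree up to floating-point error, and the leverages sum to the number of columns of X. The Bii play the analogous role for the quadratic form of interest and are not computed in this sketch.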
VarianceComponentsHDFE.compute_movers — Method
compute_movers(first_id, second_id)
Returns a vector that indicates whether the first_id (e.g. worker) is a mover across second_id (e.g. firms), as well as a vector with the number of periods that each first_id appears.
Arguments
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
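The mover logic can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the package's code: sketch_movers is a hypothetical name, both outputs are expanded to one entry per observation, and each observation is counted as one period.

```julia
# Illustrative sketch (not the package's code): flag movers and count periods.
function sketch_movers(first_id::AbstractVector, second_id::AbstractVector)
    ids = unique(first_id)
    # Distinct second ids seen for each first id, and its number of observations.
    partners = Dict(i => Set{eltype(second_id)}() for i in ids)
    periods = Dict(i => 0 for i in ids)
    for (i, j) in zip(first_id, second_id)
        push!(partners[i], j)
        periods[i] += 1
    end
    # Expand back to one entry per observation.
    is_mover = [length(partners[i]) > 1 for i in first_id]
    n_periods = [periods[i] for i in first_id]
    return is_mover, n_periods
end

# Worker 1 switches from firm 10 to firm 20; workers 2 and 3 never move.
worker = [1, 1, 2, 2, 2, 3]
firm   = [10, 20, 10, 10, 10, 30]
is_mover, n_periods = sketch_movers(worker, firm)
```

Only worker 1 is flagged as a mover here, and the period counts are 2, 3, and 1 for workers 1, 2, and 3 respectively.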
VarianceComponentsHDFE.compute_matchid — Method
compute_matchid(second_id, first_id)
Computes a match identifier for every combination of first and second identifier. For example, this can be the match identifier of worker-firm combinations.
Arguments
- first_id: first identifier (e.g. worker id)
- second_id: second identifier (e.g. firm id)
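A match identifier of this kind can be sketched by numbering each distinct pair in order of first appearance. The function name sketch_matchid, its argument order, and the numbering scheme are assumptions made for illustration; the package may label matches differently.

```julia
# Illustrative sketch (not the package's code): one integer per distinct
# (first_id, second_id) pair, numbered in order of first appearance.
function sketch_matchid(first_id::AbstractVector, second_id::AbstractVector)
    pairs_seen = Dict{Tuple{eltype(first_id),eltype(second_id)},Int}()
    match_id = Vector{Int}(undef, length(first_id))
    for (k, pair) in enumerate(zip(first_id, second_id))
        # get! stores the next unused integer the first time a pair appears
        # and returns the stored integer on every later appearance.
        match_id[k] = get!(pairs_seen, pair, length(pairs_seen) + 1)
    end
    return match_id
end

worker = [1, 1, 2, 2, 2, 3]
firm   = [10, 20, 10, 10, 20, 10]
match_id = sketch_matchid(worker, firm)
```

Observations 3 and 4 share the same worker-firm pair, so they receive the same match identifier, while every other pair gets its own.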
VarianceComponentsHDFE.lincom_KSS — Method
lincom_KSS(y, X, Z, Transform, sigma_i; lincom_labels)
This function regresses fixed effects onto some observables. See the appendix in KSS for more information.
Arguments
- y: outcome variable.
- X: the design matrix in the linear model.
- Z: matrix of observables to use in regression.
- Transform: matrix to compute fixed effects (e.g. Transform = [0 F] recovers second fixed effects).
- sigma_i: estimate of the unbiased variance of observation i.
- lincom_labels: labels of the columns of Z.
- settings: settings based on data type VCHDFESettings. Please see the reference provided below.
Datatypes in this package
VarianceComponentsHDFE.ExactAlgorithm — Type
struct ExactAlgorithm <: AbstractLeverageAlgorithm
Data type to pass to the VCHDFESettings type to indicate the Exact algorithm.
VarianceComponentsHDFE.JLAAlgorithm — Type
struct JLAAlgorithm <: AbstractLeverageAlgorithm
Data type to pass to the VCHDFESettings type to indicate the JLA algorithm.
Fields
- num_simulations: number of simulations in estimation. If num_simulations = 0, defaults to 100 * log(#total fixed effects).
VarianceComponentsHDFE.VCHDFESettings — Type
struct VCHDFESettings{LeverageAlgorithm}
The VCHDFESettings type passes information to methods regarding which algorithm to use.
Fields
- cg_maxiter: maximum number of iterations (default = 300)
- leave_out_level: leave-out level (default = match)
- leverage_algorithm: which type of algorithm to use (default = JLAAlgorithm())
- first_id_effects: includes first id effects. At this version it is required to include the first_id_effects. (default = true)
- cov_effects: includes covariance of first-second id effects. At this version it is required to include the cov_effects. (default = true)
- print_level: prints the state of the program to standard output. If print_level = 0, the app prints nothing to standard output. (default = 1)
- first_id_display_small: name of the first id in lowercase (default = person)
- first_id_display: name of the first id (default = Person)
- second_id_display_small: name of the second id in lowercase (default = firm)
- second_id_display: name of the second id (default = Firm)
- outcome_id_display_small: name of the observation id in lowercase (default = wage)
- outcome_id_display: name of the observation id (default = Wage)
Typical Julia Workflow
#Load the required packages
using VarianceComponentsHDFE, DataFrames, CSV, SparseArrays
#Load dataset
data = DataFrame(CSV.File("test.csv"; header=false))
#Extract vectors of outcome, workerid, firmid
id = data[:,1]
firmid = data[:,2]
year = data[:, 3]
y = data[:,4]
#You can define the settings using our structures
JL = JLAAlgorithm(num_simulations = 300)
mysettings = VCHDFESettings(leverage_algorithm = JL, first_id_effects=true, cov_effects=true)
#Run KSS with no controls, passing the settings defined above
θ_first, θ_second, θCOV = leave_out_KSS(y,id,firmid; settings = mysettings)
#Create some controls (year dummies) and run the routine, which partials them out
controls = indexin(year,unique(sort(year)))
controls = sparse(collect(1:size(y,1)), controls, 1, size(y,1), maximum(controls))
controls = controls[:,1:end-1]
θ_first, θ_second, θCOV = leave_out_KSS(y,id,firmid; controls)
#Perform Lincom Inference using a Region Dummy
data = DataFrame(CSV.File("lincom.csv"; header=false))
id = data[:,1]
firmid = data[:,2]
y = data[:,5]
region = data[:,4]
#Recode region values of -1 to 0
region[region .== -1] .= 0
θ_first, θ_second, θCOV = leave_out_KSS(y,id,firmid; do_lincom = true , Z_lincom = region, lincom_labels = ["Region Dummy"] )
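The control-building lines in the workflow above (indexin followed by sparse) can be checked on a toy vector without loading any data. The year values below are made up for illustration, and the Int conversion is a defensive assumption so that the column indices have a plain integer element type.

```julia
using SparseArrays

# Made-up years for five observations.
year = [2001, 2002, 2001, 2003, 2002]

# Map each year to its rank among the sorted unique years (1, 2, 3, ...).
col = Int.(indexin(year, unique(sort(year))))

# One row per observation, with a single 1 in the column of that observation's year.
dummies = sparse(collect(1:length(year)), col, 1, length(year), maximum(col))

# Drop the last year's column, as the workflow does (presumably so the dummies
# are not collinear with the fixed effects).
controls = dummies[:, 1:end-1]
```

Here controls is a 5 by 2 sparse matrix: observations from 2001 and 2002 have a 1 in the first or second column, and the 2003 observation is a row of zeros because its column was dropped.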