Tutorial
We now illustrate the basic capabilities of the respy package. We start with the model specification and then turn to some example use cases.
Model Specification
The model is specified in an initialization file. For an example, check out the first parameterization analyzed in Keane and Wolpin (1994) here. Let us discuss each of its elements in more detail.
BASICS
Key | Value | Interpretation |
---|---|---|
periods | int | number of periods |
delta | float | discount factor |
Warning
There are two small differences compared to Keane and Wolpin (1994). First, all coefficients enter the return function with a positive sign, while the squared terms enter with a minus in the original paper. Second, the order of covariates is fixed across the two occupations. In the original paper, own experience always comes before other experience.
OCCUPATION A
Key | Value | Interpretation |
---|---|---|
coeff | float | intercept |
coeff | float | return to schooling |
coeff | float | experience Occupation A, linear |
coeff | float | experience Occupation A, squared |
coeff | float | experience Occupation B, linear |
coeff | float | experience Occupation B, squared |
OCCUPATION B
Key | Value | Interpretation |
---|---|---|
coeff | float | intercept |
coeff | float | return to schooling |
coeff | float | experience Occupation A, linear |
coeff | float | experience Occupation A, squared |
coeff | float | experience Occupation B, linear |
coeff | float | experience Occupation B, squared |
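The sign and ordering conventions from the warning above can be made concrete with a small sketch. The function below is purely illustrative and not part of respy: it combines the six coefficients of one occupation, in the order listed in the tables, into a linear index in which every term enters with a plus sign. How this index enters the return function is not shown here, and the coefficient values are hypothetical.

```python
def occupation_index(coeffs, schooling, exp_a, exp_b):
    """Linear index for one occupation; every coefficient enters with a plus sign.

    The coefficient order follows the tables above: intercept, schooling,
    experience A (linear, squared), experience B (linear, squared).
    """
    intercept, b_school, b_a, b_a_sq, b_b, b_b_sq = coeffs
    return (
        intercept
        + b_school * schooling
        + b_a * exp_a
        + b_a_sq * exp_a ** 2
        + b_b * exp_b
        + b_b_sq * exp_b ** 2
    )


# Hypothetical values, not taken from any published parameterization.
# A concave experience profile needs a NEGATIVE squared-term coefficient,
# because no sign flip is applied automatically.
coeffs_a = [9.0, 0.04, 0.03, -0.0005, 0.0, 0.0]
index = occupation_index(coeffs_a, schooling=10, exp_a=2, exp_b=0)
```

Because the order of covariates is the same for both occupations, the same function can be reused for Occupation B with its own coefficient list.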
EDUCATION
Key | Value | Interpretation |
---|---|---|
coeff | float | consumption value |
coeff | float | tuition cost |
coeff | float | adjustment cost |
max | int | maximum level of schooling |
start | int | initial level of schooling |
Warning
Again, there is a small difference between this setup and Keane and Wolpin (1994). There is no automatic change in sign for the tuition and adjustment costs. Thus, a $1,000 tuition cost must be specified as -1000.
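A minimal sketch of what this sign convention implies, assuming a reward structure along the lines of Keane and Wolpin (1994): the tuition cost applies once schooling reaches a college threshold and the adjustment cost applies when the agent was not in school in the previous period. The function, the threshold of 12 years, and all numbers are illustrative assumptions and not respy's implementation.

```python
def education_reward(coeffs, schooling, in_school_last_period, college_threshold=12):
    """Deterministic part of the education reward (illustrative only).

    Costs are ADDED as specified, so they must be entered as negative numbers.
    """
    consumption_value, tuition_cost, adjustment_cost = coeffs
    reward = consumption_value
    if schooling >= college_threshold:
        # Assumed rule: tuition is due from the college threshold onwards.
        reward += tuition_cost
    if not in_school_last_period:
        # Assumed rule: re-entering school after a break is costly.
        reward += adjustment_cost
    return reward


# Hypothetical values: a $1,000 tuition cost is written as -1000.
coeffs = [5000.0, -1000.0, -2000.0]
reward_college = education_reward(coeffs, schooling=12, in_school_last_period=True)
reward_reentry = education_reward(coeffs, schooling=10, in_school_last_period=False)
```

Entering the tuition cost as +1000 in this setup would raise the reward of attending college instead of lowering it.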
HOME
Key | Value | Interpretation |
---|---|---|
coeff | float | mean value of non-market alternative |
SHOCKS
Key | Value | Interpretation |
---|---|---|
coeff | float | \(\sigma_{1}\) |
coeff | float | \(\sigma_{12}\) |
coeff | float | \(\sigma_{13}\) |
coeff | float | \(\sigma_{14}\) |
coeff | float | \(\sigma_{2}\) |
coeff | float | \(\sigma_{23}\) |
coeff | float | \(\sigma_{24}\) |
coeff | float | \(\sigma_{3}\) |
coeff | float | \(\sigma_{34}\) |
coeff | float | \(\sigma_{4}\) |
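The ten coefficients are listed row by row along the upper triangle of a symmetric 4x4 matrix. The helper below is a sketch of that ordering only: it is not part of respy, and it makes no assumption about whether the diagonal entries are standard deviations or variances, which the table does not state.

```python
def shocks_matrix(coeffs, dim=4):
    """Arrange coefficients given in upper-triangular, row-wise order
    (s_1, s_12, s_13, s_14, s_2, s_23, ...) into a symmetric matrix."""
    expected = dim * (dim + 1) // 2
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")
    matrix = [[0.0] * dim for _ in range(dim)]
    values = iter(coeffs)
    for i in range(dim):
        for j in range(i, dim):
            value = float(next(values))
            matrix[i][j] = value
            matrix[j][i] = value  # mirror to keep the matrix symmetric
    return matrix


# Placeholder values 1..10 make the positions easy to trace.
matrix = shocks_matrix([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```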
SOLUTION
Key | Value | Interpretation |
---|---|---|
draws | int | number of draws for \(E\max\) |
store | bool | persistent storage of results |
seed | int | random seed for \(E\max\) |
SIMULATION
Key | Value | Interpretation |
---|---|---|
file | str | file to print simulated sample |
agents | int | number of simulated agents |
seed | int | random seed for agent experience |
ESTIMATION
Key | Value | Interpretation |
---|---|---|
file | str | file to read observed sample |
tau | float | scale parameter for function smoothing |
agents | int | number of agents to read from sample |
draws | int | number of draws for choice probabilities |
maxfun | int | maximum number of function evaluations |
seed | int | random seed for choice probability |
optimizer | str | optimizer to use |
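The table does not say which smoothing function tau scales. One common choice when simulating choice probabilities is a logit kernel, and the sketch below illustrates the role of a scale parameter under that assumption. The function name and the numbers are hypothetical and not taken from respy.

```python
import math


def smoothed_choice_probabilities(values, tau):
    """Logit-smoothed choice probabilities for a list of alternative values.

    A small tau approaches a hard argmax; a large tau flattens the
    probabilities towards a uniform distribution.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    v_max = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - v_max) / tau) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


values = [1.0, 2.0, 0.5, 0.0]  # hypothetical values of the four alternatives
sharp = smoothed_choice_probabilities(values, tau=0.01)
flat = smoothed_choice_probabilities(values, tau=1e6)
```

Smoothing of this kind keeps the simulated criterion function differentiable in the parameters, which matters for gradient-based optimizers.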
PROGRAM
Key | Value | Interpretation |
---|---|---|
debug | bool | debug mode |
version | str | program version |
PARALLELISM
Key | Value | Interpretation |
---|---|---|
flag | bool | parallel executable |
procs | int | number of processors |
INTERPOLATION
Key | Value | Interpretation |
---|---|---|
points | int | number of interpolation points |
flag | bool | flag to use interpolation |
DERIVATIVES
Key | Value | Interpretation |
---|---|---|
version | str | approximation scheme |
eps | float | step size |
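To illustrate what the step size controls, here is a minimal forward-difference gradient. The valid values of the version key are not listed above, so treating the scheme as forward differences is an assumption made for this sketch. The function is not respy's implementation.

```python
def forward_difference_gradient(func, x, eps):
    """Approximate the gradient of func at x with forward differences."""
    base = func(x)
    gradient = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += eps  # perturb one parameter at a time
        gradient.append((func(shifted) - base) / eps)
    return gradient


def quadratic(x):
    """Toy criterion function with known gradient (2 * x[0], 3)."""
    return x[0] ** 2 + 3.0 * x[1]


grad = forward_difference_gradient(quadratic, [1.0, 2.0], eps=1e-6)
```

A smaller eps reduces the truncation error of the approximation but amplifies rounding noise in the criterion function, so the choice is a trade-off.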
SCALING
Key | Value | Interpretation |
---|---|---|
flag | bool | apply scaling to parameters |
minimum | float | minimum value for gradient approximation |
The implemented optimization algorithms vary with the program’s version. If you request the Python version of the program, you can choose from the scipy implementations of the BFGS (Nocedal and Wright, 2006) and POWELL (Powell, 1964) algorithms. Their implementation details are available here. For Fortran, we implemented the BFGS and NEWUOA (Powell, 2004) algorithms.
SCIPY-BFGS
Key | Value | Interpretation |
---|---|---|
gtol | float | gradient norm must be less than gtol before successful termination |
maxiter | int | maximum number of iterations |
SCIPY-POWELL
Key | Value | Interpretation |
---|---|---|
maxfun | int | maximum number of function evaluations to make |
ftol | float | relative error in func(xopt) acceptable for convergence |
xtol | float | line-search error tolerance |
SCIPY-LBFGSB
Key | Value | Interpretation |
---|---|---|
eps | float | Step size used when approx_grad is True, for numerically calculating the gradient |
factr | float | Multiple of the default machine precision used to determine the relative error in func(xopt) acceptable for convergence |
m | int | Maximum number of variable metric corrections used to define the limited memory matrix. |
maxiter | int | maximum number of iterations |
maxls | int | Maximum number of line search steps (per iteration). Default is 20. |
pgtol | float | projected gradient norm must be less than pgtol before successful termination |
FORT-BFGS
Key | Value | Interpretation |
---|---|---|
gtol | float | gradient norm must be less than gtol before successful termination |
maxiter | int | maximum number of iterations |
FORT-NEWUOA
Key | Value | Interpretation |
---|---|---|
maxfun | float | maximum number of function evaluations |
npt | int | number of points for approximation model |
rhobeg | float | starting value for size of trust region |
rhoend | float | minimum value of size for trust region |
FORT-BOBYQA
Key | Value | Interpretation |
---|---|---|
maxfun | float | maximum number of function evaluations |
npt | int | number of points for approximation model |
rhobeg | float | starting value for size of trust region |
rhoend | float | minimum value of size for trust region |
Constraints for the Optimizer
If you want to keep a parameter fixed at the value you specified (i.e. not estimate it), simply add an exclamation mark after the value. To provide bounds for a constrained optimizer, specify a lower and an upper bound in round brackets. A section of such an .ini file would look as follows:
coeff -0.049538516229344
coeff 0.020000000000000 !
coeff -0.037283956168153 (-0.5807488086366478,None)
coeff 0.036340835226155 ! (None,0.661243603948984)
In this example, the first coefficient is free. The second one is fixed at 0.02. The third one will be estimated but has a lower bound. In the fourth case, the parameter is fixed and the bounds will be ignored.
If you specify bounds for any free parameter, you have to choose a constrained optimizer such as SCIPY-LBFGSB or FORT-BOBYQA.
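The line format can be made explicit with a small parser. This is an illustrative sketch and not respy's own reader. It assumes that the bounds are written without spaces inside the brackets, as in the example above.

```python
def parse_coeff_line(line):
    """Split a 'coeff' line into (value, is_fixed, (lower, upper))."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "coeff":
        raise ValueError(f"not a coeff line: {line!r}")
    value = float(tokens[1])
    is_fixed = False
    bounds = (None, None)
    for token in tokens[2:]:
        if token == "!":
            is_fixed = True
        elif token.startswith("(") and token.endswith(")"):
            lower, upper = token[1:-1].split(",")
            bounds = tuple(None if b == "None" else float(b) for b in (lower, upper))
        else:
            raise ValueError(f"unexpected token {token!r} in {line!r}")
    return value, is_fixed, bounds


lines = [
    "coeff -0.049538516229344",
    "coeff 0.020000000000000 !",
    "coeff -0.037283956168153 (-0.5807488086366478,None)",
    "coeff 0.036340835226155 ! (None,0.661243603948984)",
]
parsed = [parse_coeff_line(line) for line in lines]
```

The fourth entry carries both a fixed flag and bounds. As described above, the bounds of a fixed parameter are ignored during estimation.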
Dataset
To use respy, you need a dataset with the following columns:
- Identifier: identifies the different individuals in the sample
- Period: identifies the different rounds of observation for each individual
- Choice: an integer variable that indicates the labor market choice
- 1 = Occupation A
- 2 = Occupation B
- 3 = Education
- 4 = Home
- Earnings: a float variable that indicates how much people are earning. This variable is missing (indicated by a dot) if individuals don’t work.
- Experience_A: labor market experience in sector A
- Experience_B: labor market experience in sector B
- Years_Schooling: years of schooling
- Lagged_Choice: choice in the period before the model starts. Codes are the same as in Choice.
Datasets for respy are stored in simple text files, where columns are separated by spaces. The easiest way to write such a text file in Python is to create a pandas DataFrame with all relevant columns and then store it in the following way:
with open('my_data.respy.dat', 'w') as file:
    df.to_string(file, index=False, header=True, na_rep='.')
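For a self-contained illustration, the sketch below builds a tiny DataFrame with the columns listed above, writes it in the same way to an in-memory buffer, and reads it back. The values are made up, and whether respy expects exactly these header names is not confirmed here.

```python
import io

import pandas as pd

# Two hypothetical observations of one individual. Earnings are missing
# in the second period because the individual chooses education.
df = pd.DataFrame(
    {
        "Identifier": [0, 0],
        "Period": [0, 1],
        "Choice": [1, 3],
        "Earnings": [15000.0, float("nan")],
        "Experience_A": [0, 1],
        "Experience_B": [0, 0],
        "Years_Schooling": [10, 10],
        "Lagged_Choice": [3, 3],
    }
)

# Write to a buffer instead of a file so the example leaves no files behind.
buffer = io.StringIO()
df.to_string(buffer, index=False, header=True, na_rep=".")
text = buffer.getvalue()
lines = text.splitlines()

# Reading the text back: whitespace-separated columns, dots as missing values.
df_check = pd.read_csv(io.StringIO(text), sep=r"\s+", na_values=".")
```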
Examples
Let us explore the basic capabilities of the respy package with a couple of examples. All the material is available online.
Simulation and Estimation
We always start by initializing an instance of RespyCls, passing in the path to the initialization file.
from respy import RespyCls
respy_obj = RespyCls('example.ini')
Now we can simulate a sample from the specified model.
respy_obj.simulate()
During the simulation, several files will appear in the current working directory. sol.respy.log lets you monitor the progress of the solution algorithm, while sim.respy.log records the progress of the simulation. The simulated dataset with the agents’ choices and state experiences is stored in data.respy.dat, and data.respy.info provides some basic descriptives about the simulated dataset. See our section on Additional Details for more information regarding the output files.
Now that we simulated some data, we can start an estimation. Here we are using the simulated data for the estimation. However, you can of course also use other data sources. Just make sure they follow the layout of the simulated sample. The coefficient values in the initialization file serve as the starting values.
x, crit_val = respy_obj.fit()
This directly returns the value of the coefficients at the final step of the optimizer as well as the value of the criterion function. However, some additional files appear in the meantime. Monitoring the estimation is best done using est.respy.info, and more details about each evaluation of the criterion function are available in est.respy.log.
We can now simulate a sample using the estimated parameters by updating the instance of the RespyCls.
respy_obj.update_model_paras(x)
respy_obj.simulate()
Recomputing Keane and Wolpin (1994)
Just using the capabilities outlined so far, it is straightforward to recompute some of the key results in the original paper with a simple script.
#!/usr/bin/env python
""" This module recomputes some of the key results of Keane and Wolpin (1994).
"""
from respy import RespyCls

# We can simply iterate over the different model specifications outlined in
# Table 1 of their paper.
for spec in ['kw_data_one.ini', 'kw_data_two.ini', 'kw_data_three.ini']:

    # Process the relevant model initialization file.
    respy_obj = RespyCls(spec)

    # Simulate the datasets discussed on page 658.
    respy_obj.simulate()

    # Start the estimations for the Monte Carlo exercises. For now, we just
    # evaluate the model at the starting values, i.e. maxfun is set to zero.
    respy_obj.unlock()
    respy_obj.set_attr('maxfun', 0)
    respy_obj.lock()

    respy_obj.fit()
In an earlier working paper, Keane and Wolpin (1994b) provide a full account of the choice distributions for all three specifications. The results from the recomputation line up well with their reports.
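A comparison of this kind needs the choice distribution by period. The sketch below computes it from (period, choice) pairs using the choice codes from the Dataset section. The records are made up for illustration, and the helper is not part of respy.

```python
from collections import Counter, defaultdict

CHOICE_LABELS = {1: "Occupation A", 2: "Occupation B", 3: "Education", 4: "Home"}


def choice_shares_by_period(records):
    """Return {period: {label: share}} for an iterable of (period, choice)."""
    counts = defaultdict(Counter)
    for period, choice in records:
        counts[period][choice] += 1
    shares = {}
    for period, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[period] = {
            label: counter[code] / total for code, label in CHOICE_LABELS.items()
        }
    return shares


# Hypothetical observations: four agents observed in two periods.
records = [(0, 3), (0, 3), (0, 1), (0, 4), (1, 1), (1, 1), (1, 2), (1, 3)]
shares = choice_shares_by_period(records)
```

Applied to the Period and Choice columns of a simulated sample, the resulting shares can be set side by side with the published choice distributions.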