Dear connoisseurs of gretl,
I am currently stuck with an issue that can be described as “premature optimisation” (the root of all evil, as we know). I am trying to evaluate the distribution of the Dickey-Fuller $T(\hat\alpha - 1)$ statistic in the model $x_t = \alpha x_{t-1} + \varepsilon_t$ under the unit root $\alpha = 1$ to the highest precision. My goal is to run several million regressions and save the column of $T(\hat\alpha - 1)$ values in a separate file, to be processed in other software (the same is to be done later for the Durbin-Watson statistic, other Dickey-Fuller distributions, etc.). Since my goal is >10m iterations, every second is crucial for me. At first, I wrote the following code (10k iterations, for the sake of illustration):
set stopwatch
nulldata 10000
scalar iterations = 10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps = normal()
    series x = 0
    series x = x(-1) + eps   # recursive assignment: x is a random walk
    series xlag = x(-1)
    smpl 3 10000             # drop the two degenerate observations
    ols x xlag
    scalar ahat = $coeff(xlag)
    scalar DFT = $T * (ahat - 1)
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch
Note: the sample is restricted since xlag[1] is missing and x[1] = xlag[2] = 0 by construction; we do not need those meaningless observations in the estimation.
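For reference, in my notation (gretl's *ols* fits through the origin here, since no *const* is listed), each iteration computes

$$\hat\alpha = \frac{\sum_{t=3}^{10000} x_t x_{t-1}}{\sum_{t=3}^{10000} x_{t-1}^2}, \qquad \mathrm{DFT} = T(\hat\alpha - 1), \qquad T = 9998.$$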
The average time on my PC was 27.6 seconds. However, I suspected that the *ols* command performs all sorts of side calculations (residuals, t-ratios, R-squared, information criteria, etc.); thus I decided to bypass the possibly unnecessary work by obtaining $\hat\alpha$ by hand:
set stopwatch
nulldata 10000
scalar iterations = 10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps = normal()
    series x = 0
    series x = x(-1) + eps
    series xlag = x(-1)
    smpl 3 10000
    scalar DFT = 9998 * (cov(x, xlag) / var(xlag) - 1)   # slope from sample moments
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch
Indeed, there was an improvement in time (now 25.2 seconds on average). Nevertheless, the happiness was alloyed by the seeming impurity of the result: by definition, the true coefficient is a ratio of two raw sums, not just covariance divided by variance, because *cov* and *var* subtract sample means (which matters in finite samples and under a possibly non-zero mean).
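Explicitly, the distinction I mean (with $\bar{x}$ and $\bar{x}_{-1}$ the sample means over the estimation sample) is

$$\frac{\widehat{\mathrm{cov}}(x, x_{-1})}{\widehat{\mathrm{var}}(x_{-1})} = \frac{\sum_t (x_t - \bar{x})(x_{t-1} - \bar{x}_{-1})}{\sum_t (x_{t-1} - \bar{x}_{-1})^2} \neq \frac{\sum_t x_t x_{t-1}}{\sum_t x_{t-1}^2}.$$

So I took the liberty of evaluating the sums manually: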
set stopwatch
nulldata 10000
scalar iterations = 10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps = normal()
    series x = 0
    series x = x(-1) + eps
    series xlag = x(-1)
    series xxlag = x * xlag     # cross products
    series xlag_sq = xlag^2     # squared regressor
    smpl 3 10000
    scalar DFT = 9998 * (sum(xxlag) / sum(xlag_sq) - 1)
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch
Much to my regret, the result was very disappointing: 33.0 seconds on average. Why does the optimisation turn out to be harmful in this case? What else can be done to reduce the running time without loss of precision or consistency?
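One idea I have toyed with but not yet benchmarked is to skip series and the dataset machinery altogether and work with matrices; a rough sketch follows (the names are mine, and *mwrite* writes gretl's own text matrix format rather than a bare CSV):

set stopwatch
scalar iterations = 10000
matrix out = zeros(iterations, 1)
loop i=1..iterations --quiet
    matrix x = cum(mnormal(9999, 1))   # random walk of length 9999
    matrix xlag = x[1:9998]            # lagged values
    matrix xcur = x[2:9999]            # current values
    # slope through the origin, then the DF statistic
    out[i] = 9998 * (xcur'*xlag / (xlag'*xlag) - 1)
endloop
scalar err = mwrite(out, "df.txt")     # gretl text matrix format
printf "Time taken: %f seconds\n", $stopwatch

(Distributionally this should match the series version above, since dropping the two degenerate observations there leaves exactly a length-9999 random walk.)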
In addition, could you make a slight correction to the manual, please? The functions *cov* and *corr* expect a comma between their arguments, but this is not mentioned in the command reference, and the awakening comes only through the error message “Expected ',' but found ...” in the output.
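A minimal example of the trap:

nulldata 100
series a = normal()
series b = normal()
# scalar c = cov(a b)   <- fails with “Expected ',' but found ...”
scalar c = cov(a, b)    # works

Thank you in advance!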
Yours faithfully,
Andreï V. Kostyrka
Department of Mathematical Economics and Econometrics
Higher School of Economics
Moscow, Russia