Dear connoisseurs of gretl,

I am currently stuck with an issue that can be described as “premature optimisation” (the root of all evil, as we know).

I am trying to evaluate the distribution of the Dickey-Fuller $T(\hat\alpha-1)$ statistic in the model $x_t = \alpha x_{t-1} + \varepsilon_t$ with a unit root ($\alpha=1$) to the highest possible precision. My goal is to run several million regressions and save the column of $T(\hat\alpha-1)$ values in a separate file, to be processed in other software (the same will later be done for the Durbin-Watson statistic, other Dickey-Fuller distributions, etc.). Since I am aiming at more than 10 million iterations, every second counts. At first, I wrote the following code (10,000 iterations for the sake of illustration):

set stopwatch
nulldata 10000
scalar iterations=10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps=normal()
    series x=0
    series x=x(-1)+eps
    series xlag=x(-1)
    smpl 3 10000
    ols x xlag
    scalar ahat=$coeff(xlag)
    scalar DFT=$T*(ahat-1)
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch

Note: the sample is restricted because x[1]=0 and xlag[2]=0; we do not need those meaningless values in the estimation.
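Incidentally, I suspect the recursive definition of x could be replaced by a single vectorised pass with the cum() function (assuming it cumulates a series as documented), e.g.:

```hansl
series eps = normal()
series x = cum(eps)    # x_t = eps_1 + ... + eps_t; note x[1] = eps[1] rather than 0
series xlag = x(-1)
```

though I have not benchmarked whether the recursion is actually a bottleneck here, and the different initial value should be harmless given the sample restriction.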
The average time on my PC was 27.6 seconds. However, I suspected that the *ols* command performs all sorts of side calculations (residuals, t-ratios, R-squared, information criteria, etc.), so I decided to bypass that unnecessary work by obtaining $\hat\alpha$ by hand:

set stopwatch
nulldata 10000
scalar iterations=10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps=normal()
    series x=0
    series x=x(-1)+eps
    series xlag=x(-1)
    smpl 3 10000
    scalar DFT=9998*(cov(x,xlag)/var(xlag)-1)
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch

Indeed there was an improvement (now 25.2 seconds on average). Nevertheless, the happiness was alloyed by the seeming impurity of the result: by definition, the true coefficient is a ratio of two sums, $\hat\alpha = \sum_t x_t x_{t-1} \,/\, \sum_t x_{t-1}^2$, not the covariance divided by the variance, because in a finite sample *cov* and *var* subtract the sample means. So I took the liberty of evaluating the sums manually:

set stopwatch
nulldata 10000
scalar iterations=10000
loop for (i=0; i<iterations; i+=1) --progressive --quiet
    smpl --full
    series eps=normal()
    series x=0
    series x=x(-1)+eps
    series xlag=x(-1)
    series xxlag=x*xlag
    series xlag_sq=xlag^2
    smpl 3 10000
    scalar DFT=9998*(sum(xxlag)/sum(xlag_sq)-1)
    store df.csv DFT --no-header
endloop
printf "Time taken: %f seconds\n", $stopwatch

Much to my regret, the result was very disappointing: 33.0 seconds on average. Why does the optimisation turn out to be harmful in this case? What else can be done to reduce the running time without loss of precision or consistency?
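One direction I have considered but not yet benchmarked is to abandon series altogether and work with matrices, on the assumption that the matrix functions mnormal(), cum() and mwrite() behave as documented and skip the per-observation series machinery. A rough, untested sketch:

```hansl
set stopwatch
scalar T = 10000
scalar iterations = 10000
matrix DFT = zeros(iterations, 1)
loop i=1..iterations --quiet
    # random walk as a column vector; here x[1] = eps[1] instead of 0,
    # which should not matter distributionally
    matrix x = cum(mnormal(T, 1))
    matrix x0 = x[2:T-1]   # x_{t-1} for t = 3..T
    matrix x1 = x[3:T]     # x_t     for t = 3..T
    DFT[i] = (T-2) * ((x1'x0) / (x0'x0) - 1)
endloop
mwrite(DFT, "df.mat")
printf "Time taken: %f seconds\n", $stopwatch
```

(The output would then be a gretl matrix file rather than CSV; I do not know whether that is acceptable for the downstream software.) Would something along these lines be expected to run faster?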

In addition, could you make a slight correction to the manual, please? The functions *cov* and *corr* expect their two arguments to be separated by a comma, but this is not mentioned in the command reference, and one only finds out through the error message “Expected ',' but found ...” in the output. Thank you in advance!

Yours faithfully,
Andreï V. Kostyrka
Department of Mathematical Economics and Econometrics
Higher School of Economics
Moscow, Russia