On Sat, 10 Oct 2015, Clive Nicholas wrote:
On 9 October 2015 at 18:24, Allin Cottrell <cottrell(a)wfu.edu>
wrote:
[...]
> For the record, then, let's point out that the two basic approaches to
> heteroskedasticity in gretl -- namely, switching to "robust" standard
> errors, or switching from OLS to GLS via the "hsk" command -- do not
> require taking logs of negative numbers. The following script illustrates.
> The series y and x contain both positive and negative values, and the
> data-generating process is heteroskedastic by construction.
>
> <hansl>
> nulldata 50
> set seed 3711
> series x = normal()
> # generate heteroskedastic y
> series y = -1 + 3*x + normal()*x
> # verify we have negative values in both y and x
> print y x --byobs
> # run OLS
> ols y 0 x
> # try robust standard errors: no problem
> ols y 0 x --robust
> # try GLS: again, no problem
> hsk y 0 x
> </hansl>
>
> In this case the "hsk" command produces a closer approximation to the true
> x-slope of 3.0 (2.997, versus 3.098 from OLS), although obviously one would
> have to replicate the example a large number of times to verify that (as
> theory says) the hsk estimates are more efficient, given heteroskedasticity.
> [...]
Very interesting things happen when you keep increasing the simulation
sample size by factors of 10.
At N=500, the OLS estimate is closer to the true x-slope of 3.0 (3.01) than the
GLS estimate (3.06). At N=5000, the two are pretty much identical, yet both
further from the true value than before (2.97). At N=50000, they both tend back
to the true value, at 2.99.
As Arthur Atkinson without his washboard would have said, "How queer!"
Actually (sorry, but) not really so interesting, or queer. Comparing
single estimates at different sample sizes just compares individual
random draws, so the sample size is basically irrelevant here. What
matters for assessing the estimators is how many replications you run
at a given sample size. Consider the following:
<hansl>
nulldata 50
set seed 3711
series x = normal()
# generate heteroskedastic y
series y = -1 + 3*x + normal()*x
# verify we have negative values in both y and x
print y x --byobs
# run OLS
ols y 0 x
# try robust standard errors: no problem
ols y 0 x --robust
# try GLS: again, no problem
hsk y 0 x
# Monte Carlo: N replications at a fixed sample size of 50
scalar N = 5000
matrix B1 = zeros(N, 2)
loop i=1..N --quiet
    # draw a fresh heteroskedastic y each time
    series y = -1 + 3*x + normal()*x
    ols y 0 x --quiet
    B1[i,1] = $coeff[2]   # OLS slope estimate
    hsk y 0 x --quiet
    B1[i,2] = $coeff[2]   # GLS (hsk) slope estimate
endloop
# mean and standard deviation of the slope estimates, OLS vs GLS
eval meanc(B1)
eval sdc(B1)
</hansl>
This produces (the tail of the output):
? eval meanc(B1)
3.0002 2.9993
? eval sdc(B1)
0.26747 0.19028
Since heteroskedasticity does not bias the coefficient estimates, it
is unsurprising that the means of both columns are very close to 3.0
(the known "true" slope). But heteroskedasticity makes OLS
inefficient compared to GLS, and that is amply confirmed by the
substantially larger standard deviation of the estimated slopes
under OLS (column 1 of matrix B1) relative to GLS (column 2).
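To put a rough number on that gap, one can compute the implied variance
ratio directly from B1 -- a two-line sketch, continuing in the same
session after the loop above (the matrix name "s" is just a scratch
variable):
<hansl>
# ratio of sampling variances of the slope: OLS relative to GLS
matrix s = sdc(B1)
eval (s[1]/s[2])^2
</hansl>
From the figures printed above (0.26747 versus 0.19028) this comes to
roughly 2: under this DGP the OLS slope estimator has about twice the
sampling variance of its GLS counterpart.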
(Reckoned worth replying since this sort of thing shows off the ease
of doing Monte Carlo in hansl.)
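For instance, Clive's sample-size question takes only a one-line change
to check: re-run the same experiment with the nulldata line altered. A
self-contained sketch (no results reported here; B2 is just a new name
for the results matrix):
<hansl>
# same Monte Carlo, but with 500 observations per replication
nulldata 500
set seed 3711
series x = normal()
scalar N = 5000
matrix B2 = zeros(N, 2)
loop i=1..N --quiet
    series y = -1 + 3*x + normal()*x
    ols y 0 x --quiet
    B2[i,1] = $coeff[2]   # OLS slope estimate
    hsk y 0 x --quiet
    B2[i,2] = $coeff[2]   # GLS (hsk) slope estimate
endloop
eval meanc(B2)
eval sdc(B2)
</hansl>
One would expect both columns to remain centred on 3.0, with a smaller
overall spread at the larger sample size but the OLS column still the
more dispersed of the two.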
Allin Cottrell