Actually (sorry, but) not really so interesting, or queer. The sample size is basically irrelevant. What matters is how many replications you run at a given sample size. Consider the following:

On Sat, 10 Oct 2015, Clive Nicholas wrote:
On 9 October 2015 at 18:24, Allin Cottrell <cottrell@wfu.edu> wrote:
[...]
For the record, then, let's point out that the two basic approaches to
heteroskedasticity in gretl -- namely, switching to "robust" standard
errors, or switching from OLS to GLS via the "hsk" command -- do not
require taking logs of negative numbers. The following script illustrates.
The series y and x contain both positive and negative values, and the
data-generating process is heteroskedastic by construction.
<hansl>
nulldata 50
set seed 3711
series x = normal()
# generate heteroskedastic y
series y = -1 + 3*x + normal()*x
# verify we have negative values in both y and x
print y x --byobs
# run OLS
ols y 0 x
# try robust standard errors: no problem
ols y 0 x --robust
# try GLS: again, no problem
hsk y 0 x
</hansl>
In this case the "hsk" command produces a closer approximation to the true
x-slope of 3.0 (2.997, versus 3.098 from OLS), although obviously one would
have to replicate the example a large number of times to verify that (as
theory says) the hsk estimates are more efficient, given heteroskedasticity.
[...]
Very interesting things happen when you keep increasing the simulation
sample size by factors of 10.
At N=500, the OLS estimate of the x-slope (3.01) is closer to the true
value of 3.0 than the GLS estimate (3.06). At N=5000, the two are pretty
much identical, yet further from the true value than before (2.97). At
N=50000, they both tend back towards the true value, at 2.99.
As Arthur Atkinson without his washboard would have said, "How queer!"
<hansl>
nulldata 50
set seed 3711
series x = normal()
# generate heteroskedastic y
series y = -1 + 3*x + normal()*x
# verify we have negative values in both y and x
print y x --byobs
# run OLS
ols y 0 x
# try robust standard errors: no problem
ols y 0 x --robust
# try GLS: again, no problem
hsk y 0 x
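# Monte Carlo: redraw y from the same heteroskedastic DGP N
# times, storing the OLS slope estimate in column 1 of B1 and
# the GLS (hsk) estimate in column 2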
scalar N=5000
matrix B1 = zeros(N, 2)
loop i=1..N -q
series y = -1 + 3*x + normal()*x
ols y 0 x --quiet
B1[i,1] = $coeff[2]
hsk y 0 x --quiet
B1[i,2] = $coeff[2]
endloop
eval meanc(B1)
eval sdc(B1)
</hansl>
This produces (the tail of the output):
? eval meanc(B1)
3.0002 2.9993
? eval sdc(B1)
0.26747 0.19028
Since heteroskedasticity does not bias the coefficient estimates, it is unsurprising that the means of both columns are very close to 3.0 (the known "true" slope). But heteroskedasticity makes OLS inefficient compared to GLS, and that is amply confirmed by the substantially larger standard deviation of the estimated slopes under OLS (column 1 of matrix B1) relative to GLS (column 2): the implied relative efficiency is (0.26747/0.19028)^2, roughly 1.98, meaning OLS would need about twice as many observations to match the precision of the hsk estimator here.
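To check the sample-size point directly, one can wrap the same experiment in an outer loop over N. Here is a minimal sketch along those lines, not part of the scripts above (the replication count R = 1000, the names Ns, B and n, and the use of smpl to vary the sample range are arbitrary choices of presentation):

<hansl>
# sketch: same DGP, Monte Carlo at several sample sizes
nulldata 50000
set seed 3711
scalar R = 1000  # replications per sample size
matrix Ns = {500, 5000, 50000}
loop j=1..cols(Ns) -q
    # restore the full range, then restrict to the first n obs
    smpl full
    scalar n = Ns[j]
    smpl 1 n
    series x = normal()
    matrix B = zeros(R, 2)
    loop i=1..R -q
        series y = -1 + 3*x + normal()*x
        ols y 0 x --quiet
        B[i,1] = $coeff[2]
        hsk y 0 x --quiet
        B[i,2] = $coeff[2]
    endloop
    printf "N = %5d: sd(OLS) = %.5f, sd(hsk) = %.5f\n", n, sdc(B)[1], sdc(B)[2]
endloop
</hansl>

The single-draw differences noted above (3.01 versus 3.06, and so on) are just sampling noise within those dispersions; the standard deviation for OLS should exceed that for hsk at every N, with both shrinking as N grows (roughly at the usual root-N rate) and the efficiency ranking unchanged.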
(Reckoned worth replying since this sort of thing shows off the ease of doing Monte Carlo in hansl.)
Allin Cottrell