OK, this is not really gretl-specific, but hopefully it might be of
interest to some. A colleague recently asked me what would be the
best way of testing the fairness of the US Powerball lottery, and he
suggested as candidates a chi-square test or translation via the
inverse normal cdf followed by a test for normality. You can find
historical data at
http://www.powerball.com/powerball/pb_frequency.asp ; I looked
at the frequency of "White balls" for each ball number, 1 to 69.
I simulated the two tests (well, three, since I used two variants of
the normality test) on the null hypothesis of fairness (each of the
69 balls has an equal chance of being selected on each drawing), and
got the following statistics for their p-values ("X2" = Pearson
chi-square, "DH" = Doornik-Hansen normality test, "SW" =
Shapiro-Wilk) based on 5000 replications:
Mean Std. Dev. Minimum Maximum
X2pv 0.50262 0.28698 4.81e-05 0.99914
DHpv 0.92881 0.04055 0.71212 0.99992
SWpv 0.99924 0.00141 0.97667 1.00000
It seems that under the null the p-value ought to be distributed
uniformly on (0,1). That appears to the case for the chi-square
test, but not at all for the two tests that employ the inverse
normal transformation.
Question: Am I suffering from a conceptual confusion, have I coded
the simulation wrongly (below), or what?
<hansl>
set echo off
set messages off
function scalar X2test (const matrix O)
scalar n = rows(O)
scalar N = sumc(O)
scalar E = N/n
scalar X2 = sumc(((O - E).^2) / E)
return pvalue(X, n-1, X2)
end function
function matrix cdftest (series count)
matrix ret = zeros(1, 2)
series relf = count/sum(count)
series EDF = cum(sort(relf))
# skip the last value (1.0) ?
smpl ; -1
series z = invcdf(N, EDF)
normtest z --quiet
ret[1] = $pvalue
normtest z --quiet --swilk
ret[2] = $pvalue
return ret
end function
# simulate a total of N draws for balls numbered 1 to 69,
# where N is chosen to roughly simulate the counts for
# White Balls shown on the powerball website
nulldata 69
series White
scalar N = 605 # draws
scalar K = 5000 # replications
matrix PV = zeros(K, 3)
loop i=1..K -q
White = 0
loop N -q
j = randint(1, 69)
White[j] += 1
endloop
PV[i,1] = X2test({White})
PV[i,2:3] = cdftest(White)
endloop
colnames(PV, "X2pv DHpv SWpv")
summary --matrix=PV --simple
</hansl>
Allin