powerball puzzle

Monday, 5 December 2016

OK, this is not really gretl-specific, but hopefully it might be of 
interest to some. A colleague recently asked me what would be the 
best way of testing the fairness of the US Powerball lottery, and he 
suggested as candidates a chi-square test or translation via the 
inverse normal cdf followed by a test for normality. You can find
historical data at
http://www.powerball.com/powerball/pb_frequency.asp ; I looked
at the frequency of "White balls" for each ball number, 1 to 69.

I simulated the two tests (well, three, since I used two variants of 
the normality test) on the null hypothesis of fairness (each of the 
69 balls has an equal chance of being selected on each drawing), and 
got the following statistics for their p-values ("X2" = Pearson 
chi-square, "DH" = Doornik-Hansen normality test, "SW" = 
Shapiro-Wilk) based on 5000 replications:

           Mean    Std. Dev.    Minimum     Maximum
X2pv   0.50262     0.28698     4.81e-05    0.99914
DHpv   0.92881     0.04055     0.71212     0.99992
SWpv   0.99924     0.00141     0.97667     1.00000

It seems that under the null the p-value ought to be distributed 
uniformly on (0,1). That appears to the case for the chi-square 
test, but not at all for the two tests that employ the inverse
normal transformation.

Question: Am I suffering from a conceptual confusion, have I coded 
the simulation wrongly (below), or what?

<hansl>
set echo off
set messages off

function scalar X2test (const matrix O)
   scalar n = rows(O)
   scalar N = sumc(O)
   scalar E = N/n
   scalar X2 = sumc(((O - E).^2) / E)
   return pvalue(X, n-1, X2)
end function

function matrix cdftest (series count)
   matrix ret = zeros(1, 2)
   series relf = count/sum(count)
   series EDF = cum(sort(relf))
   # skip the last value (1.0) ?
   smpl ; -1
   series z = invcdf(N, EDF)
   normtest z --quiet
   ret[1] = $pvalue
   normtest z --quiet --swilk
   ret[2] = $pvalue
   return ret
end function

# simulate a total of N draws for balls numbered 1 to 69,
# where N is chosen to roughly simulate the counts for
# White Balls shown on the powerball website

nulldata 69
series White
scalar N = 605    # draws
scalar K = 5000   # replications
matrix PV = zeros(K, 3)

loop i=1..K -q
   White = 0
   loop N -q
     j = randint(1, 69)
     White[j] += 1
   endloop
   PV[i,1] = X2test({White})
   PV[i,2:3] = cdftest(White)
endloop

colnames(PV, "X2pv DHpv SWpv")
summary --matrix=PV --simple
</hansl>

Allin

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004