On Mon, 5 Dec 2016, Allin Cottrell wrote:
On Mon, 5 Dec 2016, Riccardo (Jack) Lucchetti wrote:
> On Mon, 5 Dec 2016, Sven Schreiber wrote:
>
>
>>> The way I see it, the series z you're generating in the "cdftest"
>>> function is not really normally distributed. Rather, it is constructed in a
>>> way such that its frequency distribution resembles a Gaussian density,
>>> which wouldn't be guaranteed if the data were truly normal. In other words,
>>> your normals are "too good to be true"; hence, your p-values are mostly
>>> very close to 1.
>>
>> Jack, I know you must mean something other than what you've written -- the
>> data's density is "too" Gaussian to be Gaussian??
>
> That's _exactly_ what I meant. The histogram of the values generated by
> Allin's script is so neatly aligned along the Gaussian density that the
> p-values of any normality test are much more often close to 1 than they
> would be if the data were truly Gaussian. You can also see this from a
> different perspective, as a lack of independence between observations.
Yes, this seems to be an artifact coming from the fact that the vector of
cumulated sorted relative white-ball frequencies is, so to speak,
"super-uniform" -- much more uniform than a set of draws from U[0,1].
So I guess my question at this point is how to (or whether it's even possible
to) map from a set of counts produced by equi-probable draws to U[0,1]. Note
that the counts themselves will surely not be uniform: counts in the
neighborhood of the expected value should be more numerous than extreme
values.
I'd guess that since _a priori_ each number has a fixed chance of being in
the set of the white balls (p = 5/69), each member of the "White" series
may be regarded as a sum of 605 independent draws from a Bernoulli rv with
parameter p. Therefore, the "White" series should contain draws from a
Binomial distribution with n = 605 and probability p. This will be nearly
indistinguishable from a Normal rv.
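To put rough numbers on this, with n = 605 and p = 5/69 as above, and writing W_i for the count of number i (my shorthand, it doesn't appear in the script), a back-of-the-envelope check gives:

```latex
\begin{align*}
E[W_i] &= np = 605 \cdot \tfrac{5}{69} \approx 43.84, \\
\operatorname{Var}[W_i] &= np(1-p) = 605 \cdot \tfrac{5}{69} \cdot \tfrac{64}{69} \approx 40.66, \\
W_i &\;\overset{\text{approx.}}{\sim}\; N(43.84,\ 6.38^2).
\end{align*}
```

With an expected count of about 44 and a standard deviation of about 6.4, the Normal approximation to the Binomial should be quite close.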
The only problem is that the elements of "White" are not independent,
since their sum MUST equal 5 * 605, so that complicates things a little.
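One way to quantify the dependence, writing W_i for the count of number i, N = 605 for the number of drawings and p = 5/69 (again my own shorthand): within a single drawing, the indicators for two distinct numbers are negatively correlated because the 5 balls are drawn without replacement, and the drawings are independent of each other, so

```latex
\begin{align*}
\operatorname{Cov}(W_i, W_j) &= N\left[\frac{5 \cdot 4}{69 \cdot 68} - \left(\frac{5}{69}\right)^{2}\right]
  = -\frac{N\,p\,(1-p)}{68}, \qquad i \neq j, \\
\operatorname{Corr}(W_i, W_j) &= -\frac{1}{68} \approx -0.0147.
\end{align*}
```

So each pairwise correlation is tiny, but it is the same for every pair, and together they amount to the exact linear constraint on the sum.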
In practice, you'd use something like this:
<hansl>
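function matrix pball (scalar n, scalar k)
    # draw k distinct numbers out of 1..n: pair each number with a
    # uniform variate, sort on the variate and keep the first k
    matrix X = seq(1,n)' ~ muniform(n,1)
    X = msortby(X,2)
    return msortby(X[1:k,1],1)
end function

# ... set things up ...

loop i=1..K --quiet
    White = 0    # reset the counts
    loop N --quiet
        pb = pball($nobs, k)
        loop j=1..k --quiet
            White[pb[j]] += 1
        endloop
    endloop
    # X2test is not defined in this snippet
    PV[i,1] = X2test({White})
    normtest White --quiet            # Doornik-Hansen (the default)
    PV[i,2] = $pvalue
    normtest White --quiet --swilk    # Shapiro-Wilk
    PV[i,3] = $pvalue
endloop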
</hansl>
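As for the fact that the counts have a fixed total, here is a minimal sketch of how a chi-square test could allow for it. This is only my guess at a workable approach: the function name X2_fixed_total and its arguments are hypothetical, and it is not the X2test used in the script. The idea is that, with a common variance N*p*(1-p) and a common pairwise correlation of -1/(n-1), the sum of squared deviations divided by N*p*(1-p)*n/(n-1) should be approximately chi-square with n-1 degrees of freedom.

```hansl
# Hypothetical helper (NOT the X2test called in the script): p-value of
# a chi-square test on the counts, allowing for their fixed total.
# counts: n x 1 vector of counts; N: number of drawings; k: balls per drawing
function scalar X2_fixed_total (const matrix counts, scalar N, scalar k)
    scalar n = rows(counts)
    scalar mu = N * k / n                          # expected count, N*p
    scalar s2 = N * k * (n - k) / (n * (n - 1))    # N*p*(1-p)*n/(n-1)
    scalar X2 = sumc((counts - mu).^2) / s2
    return pvalue(X, n - 1, X2)                    # chi-square, n-1 df
end function

# possible usage with the values in this thread (605 drawings, 5 out of 69):
# PV[i,1] = X2_fixed_total({White}, 605, 5)
```

The normality tests in the loop still treat the observations as independent, so this only addresses the chi-square column of PV.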
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------