On Mon, 5 Dec 2016, Riccardo (Jack) Lucchetti wrote:
On Mon, 5 Dec 2016, Sven Schreiber wrote:
>> The way I see it, the series z you're generating in the "cdftest"
>> function is not really normally distributed. Rather, it is constructed in a
>> way such that its frequency distribution resembles a Gaussian density,
>> which wouldn't be guaranteed if data were truly normal. In other words,
>> your normals are "too good to be true"; hence, your p-values are mostly
>> very close to 1.
>
> Jack, I know you must mean something other than what you've written -- the
> data's density is "too" Gaussian to be Gaussian??
That's _exactly_ what I meant. The histogram of the values generated by
Allin's script is so neatly aligned along the Gaussian density that the
p-values of any normality test are much more often close to 1 than they
would be if the data were truly Gaussian. You can also see this from a
different perspective, as a lack of independence between observations.
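A minimal sketch of the "too good to be true" effect in Python, not a
reproduction of Allin's script. As a stand-in for data that hug the Gaussian
density too closely, it pushes an evenly spaced grid on (0,1) through the
inverse normal CDF; the grid and the names ks_distance, too_good_normals and
true_normals are assumptions made purely for illustration. It then compares
the Kolmogorov-Smirnov distance from N(0,1) with that of independent draws.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def ks_distance(sample):
    """Kolmogorov-Smirnov distance between the sample's ECDF and N(0,1)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = STD_NORMAL.cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def too_good_normals(n):
    """Hypothetical stand-in: evenly spaced probabilities mapped to normal quantiles."""
    return [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]

def true_normals(n, rng):
    """Independent pseudo-random N(0,1) draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(12345)
n = 200

d_grid = ks_distance(too_good_normals(n))
d_rand = [ks_distance(true_normals(n, rng)) for _ in range(500)]

print("KS distance, grid-based sample:     ", d_grid)
print("KS distance, smallest of 500 random:", min(d_rand))
print("KS distance, median of 500 random:  ", sorted(d_rand)[len(d_rand) // 2])
```

The grid-based sample attains a distance of 0.5/n, the smallest value the
statistic can take, so a test built on it would report a p-value near 1.
Independent draws essentially never come that close.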
Yes, this seems to be an artifact coming from the fact that the vector
of cumulated sorted relative white-ball frequencies is, so to speak,
"super-uniform" -- much more uniform than a set of draws from U[0,1].
So I guess my question at this point is how to (or whether it's even
possible to) map from a set of counts produced by equi-probable draws
to U[0,1]. Note that the counts themselves will surely not be uniform:
counts in the neighborhood of the expected value should be more
numerous than extreme values.
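The last point, and one possible answer to the mapping question, can be
sketched in Python. This assumes each count is Binomial(n, 1/2), that is, n
independent draws in which a white ball is as likely as not, which may not
match the actual script. The technique is the randomized probability integral
transform, swapped in here as an illustration rather than taken from this
thread; all function and variable names are hypothetical.

```python
import random
from math import comb

def binom_cdf_table(n, p=0.5):
    """Return [P(K <= 0), ..., P(K <= n)] for K ~ Binomial(n, p)."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(total)
    return cdf

def randomized_pit(k, cdf, rng):
    """Map a count k to a point drawn uniformly on (F(k-1), F(k))."""
    lo = cdf[k - 1] if k > 0 else 0.0
    hi = cdf[k]
    return lo + rng.random() * (hi - lo)

rng = random.Random(2016)
n_balls, n_reps = 20, 5000  # assumed sizes, for illustration only

cdf = binom_cdf_table(n_balls)

# Each count is the number of white balls in n_balls equi-probable draws.
counts = [sum(rng.random() < 0.5 for _ in range(n_balls)) for _ in range(n_reps)]

# The counts pile up around the expected value n_balls / 2 ...
middle_share = sum(8 <= k <= 12 for k in counts) / n_reps

# ... while the randomized transform spreads them over [0, 1].
u = [randomized_pit(k, cdf, rng) for k in counts]
mean_u = sum(u) / len(u)

print("share of counts in 8..12:", middle_share)
print("mean of transformed values:", mean_u)
```

If the counts really are independent binomial draws, the transformed values
are U[0,1]; the extra uniform draw inside each step of the CDF is what removes
the discreteness. Whether this is suitable for the test at hand is a separate
question.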
Allin