On Thu, 31 Dec 2009, Gordon Hughes wrote:
> A word of warning about running the BigCrush test. Looking through
> the results of the first run I noticed that some of the tests
> generate test statistics that would reject the relevant hypothesis at
> the 1% significance level, even though all of the tests were reported
> as having passed (since the pass criterion is p in the range
> [0.001, 0.999]). Since we are dealing with a random number generator,
> it is possible that one run may lead to no failures while another
> generates a number of failures.
If all is well, one expects to see p-values randomly distributed
on (0, 1): if you didn't get some values below .01 on a large set
of tests, that would itself be suspicious.
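To put a number on that expectation, here is a minimal sketch. The count of 254 test statistics is an assumption for illustration (the exact number depends on the TestU01 version); under the null, p-values are uniform on (0, 1), so the number falling below 0.01 is binomial:

```python
# Sketch: if the generator is sound, BigCrush p-values are roughly
# Uniform(0, 1), so a few small p-values are expected by chance.
# N = 254 statistics is an assumed, illustrative count.
N = 254
alpha = 0.01

expected = N * alpha                     # mean count of p-values below 0.01
p_at_least_one = 1 - (1 - alpha) ** N    # chance of seeing at least one

print(f"expected count below {alpha}: {expected:.2f}")
print(f"P(at least one below {alpha}): {p_at_least_one:.3f}")
```

So on a full run one should typically see two or three p-values below 0.01, and a run with none at all would be the surprising outcome.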
It would require a modified test program, but if one gets
p-values that look "too small" on some tests, it is possible to
rerun just those tests rather than redoing the whole thing.
> Hence I ran the same test a second time. The execution times were
> very similar (29h 40m vs 29h 41m). The second run reported a single
> failure - Test 89 (PeriodsInStrings, r = 20) with a p-value of
> 6.4e-4.
Personally I wouldn't be too worried about that -- it looks
within the bounds of what you might expect. P-values of less
than, say, 10^{-6} would be problematic. The failures that
Doornik talks about are values < 10^{-300} and I've seen nothing
like that with our code.
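A quick back-of-the-envelope calculation shows why one such "failure" is unalarming while anything near 10^{-6} would not be. Again assuming, for illustration, 254 test statistics per run:

```python
# Sketch: under the pass criterion p in [0.001, 0.999], each statistic
# from a sound generator lands outside that range with probability
# 0.002. N = 254 statistics is an assumed, illustrative count.
N = 254

# chance that a perfectly good generator shows >= 1 nominal "failure"
p_some_failure = 1 - (1 - 0.002) ** N

# chance that it produces >= 1 p-value below 10^{-6} (lower tail only)
p_extreme = 1 - (1 - 1e-6) ** N

print(f"P(>=1 nominal failure in a clean run): {p_some_failure:.3f}")
print(f"P(>=1 p-value below 1e-6): {p_extreme:.6f}")
```

Roughly 40% of clean runs should show at least one p-value outside [0.001, 0.999], so a single 6.4e-4 is routine; a p-value below 10^{-6} would happen by chance in only about one run in four thousand.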
> Sven reported that the updated version of the ziggurat executes
> substantially faster than the earlier version. The early tests
> in the BigCrush suite give a different picture - the execution
> times are all slightly longer using the new gretl_one_snormal
> than using the previous ran_normal_ziggurat.
As Sven said later, this is expected. The new code is
substantially faster than what we had before we started down the
Ziggurat road; but it's necessarily a little slower than the Voss
code, which uses 24-bit values.
Allin.