On Tue, 28 Nov 2017, Sven Schreiber wrote:
Hi,
following some off-list discussion, I took a closer look at the 'difftest'
command. Here are some issues; I think some might be (small) bugs, others
are feature requests.
Let's start with the easy stuff, feature requests :-)
1) Can we have a --quiet or --silent switch to access only the $test and
$pvalue accessors in a script?
OK, you now have "difftest ... --quiet" in git.
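A minimal usage sketch, assuming a build that includes the change mentioned
above; the sample size, seed and simulated series are made up purely for
illustration:
<hansl>
set seed 12345
nulldata 20
# two artificial paired series (illustrative data only)
series x = normal()
series y = x + 0.5 + normal()
# run the test without printed output, then read the accessors
difftest x y --signed-rank --quiet
eval $test
eval $pvalue
</hansl>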
2) Can somebody later edit Wikipedia to tell the world that gretl has the
Wilcoxon test, too?
Good idea, though we might want to refine our implementation first.
3) Could the parametric t-test of differing means in (paired) samples be
subsumed under difftest? (Currently there is no direct scripting way of
performing it, I think.)
lib/src/nonparam.c: have at it!
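In the meantime the paired t-test can be done by hand on the differences.
This is only a sketch of the textbook formula, not what difftest does or
would do; the data and the names d, n, tstat and pv are made up for
illustration:
<hansl>
set seed 12345
nulldata 20
# two artificial paired series (illustrative data only)
series x = normal()
series y = x + 0.5 + normal()
# paired t-test: one-sample t-test on the differences, H0: mean(d) = 0
series d = x - y
scalar n = nobs(d)
scalar tstat = mean(d) / (sd(d) / sqrt(n))
# pvalue() gives the right-tail area, so double it for a two-sided test
scalar pv = 2 * pvalue(t, n - 1, abs(tstat))
printf "t(%d) = %g, two-sided p-value = %g\n", n - 1, tstat, pv
</hansl>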
OK, now other issues:
Consider first the corner case of all differences zero:
<hansl>
nulldata 10
series x = seq(1,10)
series y = x
difftest x y --signed-rank
</hansl>
which gives a somewhat "Missing values encountered" error.
"somewhat misleading", I suppose. That's now fixed in git.
Next, for a reduced sample n=5 (without ties this time) gretl reports
"Sample too small for statistical significance". However, according to this page
linked from Wikipedia
http://vassarstats.net/textbook/ch12a.html for n=5
there is at least a 5% one-sided critical value (check the last table at the
bottom).
Furthermore, for small samples n=5..8, gretl appears to calculate the
ingredients to what is called "original test" in the Wikipedia entry, namely
"two sums of ranks of given sign", denoted by gretl with W+ and W-. However,
the actual test statistic (according to
https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is "the smaller of"
those. But gretl never calculates that minimum-statistic explicitly, and
never records it in $test. Here's an example:
<hansl>
nulldata 10
series x = seq(1,10)
series y = -x
smpl 1 8
difftest x y --signed-rank
eval $test # gives NA wrongly
eval $pvalue # understandable that this is NA
</hansl>
We're looking for a z-statistic, and that's NA for the given sample
size. If anyone wishes to pursue tiny-sample statistics in this
area, that's fine by me.
Allin
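
For anyone who does want the small-sample quantities in a script, here is a
rough sketch that computes W+, W- and the smaller of the two by hand, using
the same data as the example above. It follows the description on the
Wikipedia page, not gretl's internal code; the names d, r, Wplus, Wminus and
W are made up for illustration:
<hansl>
nulldata 10
series x = seq(1,10)
series y = -x
series d = x - y
# keep the first 8 observations and drop any zero differences
smpl index <= 8 && d != 0 --restrict
# rank the absolute differences (ties get average ranks)
series r = ranking(abs(d))
# sums of the ranks attached to positive and negative differences
scalar Wplus = sum(r * (d > 0))
scalar Wminus = sum(r * (d < 0))
# the "smaller of the two" statistic
scalar W = xmin(Wplus, Wminus)
printf "W+ = %g, W- = %g, min = %g\n", Wplus, Wminus, W
</hansl>
Here all eight differences are positive, so W+ should be 1+2+...+8 = 36 and
both W- and the minimum should be 0. The result would still have to be
compared against a table of exact critical values.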