On Tue, 2 Dec 2014, Sven Schreiber wrote:
> On 02.12.2014 at 17:29, Allin Cottrell wrote:
> > On Tue, 2 Dec 2014, Sven Schreiber wrote:
> >
> > I believe the results you get from these calls will depend on the C
> > library. We should probably generate an error if you try to sort a
> > matrix containing NaNs since the sort order is indeterminate.
>
> It's clear that the position of nans is undefined after sorting, but I
> still think that being able to sort is quite useful/important. Whether
> the nans then show up at the beginning, at the end or in the middle is
> irrelevant and the user's responsibility. So almost all of the functions
> are ok IMO. Only msortby() does something very odd.
It's more complicated than that, and msortby is actually not doing
anything odd. What's odd, I gather, is the behavior of the Microsoft C
library in the cases you said came out "OK".
Our qsort callback for comparing doubles is a pretty standard trope:
given doubles a and b we return
(a > b) - (a < b)
Now if one or the other (or both) of a or b is NaN, both of these
comparisons should (one would think) return 0 -- a NaN is neither
greater than nor less than any given value. So the callback will
return 0, which happens to signal equality to qsort. It's true that a
NaN is not equal to any value (even itself), but the way our callback
is set up, it should always appear to be equal to whatever it is
compared with. Therefore a NaN will never move in the sort, and it will
create a barrier such that sorting occurs only before and after it. So
the msortby
behavior is expected.
If you're seeing "OK" results from sorting functions other than
msortby, I guess your C library must be returning non-zero from a
greater/less comparison of a NaN and a non-NaN. But on Linux (glibc
2.20) this is what I'm getting:
<gretl>
? matrix in = {2; 1; NA; 0; -5}
Generated matrix in
Warning: generated non-finite values
? matrix check = sort(in) ~ dsort(in) ~ values(in) ~ uniq(in)
Generated matrix check
Warning: generated non-finite values
? print check
check (5 x 4)
1 2 1 2
2 1 2 1
nan nan nan nan
-5 0 -5 0
0 -5 0 -5
</gretl>
That's with our original qsort callback. In yesterday's CVS I tried
experimenting with a more complicated qsort callback which arbitrarily
stipulates that a NaN is bigger than anything else. Using that variant
I get the following:
<gretl>
? print check
check (5 x 4)
-5 nan -5 2
0 2 0 1
1 1 1 nan
2 0 2 0
nan -5 nan -5
</gretl>
I guess this is in a sense convenient, but I don't think it's really
justified. It will also slow down sorting.
Allin