On 03.12.2014 at 16:59, Allin Cottrell wrote:
It's more complicated than that, and msortby is actually not doing
anything odd. What's odd, I gather, is the behavior of the Microsoft C
library in the cases you said came out "OK".
How can I test the C library behavior? Maybe with a simple C code
snippet, although compiling on a non-dev Windows machine is tricky, I guess?
Therefore a NaN will never move in the sort, and it will create a
barrier such that sorting occurs only before and after. So the msortby
behavior is expected.
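The "barrier" effect described above can be illustrated with a small C sketch. It assumes a plain comparison callback of the usual kind; the names compare_naive and sort_naive are made up for illustration and this is not claimed to be gretl's actual callback. Since every ordered comparison involving a NaN is false, such a callback reports a NaN as "equal" to any other value, and what qsort() then does with the array is up to the C library.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical naive callback for qsort(): both (x > y) and (x < y)
   are false when either value is NaN, so a NaN compares as "equal"
   to everything. */
static int compare_naive(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Sort n doubles with the naive callback. If the array contains a
   NaN the callback is not a consistent ordering, so the result
   depends on the C library's qsort() implementation. */
static void sort_naive(double *v, size_t n)
{
    qsort(v, n, sizeof *v, compare_naive);
}
```

Calling sort_naive() on an array containing a NaN and printing the result on each platform would be one way to compare the C libraries.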
Ok, I see. I searched a bit for other software's behavior, FWIW:
- Matlab: "If A includes any NaN elements, sort places these at the high
end." (http://matlab.izmiran.ru/help/techdoc/ref/sort.html)
- Python/NumPy: "In numpy versions >= 1.4.0 nan values are sorted to the
end." (http://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html)
- R: "sort ... na.last for controlling the treatment of NAs. If TRUE,
missing values in the data are put last; if FALSE, they are put first;
if NA, they are removed."
(https://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html)
- gretl (with series, where admittedly it's about NAs rather than NaNs,
although for example 10/0 also produces an NA there, not a NaN): just
ignores the NAs, leaves them in their original positions and sorts all
the remaining values.
...
That's with our original qsort callback. In yesterday's CVS I tried
experimenting with a more complicated qsort callback which arbitrarily
stipulates that a NaN is bigger than anything else. Using that variant
I get the following:
<gretl>
? print check
check (5 x 4)
   -5   nan    -5     2
    0     2     0     1
    1     1     1   nan
    2     0     2     0
  nan    -5   nan    -5
</gretl>
I guess this is in a sense convenient but I don't think it's really
justified. It will also slow down sorting.
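A callback of the kind described above, one that arbitrarily stipulates that a NaN is bigger than anything else, might look roughly like the following. This is only a sketch under that assumption, not the code from gretl CVS, and the name compare_nan_last is made up. The extra isnan() checks on every comparison are one plausible source of the slowdown mentioned.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical callback for qsort(): treats NaN as larger than any
   other value, and two NaNs as equal, so the ordering is consistent
   and NaNs end up last in an ascending sort. */
static int compare_nan_last(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    int xnan = isnan(x) != 0;
    int ynan = isnan(y) != 0;

    if (xnan || ynan) {
        /* 1 if only x is NaN, -1 if only y is NaN, 0 if both are */
        return xnan - ynan;
    }
    return (x > y) - (x < y);
}
```

Used as qsort(v, n, sizeof *v, compare_nan_last), this should give a result that does not depend on the platform's qsort() internals, because the callback defines a proper ordering.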
I think there are various options: refuse to sort when NaNs are present
and let the user deal with them. Or do as the other software does and put
them at the end or at the start. Or mimic the situation with series and
leave them in their original positions. Or treat NaN as always bigger (if
that is what Matlab does). Any or all of these could be offered as an
optional user choice. But the current situation is bad IMHO.
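For the "leave them in their original position" option, a minimal C sketch could look like this. It is not gretl's series code; the function names are hypothetical and it handles a plain array of doubles rather than matrix rows. The idea is to copy out the non-NaN values, sort those, and write them back into the non-NaN slots.

```c
#include <math.h>
#include <stdlib.h>

/* Ordinary ascending comparison; only ever called on non-NaN values. */
static int compare_no_nan(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: sort the non-NaN entries of v in ascending
   order while every NaN keeps its original position.
   Returns 0 on success, 1 if memory allocation fails. */
static int sort_keep_nan_positions(double *v, size_t n)
{
    double *tmp;
    size_t i, k = 0;

    if (n == 0) {
        return 0;
    }
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return 1;
    }
    /* collect the non-NaN values */
    for (i = 0; i < n; i++) {
        if (!isnan(v[i])) {
            tmp[k++] = v[i];
        }
    }
    qsort(tmp, k, sizeof *tmp, compare_no_nan);
    /* write the sorted values back into the non-NaN slots */
    k = 0;
    for (i = 0; i < n; i++) {
        if (!isnan(v[i])) {
            v[i] = tmp[k++];
        }
    }
    free(tmp);
    return 0;
}
```

The extra pass and the temporary buffer are the price for a result that does not depend on how the C library's qsort() reacts to NaNs.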
Thanks,
sven