On 27.09.2018 at 11:33, Riccardo (Jack) Lucchetti wrote:
I was just thinking about it. I vaguely remember a debate we had on
one of the two lists at some point (but I may be wrong, I tried
googling for it, to no avail) about implementing a mode() function and
we got collectively stuck on the case when you have multimodal data.
Plus, diff-ing the output of ecdf() will give you trouble if the mode
happens to be the smallest element of the vector.
True, that was a bug. Try this corrected version:
<hansl>
function matrix mode(matrix v)
# v should be a vector
E = ecdf(vec(v))
howmuch = diff(0 | E[,2])[2:] # make sure the 1st is also diffed
where = imaxc(howmuch)
return E[where, 1] | howmuch[where]
end function
</hansl>
but then, in case of multi-modal data, what should be done is debatable.
This is partly inherited from the imax*/imin* suite of gretl functions.
In 2015 we briefly discussed this on-list. I wrote:
' FWIW, (Python's) Numpy's argmax() and argmin() functions explicitly
note that:
"In case of multiple occurrences of the maximum values, the indices
corresponding to the first occurrence are returned."'
And you answered:
'This makes sense. I'm not sure if in fact we follow this policy, but if
we agree we should I can have a go at the C code to make sure we do,
and of course update the docs.'
It looks as if the gretl docs are still silent on the issue. Actually I
was surprised that imaxc in your example apparently gave the last
value; I thought it would return the first of the multiple modes.
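
To make the tie-breaking options concrete, here is a minimal sketch in Python (not gretl code, and not a claim about what imaxc actually does). It mirrors the ecdf-differencing idea above, including the leading zero so that the smallest value also gets a jump. The function name ecdf_mode and its tie argument are made up for this illustration.

```python
from collections import Counter
from itertools import accumulate


def ecdf_mode(values, tie="first"):
    """Mode(s) of `values` found by differencing the (scaled) empirical CDF.

    Hypothetical helper for illustration only. Returns a list of
    (value, relative frequency) pairs; `tie` selects which of several
    equally frequent values to report: "first" (smallest), "last"
    (largest) or "all".
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty input has no mode")

    counts = Counter(values)
    support = sorted(counts)  # distinct values, ascending

    # n * ECDF at each distinct value, kept as integers so that ties are
    # exact (differencing floating-point CDF values can break ties by
    # rounding error).
    cum = list(accumulate(counts[x] for x in support))

    # Prepend 0 so the smallest value is differenced too.
    jumps = [c - p for p, c in zip([0] + cum[:-1], cum)]

    top = max(jumps)
    winners = [i for i, j in enumerate(jumps) if j == top]

    if tie == "first":
        chosen = winners[:1]
    elif tie == "last":
        chosen = winners[-1:]
    elif tie == "all":
        chosen = winners
    else:
        raise ValueError("tie must be 'first', 'last' or 'all'")

    return [(support[i], jumps[i] / n) for i in chosen]


data = [3, 1, 1, 2, 3]               # 1 and 3 both occur twice
print(ecdf_mode(data, tie="first"))  # smallest of the tied values
print(ecdf_mode(data, tie="last"))   # largest of the tied values
print(ecdf_mode(data, tie="all"))    # every tied value
```

With tie="first" this matches the policy NumPy documents for argmax(), since the distinct values are scanned in ascending order.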
I see you're using the mode function as a heuristic criterion to find
the data dimensions; maybe we could figure out something else.
I'm happy to hear suggestions. The max line length didn't work because
of the file structure.
thanks,
sven