On 27.09.2018 at 11:33, Riccardo (Jack) Lucchetti wrote:

I was just thinking about it. I vaguely remember a debate we had on one of the two lists at some point about implementing a mode() function (but I may be wrong; I tried googling for it, to no avail), and we got collectively stuck on the case of multimodal data. Plus, diff-ing the output of ecdf() will give you trouble if the mode happens to be the smallest element of the vector.


True, that was a bug. Try this corrected version:

<hansl>

function matrix mode(matrix v)
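    # v should be a vector
    # ecdf() gives the sorted unique values (col 1) and their
    # cumulative relative frequencies (col 2)
    E = ecdf(vec(v))
    # prepend 0 so that the frequency of the smallest value is also diffed
    howmuch = diff(0 | E[,2])[2:]
    where = imaxc(howmuch) # row holding the highest relative frequency
    # modal value stacked on top of its relative frequency
    return E[where, 1] | howmuch[where]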
end function

</hansl>


but then, in case of multi-modal data, what should be done is debatable.

This is partly inherited from the imax*/imin* suite of gretl functions. In 2015 we briefly discussed this on-list. I wrote:

' FWIW, (Python's) Numpy's argmax() and argmin() functions explicitly note that:
"In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned."'

And you answered:

'This makes sense. I'm not sure if in fact we follow this policy, but if we agree we should, I can have a go at the C code to make sure we do, and of course update the docs.'

It looks as if the gretl docs are still silent on the issue. Actually, I was surprised that imaxc apparently gave the last value in your example uno; I thought it would return the first of the multiple modes.
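
To make the tie question concrete, here is a minimal sketch of how one could probe the behavior and sidestep it. The helper name all_modes(), the test data and the 1e-10 tolerance are made up for illustration and are not part of gretl. I am not asserting what imaxc returns on ties in any particular release, so the script only prints it.

<hansl>

# Hypothetical helper (not a gretl built-in): return every value that is
# tied for the highest relative frequency, in ascending order
function matrix all_modes (matrix v)
    matrix E = ecdf(vec(v))
    matrix freq = diff(0 | E[,2])
    freq = freq[2:]
    scalar top = maxc(freq)
    # tolerance (arbitrary) because freq comes from differenced
    # floating-point cumulative sums
    return selifr(E[,1], abs(freq - top) .< 1e-10)
end function

# Probe: which row does imaxc pick when the maximum occurs twice?
matrix tied = {2; 1; 2}
scalar pick = imaxc(tied)
printf "imaxc picks row %d of the tied vector\n", pick

# Made-up data in which 1 and 4 both occur twice
matrix v = {4; 1; 4; 2; 1; 3}
matrix m = all_modes(v)
print m

</hansl>

Returning all ties leaves the choice (first, last, or "no unique mode") to the caller, so the result would not depend on whatever imaxc happens to do.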


I see you're using the mode function as a heuristic criterion to find the data dimensions; maybe we could figure out something else.

I'm happy to hear suggestions. The max line length didn't work because of the file structure.

thanks,

sven