On Thu, 27 Sep 2018, Sven Schreiber wrote:
Speaking about the helper functions, what do you think of
<hansl>
function matrix mode(matrix v)
# v should be a vector
E = ecdf(vec(v))
howmuch = diff(E[,2])
where = imaxc(howmuch)
return E[where, 1] | howmuch[where]
end function
</hansl>
I must say I quite like it. Should it go into the extra addon?
I was just thinking about it. I vaguely remember a debate we had on one of
the two lists at some point (I may be wrong: I tried googling for it, to no
avail) about implementing a mode() function, and we got collectively stuck
on the case of multimodal data. Plus, diff-ing the output of ecdf() will
give you trouble if the mode happens to be the smallest element of the
vector, since differencing loses the first row, that is, the frequency of
the smallest value.
For example, consider this (I'm using Sven's function above):
<hansl>
matrix uno = seq(1,3)' ** ones(3,1)
matrix due = seq(1,3)' ** {1;0;0}
print uno due
eval mode(uno)
eval mode(due)
</hansl>
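As you can see if you run it, mode(uno) is debatable (the three values 1, 2
and 3 are equally frequent), while mode(due) is plain wrong (the true mode
is 0, which is also the smallest element).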
A remedy for the second case may be something like
<hansl>
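function matrix mode(matrix v)
    # distinct values of v, sorted, as a column vector
    vv = values(v)
    # count the occurrences of each distinct value (1 x k row)
    A = sumc(vec(v) .= vv')
    # position of the most frequent value
    where = imaxr(A)
    # the mode stacked on top of its relative frequency
    return vv[where] | A[where]/nelem(v)
end function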
</hansl>
but then, in case of multi-modal data, what should be done is debatable.
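Just to fix ideas on the multimodal case: one possible convention (an
assumption for the sake of discussion, not something we have agreed on) is
to return every value tied for the highest frequency, one per row. A
minimal sketch, where the name modes() is hypothetical:
<hansl>
function matrix modes(matrix v)
    # hypothetical helper: returns ALL values tied for the top frequency
    matrix x = vec(v)
    # distinct values, sorted, as a k x 1 column
    matrix vv = values(x)
    # absolute frequency of each distinct value (1 x k row)
    matrix counts = sumc(x .= vv')
    # k x 1 vector of 0/1 flags marking the values with the top count
    matrix hit = (counts' .= maxr(counts))
    # keep the flagged rows: value in column 1, relative frequency in column 2
    return selifr(vv ~ (counts' / rows(x)), hit)
end function
</hansl>
With uno from the example above, this should return all three values, each
with relative frequency 1/3; whether that is what a user expects from a
"mode" function is exactly the open question.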
I see you're using the mode function as a heuristic criterion to find the
data dimensions; maybe we could figure out something else.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------