On 21.06.2018 at 09:27, Sven Schreiber wrote:
> Hi fellow gretl users,
> do we have any clustering methods or algorithms in gretl?
Sorry to answer my own question, but it seems not. I'm attaching a function
that clusters the input data based on gretl's built-in kernel density
estimate. The idea is that all values "around" a peak in the density
belong to the same cluster (or group, or region), so the number of
estimated peaks determines the number of clusters.
Quoting the comment in the function:
Maps the input values in z to clusters defined as the neighborhoods
around the local maxima of the estimated (kernel) density.
Each cluster range goes from one (local) minimum to the next.
If the density is single-peaked, there is only one pseudo-cluster,
and the result is a vector of ones.
Returns a vector of the same length as z where each observed value
is replaced with the assigned cluster number 1..K. (Clusters are
ordered ascending.)
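
To make the steps in that comment concrete, here is a minimal sketch of the idea in hansl. It is not the code in the attached densclust.inp: the name densclust_sketch, the bwscale argument and the simple rule for spotting local minima are assumptions made for illustration only. It also assumes a gretl version in which kdensity() accepts a column vector and returns the evaluation grid in column 1 and the density in column 2.

<hansl>
# Illustrative sketch only, NOT the attached densclust.inp.
# densclust_sketch and bwscale are hypothetical names.
function matrix densclust_sketch (const matrix z, scalar bwscale)
    # col 1: grid of abscissae, col 2: estimated density
    matrix dd = kdensity(z, bwscale)
    scalar last = rows(dd) - 1

    # collect the grid points where the density has an interior local minimum
    matrix cuts = {}
    loop i=2..last
        if dd[i,2] < dd[i-1,2] && dd[i,2] <= dd[i+1,2]
            cuts |= dd[i,1]
        endif
    endloop

    # cluster number = 1 + number of cut points lying below the value,
    # so clusters are numbered in ascending order
    matrix m = ones(rows(z), 1)
    scalar K = rows(cuts)
    if K > 0
        loop j=1..K
            m += (z .> cuts[j])
        endloop
    endif
    return m
end function
</hansl>

If the estimated density is single-peaked, no cut points are found and the sketch returns a vector of ones, in line with the description above. A larger bwscale smooths the density more and so tends to produce fewer clusters.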
I'm sure there is a fancy name in the literature for this type of
algorithm, and I'm happy to learn it. I'd also be glad for any feedback
on what the optimal, reasonable, or common choices are for the kernel
smoothing parameter in such an application.
A test call of the function looks something like this:
<hansl>
# random test input for densclust:
clear
include densclust.inp # put the attached file in the same path
matrix z = 0.5 * mnormal(100,1) - 4
z |= 2 * mnormal(50,1) + 2 # different cluster region around 2
z |= mnormal(50,1) + 8 # and another one around 8
# test call:
matrix m = densclust(z)
eval z ~ m
# same thing with capturing the interim results
matrix tt dd
densclust(z, ,,, &tt, &dd)
print tt
gnuplot 2 1 --matrix=dd --output=display
</hansl>
thanks,
sven