Re: [Gretl-users] clustering

Thursday, 21 June 2018

On 21.06.2018 at 09:27, Sven Schreiber wrote:
...
> Hi fellow gretl users,
>
> Do we have any clustering methods or algorithms in gretl?
Sorry for answering my own question, but it seems not. I'm attaching a
function that uses gretl's internal kernel density estimate to cluster
the input data. The idea is that all values "around" a peak in the
estimated density belong to the same cluster (or group, or region), so
the number of estimated peaks determines the number of clusters.

Quoting the comment in the function:
     Maps the input values in z to clusters defined as the neighborhoods
     around the local maxima of the estimated (kernel) density.
     Each cluster range goes from one (local) minimum to the next.
     If the density is single-peaked, there is only one pseudo-cluster,
     and the result is a vector of ones.

     Returns a vector of the same length as z where each observed value
     is replaced with the assigned cluster number 1..K. (Clusters are
     numbered in ascending order, starting from the lowest values.)
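To make the description above concrete outside gretl, here is a minimal Python sketch of the same idea. It is not the attached densclust.inp. It assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth times a `scale` multiplier, evaluates the density on an evenly spaced grid, treats every interior local minimum as a cluster boundary, and numbers the clusters 1..K from left to right. The function names `kde_on_grid` and `densclust_sketch` and their parameters are hypothetical.

```python
import math
import statistics


def kde_on_grid(z, scale=1.0, npoints=200):
    """Gaussian kernel density estimate of z on an evenly spaced grid.

    Assumption: bandwidth = scale * Silverman's rule of thumb. This is a
    sketch, not necessarily what gretl's kdensity() does internally.
    """
    n = len(z)
    sd = statistics.stdev(z)
    q1, _, q3 = statistics.quantiles(z, n=4)
    h = scale * 0.9 * min(sd, (q3 - q1) / 1.349) * n ** (-0.2)
    # extend the grid 3 bandwidths beyond the data so the tails are covered
    lo, hi = min(z) - 3 * h, max(z) + 3 * h
    step = (hi - lo) / (npoints - 1)
    grid = [lo + i * step for i in range(npoints)]
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((x - zi) / h) ** 2) for zi in z)
            for x in grid]
    return grid, dens


def densclust_sketch(z, scale=1.0, npoints=200):
    """Map each value in z to a cluster number 1..K (hypothetical helper).

    Cluster boundaries are the interior local minima of the estimated
    density, so K = number of minima + 1, numbered from the smallest
    values upwards. A single-peaked density yields all ones.
    """
    grid, dens = kde_on_grid(z, scale, npoints)
    # "<=" on the right lets a flat minimum (e.g. the density underflowing
    # to 0.0 across a wide gap) be counted once, at its left end
    cuts = [grid[i] for i in range(1, npoints - 1)
            if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]]
    # cluster number = 1 + number of boundaries lying below the value
    return [1 + sum(v > c for c in cuts) for v in z]


# two well-separated groups around -4 and 8
z = [-4 + 0.1 * k for k in range(-5, 6)] + [8 + 0.1 * k for k in range(-5, 6)]
print(densclust_sketch(z))  # eleven 1s followed by eleven 2s
```

With a single-peaked density there are no interior minima, so every value is assigned to cluster 1, which matches the behaviour described in the function comment.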

I'm sure there is a fancy name in the literature for this type of
algorithm, and I'd be happy to learn it. I'd also be glad of any feedback
on what optimal, reasonable, or common choices for the kernel smoothing
parameter (the bandwidth) would be in such an application.
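For reference, one widely used default bandwidth for a Gaussian kernel is Silverman's rule of thumb. Whether densclust or gretl's kdensity uses exactly this rule is an assumption here. In the formula below, $\hat\sigma$ is the sample standard deviation, IQR the interquartile range, $n$ the sample size, and $c$ a hypothetical user-chosen multiplier ($c = 1$ gives the plain rule):

```latex
h = c \cdot 0.9 \, \min\!\left(\hat\sigma, \frac{\mathrm{IQR}}{1.349}\right) n^{-1/5}
```

For a Gaussian kernel in one dimension, the number of density peaks cannot increase as $h$ grows. A larger $c$ therefore tends to merge neighbouring clusters, while a smaller $c$ can split one group into spurious clusters.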

A test call of the function looks something like this:
<hansl>
# random test input for densclust:
clear
include densclust.inp    # put the attached file in the same directory as this script

matrix z = 0.5 * mnormal(100,1) - 4
z |= 2 * mnormal(50,1) + 2 # different cluster region around 2
z |= mnormal(50,1) + 8 # and another one around 8

# test call:
matrix m = densclust(z)
eval z ~ m

# same thing with capturing the interim results
matrix tt dd
densclust(z, ,,, &tt, &dd)

print tt
gnuplot 2 1 --matrix=dd --output=display
</hansl>

thanks,
sven
