On Wed, 23 May 2018, Sven Schreiber wrote:
On 23.05.2018 at 15:48, Andreï V. Kostyrka wrote:
> But why not Silverman’s rule-of-thumb estimator (1.06*sd(X)*n^-0.2)? The
> optimal bandwidth *should* be of *order* n^-0.2, but should also depend on
> the spread of X! The exact formula is (R(K)/(\sigma_K^4*R(f'')))^0.2 *
> n^-0.2, where R(f'') depends on the actual density and therefore on the
> distribution of X, and therefore on its spread.
> If I have a dataset, say, with length/weight data, it would be very silly
> to use the same bandwidth for length in metres and for length in
> millimetres because it would over-smooth in the first case and under-smooth
> in the second!
> For reference, see Scott (2015) “Multivariate Density Estimation”, 2e,
> p. 144, “Normal reference rule”.
> R implements this rule as bw.nrd. In fact, it guards against outliers
> by using 1.06*min(sd(X), IQR(X)/1.34)*n^-0.2. (The default in density(),
> bw.nrd0, has the same form with 0.9 in place of 1.06.)
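For concreteness, the rule quoted above could be sketched in Python roughly as follows. This is only an illustration, not gretl or R code: the function name normal_reference_bandwidth, the robust flag, and the simulated length data are made up for this example, and the guard against a zero IQR is an added assumption rather than part of the quoted formula.

```python
import random
import statistics


def normal_reference_bandwidth(x, robust=True):
    """Normal reference bandwidth: 1.06 * spread * n^(-1/5).

    With robust=True the spread is min(sd, IQR/1.34), as in the formula
    quoted above; with robust=False it is the plain standard deviation.
    The constant 1.06 is (4/3)**0.2 rounded (Gaussian kernel, normal data).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    spread = statistics.stdev(x)  # sample standard deviation (n - 1 divisor)
    if robust:
        # Quartiles by linear interpolation between order statistics;
        # method="inclusive" should correspond to R's default quantile type.
        q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
        iqr_scale = (q3 - q1) / 1.34
        # Assumption of this sketch: ignore the IQR term when it is zero,
        # so heavily tied data do not yield a zero bandwidth.
        if iqr_scale > 0:
            spread = min(spread, iqr_scale)
    return 1.06 * spread * n ** -0.2


# Hypothetical data: the same lengths recorded in metres and in millimetres.
random.seed(0)
length_m = [random.gauss(1.75, 0.10) for _ in range(200)]
length_mm = [1000.0 * v for v in length_m]

h_m = normal_reference_bandwidth(length_m)
h_mm = normal_reference_bandwidth(length_mm)
print(h_m, h_mm, h_mm / h_m)
```

The two bandwidths differ by a factor of 1000 (up to rounding), so the amount of smoothing relative to the data is the same in either unit, which a bandwidth depending on n alone cannot deliver.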
What you're saying makes sense, I think. (I haven't used this estimator
myself yet.)
> Or is anyone in favour of the old rule for the sake of backwards
> compatibility?
Fortunately there is no compatibility issue here, because there never has
been a default. So far it was just a remark in the docs.
So should we go with this default?
I like this.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------