On 23.05.2018 at 15:48, Andreï V. Kostyrka wrote:
But why not Silverman’s rule-of-thumb estimator (1.06*sd(X)*n^-0.2)?
The optimal bandwidth *should* be of *order* n^-0.2, but it should also
depend on the spread of X! The exact (AMISE-optimal) formula is
(R(K) / ((\sigma^2_K)^2 * R(f'')))^0.2 * n^-0.2, where R(f'') depends
on the actual density, and therefore on the distribution of X and on
its spread.
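For what it’s worth, that formula is exactly where the 1.06 comes from:
with a Gaussian kernel (R(K) = 1/(2*sqrt(pi)), \sigma^2_K = 1) and a
normal reference density with standard deviation \sigma (so that
R(f'') = 3/(8*sqrt(pi)*\sigma^5)), it reduces to
(4/3)^0.2 * \sigma * n^-0.2 ≈ 1.06*\sigma*n^-0.2,
i.e. the rule of thumb with sd(X) plugged in for \sigma.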
If I have a dataset with, say, length/weight data, it would be very
silly to use the same bandwidth for length in metres and for length in
millimetres, because it would over-smooth in the first case and
under-smooth in the second!
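To make the point concrete, a quick R illustration with made-up lengths
(using stats::bw.nrd, mentioned below):

  set.seed(42)
  x_m  <- rnorm(200, mean = 1.7, sd = 0.1)  # lengths in metres
  x_mm <- 1000 * x_m                        # the same lengths in millimetres
  bw.nrd(x_m)   # bandwidth on the metre scale
  bw.nrd(x_mm)  # exactly 1000 times larger: the rule scales with sd/IQR

Any single fixed bandwidth, by contrast, cannot be right on both scales
at once.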
For reference, see Scott (2015), “Multivariate Density Estimation”,
2nd ed., p. 144, “Normal reference rule”.
R implements this rule as bw.nrd (the default for density() is the
closely related bw.nrd0, which uses the constant 0.9 instead of 1.06).
In fact, it guards against outliers by computing
1.06*min(sd(X), IQR(X)/1.34)*n^-0.2 rather than using sd(X) alone.
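Spelled out as a tiny R function (essentially what bw.nrd computes; the
name nrd_bw is just illustrative):

  nrd_bw <- function(x) {
    stopifnot(length(x) >= 2)
    spread <- min(sd(x), IQR(x) / 1.34)  # robust spread: guards against outliers
    1.06 * spread * length(x)^(-1/5)     # normal reference rule
  }

  x <- rnorm(500)
  c(nrd_bw(x), bw.nrd(x))  # the two should agree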
What you're saying makes sense, I think. (I haven't used this estimator
myself yet.)
Or is anyone in favour of the old rule for the sake of backwards
compatibility?
Fortunately there is no compatibility issue here, because there never
has been a default. So far it was just a remark in the docs.
So should we go with this default?
thanks,
sven