But why not Silverman’s rule-of-thumb estimator, 1.06*sd(X)*n^-0.2? The optimal bandwidth *should* be of *order* n^-0.2, but it should also depend on the spread of X! The exact (AMISE-optimal) formula is (R(K)/(\sigma^4_K*R(f'')))^0.2 * n^-0.2, where R(g) denotes the integral of g(x)^2 and \sigma^2_K is the kernel’s variance. R(f'') depends on the unknown density, hence on the distribution of X and in particular on its scale: rescaling X by a factor c multiplies R(f'') by c^-5, so the optimal bandwidth rescales by c.
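To see where the 1.06 comes from, here is a small sanity check in R: plugging the Gaussian kernel (R(K) = 1/(2*sqrt(pi)), \sigma^2_K = 1) and a normal reference density N(0, s^2) (R(f'') = 3/(8*sqrt(pi)*s^5)) into the exact formula recovers the rule-of-thumb constant. (These are the standard constants from Silverman’s derivation, nothing specific to this thread.)

    R_K  <- 1 / (2 * sqrt(pi))        # roughness of the Gaussian kernel, integral of K(x)^2
    s    <- 1                         # sd of the normal reference density (enters only as a factor)
    R_f2 <- 3 / (8 * sqrt(pi) * s^5)  # roughness of f'' when f is N(0, s^2)
    (R_K / (1 * R_f2))^(1/5)          # sigma_K^4 = 1 for the Gaussian kernel
    ## [1] 1.059224  -- the familiar 1.06, to be multiplied by s * n^-0.2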
If I have a dataset with, say, length/weight data, it would be very silly to use the same numeric bandwidth for length in metres and for length in millimetres, because it would over-smooth in the first case and under-smooth in the second!
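A quick illustration with made-up length data (nothing but rnorm, purely for the sake of argument): a spread-aware rule such as bw.nrd rescales with the units automatically, whereas a rule that depends on n alone cannot.

    set.seed(1)
    len_m  <- rnorm(500, mean = 1.75, sd = 0.10)  # hypothetical lengths in metres
    len_mm <- 1000 * len_m                        # the very same lengths in millimetres
    bw.nrd(len_m)    # roughly 0.03 (in metres)
    bw.nrd(len_mm)   # roughly 30 (in millimetres), i.e. exactly 1000 times larger
    ## Any fixed, spread-free bandwidth would be 1000x off for one of the two.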
For reference, see Scott (2015), “Multivariate Density Estimation: Theory, Practice, and Visualization”, 2nd ed., p. 144, “Normal reference rule”.
R implements this rule as bw.nrd (the default for density() is the closely related bw.nrd0, which uses the factor 0.9 instead of 1.06). Both guard against outliers by replacing sd(X) with a robust spread estimate: bw.nrd computes 1.06*min(sd(X), IQR(X)/1.34)*n^-0.2.
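For concreteness, here is the rule written out by hand (nrd_by_hand is just a throwaway name for this sketch); with R’s default quantile type it should agree with stats::bw.nrd:

    nrd_by_hand <- function(x) {
      ## 1.06 * min(sd, IQR/1.34) * n^-0.2, the outlier-guarded normal reference rule
      1.06 * min(sd(x), IQR(x) / 1.34) * length(x)^(-0.2)
    }
    x <- rnorm(200)
    c(by_hand = nrd_by_hand(x), bw_nrd = bw.nrd(x))  # the two should coincide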
Or is anyone in favour of the old rule for the sake of backwards compatibility?