But why not Silverman’s rule-of-thumb estimator, 1.06*sd(X)*n^-0.2? The optimal bandwidth *should* be of *order* n^-0.2, but it should also depend on the spread of X! The exact (AMISE-optimal) formula is (R(K)/(\sigma^4_K*R(f'')))^0.2 * n^-0.2, where R(g) denotes the integral of g(x)^2 and \sigma^2_K is the kernel’s variance. R(f'') depends on the unknown density, hence on the distribution of X and in particular on its scale: rescaling X by a factor c multiplies R(f'') by c^-5, so the optimal bandwidth rescales by c.
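To see where the 1.06 comes from, here is a small sanity check in R: plugging the Gaussian kernel (R(K) = 1/(2*sqrt(pi)), \sigma^2_K = 1) and a normal reference density N(0, s^2) (R(f'') = 3/(8*sqrt(pi)*s^5)) into the exact formula recovers the rule-of-thumb constant. (These are the standard constants from Silverman’s derivation, nothing specific to this thread.)

    R_K  <- 1 / (2 * sqrt(pi))        # roughness of the Gaussian kernel, integral of K(x)^2
    s    <- 1                         # sd of the normal reference density (enters only as a factor)
    R_f2 <- 3 / (8 * sqrt(pi) * s^5)  # roughness of f'' when f is N(0, s^2)
    (R_K / (1 * R_f2))^(1/5)          # sigma_K^4 = 1 for the Gaussian kernel
    ## [1] 1.059224  -- the familiar 1.06, to be multiplied by s * n^-0.2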
If I have a dataset with, say, length/weight data, it would be very silly to use the same numeric bandwidth for length in metres and for length in millimetres, because it would over-smooth in the first case and under-smooth in the second!
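A quick illustration with made-up length data (nothing but rnorm, purely for the sake of argument): a spread-aware rule such as bw.nrd rescales with the units automatically, whereas a rule that depends on n alone cannot.

    set.seed(1)
    len_m  <- rnorm(500, mean = 1.75, sd = 0.10)  # hypothetical lengths in metres
    len_mm <- 1000 * len_m                        # the very same lengths in millimetres
    bw.nrd(len_m)    # roughly 0.03 (in metres)
    bw.nrd(len_mm)   # roughly 30 (in millimetres), i.e. exactly 1000 times larger
    ## Any fixed, spread-free bandwidth would be 1000x off for one of the two.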
For reference, see Scott (2015), “Multivariate Density Estimation: Theory, Practice, and Visualization”, 2nd ed., p. 144, “Normal reference rule”.
R implements this rule as bw.nrd (the default for density() is the closely related bw.nrd0, which uses the factor 0.9 instead of 1.06). Both guard against outliers by replacing sd(X) with a robust spread estimate: bw.nrd computes 1.06*min(sd(X), IQR(X)/1.34)*n^-0.2.
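For concreteness, here is the rule written out by hand (nrd_by_hand is just a throwaway name for this sketch); with R’s default quantile type it should agree with stats::bw.nrd:

    nrd_by_hand <- function(x) {
      ## 1.06 * min(sd, IQR/1.34) * n^-0.2, the outlier-guarded normal reference rule
      1.06 * min(sd(x), IQR(x) / 1.34) * length(x)^(-0.2)
    }
    x <- rnorm(200)
    c(by_hand = nrd_by_hand(x), bw_nrd = bw.nrd(x))  # the two should coincide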
Or is anyone in favour of the old rule for the sake of backwards compatibility?