Responding to Allin's suggestion:
for series x, in which case all the dummies are generated; and
also support
list L = dummify(x, val)
which treats 'val' as the omitted category. (That is, the second
argument to dummify() is optional.)
That leaves a question: is it easier/more intuitive to read 'val'
as denoting the val'th category when the distinct values of x are
ordered, or as the condition x == val? I tend to think the latter
is better.
I agree. It is very difficult to ensure that the first option
produces predictable results in a function context when some
categories might be missing. Hence, in practice one would have to
adopt the form "list DL = dummify(x, max(x))" anyway.
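To illustrate the point about missing categories, here is a rough sketch in Python (not gretl code). The function names dummify_by_value and dummify_by_position are hypothetical and simply model the two readings of 'val'. When a subsample lacks the smallest category, "omit the 1st category" silently switches to omitting a different value, while "omit x == 3" stays the same:

```python
import math

def distinct_values(x):
    """Sorted distinct non-missing values of x."""
    return sorted({v for v in x if not math.isnan(v)})

def dummify_by_value(x, omit=None):
    """Hypothetical model of reading 'val' as the condition x == val.
    Returns {value: 0/1 dummy series}; with omit=None all dummies are made."""
    out = {}
    for v in distinct_values(x):
        if omit is not None and v == omit:
            continue  # this category is the omitted (reference) one
        # missing observations stay missing in every dummy
        out[v] = [math.nan if math.isnan(xi) else float(xi == v) for xi in x]
    return out

def dummify_by_position(x, k):
    """Hypothetical model of reading 'val' as the k'th (1-based) category
    in the sorted distinct values of x."""
    return dummify_by_value(x, distinct_values(x)[k - 1])

full = [1, 2, 3, 1, 2, 3]
sub = [2, 3, 2, 3]  # category 1 does not occur in this subsample

print(sorted(dummify_by_value(full, 3)), sorted(dummify_by_value(sub, 3)))
print(sorted(dummify_by_position(full, 1)), sorted(dummify_by_position(sub, 1)))
```

In the value-based version, category 3 is omitted in both samples. In the positional version, the omitted category changes from 1 to 2 once category 1 disappears.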
However, without wanting to raise unnecessary difficulties, won't
this imply a change in the use of "dummify(x)" as an argument in,
say, OLS, as in "OLS y Z dummify(x)"? At the moment this seems to
drop one category automatically, so that list Z can contain const.
I assume that this is the backward-incompatible change you have in
mind, and that you would let the OLS function deal with the linear
dependence between Z and dummify(x).
Gordon