On Wed, 24 Sep 2008, Gordon Hughes wrote:
 In August I raised the possibility of extending the dummify 
 function to accommodate syntax such as
 
 list dlist = dummify(x, n)
 
 where n <= 0 means that no category is dropped, while n > 0 
 means that the n-th category is dropped.  For this to work, it 
 would be necessary to require that x is a series, whereas in its 
 current version dummify(X) will work with a list X... 
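To make the index-based convention above concrete, here is a minimal Python sketch. It is not gretl code. The name `dummify_nth` and the dict-of-lists return value are hypothetical stand-ins for a gretl list of dummy series. It assumes the categories are the distinct values of x sorted in ascending order, and that n counts from 1.

```python
def dummify_nth(x, n=0):
    """Hypothetical sketch of the n-th-category convention.

    Builds one 0/1 dummy per distinct value of x (sorted ascending).
    n <= 0: keep every category; n > 0: drop the n-th (1-based) one.
    Returns a dict mapping each kept value to its dummy as a list.
    """
    cats = sorted(set(x))
    if n > len(cats):
        raise ValueError("n exceeds the number of categories")
    if n > 0:
        cats = cats[:n - 1] + cats[n:]  # remove the n-th category
    return {c: [1 if xi == c else 0 for xi in x] for c in cats}


x = [2, 0, 1, 2, 0]
print(list(dummify_nth(x).keys()))     # all categories kept
print(list(dummify_nth(x, 2).keys()))  # second category (value 1) dropped
```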
I'm playing with this at present.  If I remember right, there 
seemed to be a consensus last time round that we don't really lose by 
confining the dummify() function to a single series argument (not 
a list), since it's likely to be confusing to run dummify on a 
list anyway (and if you really want to do that you can use a 
"foreach" loop).
Suggestion: allow the syntax 
  list L = dummify(x)
for series x, in which case all the dummies are generated; and 
also support
  list L = dummify(x, val)
which treats 'val' as the omitted category.  (That is, the second 
argument to dummify() is optional.)
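The optional-argument proposal can be sketched the same way. The following is a minimal Python stand-in, not gretl's implementation; the Python function name and return type are hypothetical. It reads `val` as the value to omit, which is the reading favored below. Raising an error when `val` is not a value of x is an assumption. Another option would be to ignore it silently.

```python
def dummify(x, val=None):
    """Hypothetical Python stand-in for the proposed dummify(x [, val]).

    One 0/1 dummy per distinct value of x (sorted ascending). If val
    is given, the category x == val is omitted. Assumption: an
    unknown val is an error rather than being silently ignored.
    """
    cats = sorted(set(x))
    if val is not None:
        if val not in cats:
            raise ValueError(f"{val} is not a value of x")
        cats.remove(val)
    return {c: [1 if xi == c else 0 for xi in x] for c in cats}


x = [3, 1, 3, 2]
print(list(dummify(x).keys()))     # one dummy per value
print(list(dummify(x, 3).keys()))  # the category x == 3 is omitted
```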
That leaves a question: is it easier/more intuitive to read 'val' 
as denoting the val'th category when the distinct values of x are 
sorted in ascending order, or as the condition x == val?  I tend to think the latter 
is better.  Example: in relation to the variable Y in greene22_2,
we have
? matrix v = values(Y)
Generated matrix v
? v
v (6 x 1)
   0 
   1 
   2 
   3 
   7 
  12 
To generate dummies for all values of Y other than 7, do we write
list DL = dummify(Y, 5) # or 0-based, (Y, 4)??
or
list DL = dummify(Y, 7) # what I tend to favor
On the latter approach, you could do
list DL = dummify(x, min(x))
list DL = dummify(x, max(x))
to skip the first or last categories without counting them.
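A small Python sketch contrasts the two readings on the distinct values of Y listed above. It is not gretl code, and both helper names are hypothetical. Under the index reading, omitting the category 7 requires knowing its position, which is 5 (1-based) or 4 (0-based). Under the value reading you just pass 7, and min/max pick out the first and last categories directly.

```python
# Distinct values of Y in greene22_2, as listed by values(Y) above
v = [0, 1, 2, 3, 7, 12]


def omit_by_index(values, n, base=1):
    """Index reading (hypothetical): the category at position n of the
    sorted distinct values, counting from `base` (1 or 0)."""
    return sorted(values)[n - base]


def omit_by_value(values, val):
    """Value reading (hypothetical): the category satisfying x == val."""
    if val not in values:
        raise ValueError(f"{val} is not a value of x")
    return val


print(omit_by_index(v, 5))           # 1-based position of 7
print(omit_by_index(v, 4, base=0))   # 0-based position of 7
print(omit_by_value(v, 7))           # value reading: just name the value
print(omit_by_value(v, min(v)), omit_by_value(v, max(v)))  # first, last
```

With the value reading, the argument does not depend on how many smaller values x happens to take. With the index reading, the same category has a different position in a series with a different set of values.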
Allin.