Re: [Gretl-devel] Changes to dummify

Friday, 26 September 2008

On Wed, 24 Sep 2008, Gordon Hughes wrote:

...
 In August I raised the possibility of extending the dummify 
 function to accommodate syntax such as

 list dlist = dummify(x, n)

 where n <= 0 means that no category is dropped, while n > 0 
 means that the n-th category is dropped.  For this to work, it 
 would be necessary to require that x is a series, whereas in its 
 current version dummify(X) will work with a list X... 

I'm playing with this at present.  If I remember right, there 
seemed to be a consensus last time round that we don't really lose 
anything by confining the dummify() function to a single series 
argument (not a list), since it's likely to be confusing to run 
dummify on a list anyway (and if you really want to do that you can 
use a "foreach" loop).

Suggestion: allow the syntax 

  list L = dummify(x)

for series x, in which case all the dummies are generated; and 
also support

  list L = dummify(x, val)

which treats 'val' as the omitted category.  (That is, the second 
argument to dummify() is optional.)

That leaves a question: is it easier/more intuitive to read 'val' 
as denoting the val'th category when the distinct values of x are 
ordered, or as the condition x == val?  I tend to think the latter 
is better.  Example: in relation to the variable Y in greene22_2,
we have

? matrix v = values(Y)
Generated matrix v
? v
v (6 x 1)

   0 
   1 
   2 
   3 
   7 
  12 

To generate dummies for all values of Y other than 7, do we do

list DL = dummify(Y, 5) # or 0-based, (Y, 4)??

or

list DL = dummify(Y, 7) # what I tend to favor

On the latter approach, you could do

DL = dummify(x, min(x))
DL = dummify(x, max(x))

to skip the first or last categories without counting them.
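
For anyone who wants to compare the two readings of the second 
argument side by side, here is a rough model in Python (not hansl, 
and not gretl's implementation).  The function names 
dummify_by_value and dummify_by_position are hypothetical, the 
position is assumed to be 1-based, and the data merely reuse the 
six distinct values of Y shown above.

```python
def dummify_by_value(x, omit=None):
    """Map each distinct value of x to a 0/1 indicator list,
    skipping the category for which x == omit (if any)."""
    cats = sorted(set(x))
    return {c: [1 if xi == c else 0 for xi in x] for c in cats if c != omit}


def dummify_by_position(x, n=0):
    """Same, but omit the n-th distinct value (1-based, sorted order).
    n <= 0, or n beyond the number of categories, drops nothing."""
    cats = sorted(set(x))
    omit = cats[n - 1] if 1 <= n <= len(cats) else None
    return dummify_by_value(x, omit)


# Toy data with distinct values 0, 1, 2, 3, 7, 12 (illustrative only)
y = [0, 1, 2, 3, 7, 12, 7, 0]

by_value = dummify_by_value(y, 7)        # "x == val" reading
by_position = dummify_by_position(y, 5)  # "val'th category" reading

print(sorted(by_value))  # categories that get a dummy: [0, 1, 2, 3, 12]
```

Both calls drop the same category here, but the positional form 
requires counting through the sorted values to find that 7 is the 
fifth, whereas the value form names it directly.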

Allin.
