Hi,
I really appreciate the new functionalities of aggregate()!
At first some comments:
a) including the cases is very useful
b) in terms of the NaN rows in a matrix, well I'm not that sure, in
my function I skip them at the moment, but perhaps sometimes
it's exactly those combinations one likes to know. Is there
already a way that generates a copy of a matrix without lines including
missings?
c) what about the mcov() function as 'aggregator function'?
Second, I'm of the opinion that there is a quite important and user
friendly option missing:
It's about combinatorics.
Although one has e.g. 5 discrete variables one wants e.g. ONLY ALL
BIVARIATE combinations.
This should be done by GRETL and not by hand.
I found a solution that looks like this (including the nice pack and
unpack functions!).
In the GRETL function this could look like: aggregate(X, Y, mean,2)
where the last argument is 'nelem(Y)' as standard.
<hansl>
open pizza4.gdt
set echo off
series ages = age<=20 ? 0 : (age>20 && age <=40) ? 1 : 2
list X = pizza income
list Y = female college ages
# by hand bivariate interaction, rather easy with three variables...
list Y1 = female college
list Y2 = female ages
list Y3 = college ages
matrix m2a = aggregate(X, Y1, mean)
matrix m2b = aggregate(X, Y2, mean)
matrix m2c = aggregate(X, Y3, mean)
# all three
matrix m3 = aggregate(X, Y, mean)
# comparison to my solution
bundle MVS = null
MVS["LIDs"] = getIDs(Y)
# for automatic bivariate interaction
scalar inter = 3
matrix codes ={}
loop for i=2..inter -q
Com$i =COMBIS(MVS[LIDs],i,&MVS)
loop for c=1..cols(Com$i) -q
string listName = "L$i_$c"
list @listName = Com$i[,c]
agg(X,@listName,&codes)
# or if u want to store the matrices in a bundle
agg(X,@listName,&codes,1,&MVS)
endloop
endloop
# FUNCTIONS
************************************************************************
#********************* function 1
*****************************************************
function scalar choose(scalar n, scalar k)
scalar binC = gammafun(n+1)/(gammafun(k+1)*gammafun(n-k+1))
return binC
end function
#********************* function 2
*****************************************************
function matrix COMBIS (matrix x, scalar m, bundle *bun[null])
scalar n = rows(x)
if (n < m)
print "Not possible!"
endif
scalar e = 0
scalar h = m
matrix a = seq(1,m)
matrix r = x[a]
scalar lenr = rows(r)
scalar count = choose(n, m) # choose is 'n over k'
matrix dimuse = {m, count}
# gerneral structur of output matrix
matrix out = mshape(r, lenr, count)
if (m > 0)
scalar i = 2
scalar nmmp1 = n - m + 1
loop while (a[1] != nmmp1) -q
if (e < n - h)
scalar h = 1
scalar e = a[m]
matrix j = 1
else
scalar e = a[m - h]
scalar h += 1
matrix j = seq(1,h)
endif
a[m - h + j] = e + j
r = x[a]
out[, i] = r
i+= 1
endloop
endif
if !isnull(bun)
sprintf num "%g", m
bun["combis_@num"]=out
endif
return out
end function
#********************* function 3
*****************************************************
function series pack(list X, matrix *pwrof2)
/*
a check should be necessary that all series in X
are discrete; for the moment, let's just skip it
*/
series ret = 0
pwrtot = 1
k = nelem(X)
# first build the nvalues vector
# starting values are
pwrof2 = {}
n = 1
k = 1
loop foreach i X --quiet
ret += k*X.$i
# some fun to generate a unique number
n = 2^ceil(log2(1+max(X.$i)))
k *= n
# save the key for unpacking
pwrof2 |= n
end loop
return ret
end function
#********************* function 4
*****************************************************
function matrix unpack(scalar x, matrix pwr)
# this function retrurnes the conditions for the factor steps
# number of codes which has to be decrypted
# for each factor one code (not for each factor step!)
k = rows(pwr)
matrix ret = zeros(k,1)
loop i=1..k --quiet
h = pwr[i]
a = x % h
ret[i] = a
# prepare for next factor
x = (x-a)/h
end loop
return ret
end function
#********************* function 5
*****************************************************
function void agg(list endos, list groups, matrix *codes, scalar
func[1], bundle *bun[null], scalar MLE[0], series *Indices[null])
matrix calc = {}
scalar m = nelem(endos)
scalar COLsneeded = func*m
u = pack(groups, &codes)
n_u = values(u)
matrix calc = zeros(rows(n_u),1+nelem(groups)+COLsneeded)
matrix calc_start = zeros(rows(n_u),1)
sprintf lName "%s", strsub(varname(groups),",","_")
loop i = 1..rows(n_u) -q
r = n_u[i]
smpl u==r --res --rep
scalar Ns = $nobs
calc[i,1] = Ns
matrix c = unpack(r, codes)
step1 = nelem(groups) +1
calc[i,2:step1] = c'
step2 = step1 +1
step3 = step1 + m
if func==1
calc[i,step2:step3] = meanc({endos})
step3++
elif func==2
calc[i,step2:step3] = meanc({endos})
step4 = step3 + m
step3++
calc[i,step3:step4] = sd({endos})
steps = step4+1
endif
if !isnull(Indices)
calc_start[i,1]=r
endif
end loop
if !isnull(Indices)
string NAMEcolMVS = "ID cases "
else
string NAMEcolMVS = "cases "
endif
NAMEcolMVS += strsub(varname(groups),","," ")
string NAMEcolENDOS = strsub(varname(endos),","," ")
colnames(calc_start,"ID")
matrix take1 = seq(1,nelem(groups)+1)
if !isnull(Indices)
matrix adding = calc_start~calc[,take1]
colnames(adding,NAMEcolMVS)
if !isnull(bun)
bun["MVS_@lName"]=adding
else
print "Infos"
eval adding
endif
else
matrix adding = calc[,take1]
colnames(adding,NAMEcolMVS)
if !isnull(bun)
bun["MVS_@lName"]=adding
else
print "Infos"
eval adding
endif
endif
matrix take2 = seq(1,nelem(endos))
loop for f=1..func -q
if f==1
take2 = take2.+cols(take1)
matrix adding2 = calc[,take2]
colnames(adding2,NAMEcolENDOS)
if !isnull(bun)
bun["means_@lName"]=adding2
else
print "means"
eval adding2
endif
elif f==2
take2 = take2 .+ m
matrix adding2 = calc[,take2]
colnames(adding2,NAMEcolENDOS)
if !isnull(bun)
bun["sds_@lName"]=adding2
else
print "sds"
eval adding2
endif
endif
endloop
smpl full
if !isnull(Indices)
Indices = u
endif
end function
#********************* function 6
*****************************************************
function matrix getIDs(list G)
matrix IDs = {}
loop foreach i G -q
IDs |= varnum(G.$i)
endloop
return IDs
end function
# END FUNCTIONS
*************************************************************
<hansl>
27.02.2013 23:54, Allin Cottrell:
There was some interest expressed here lately in the business of
"aggregating" some variable of interest "by" the distinct values of
some (discrete) variable -- e.g. finding mean income by gender or
race.
In response to this, we added the aggregate() function (it's in CVS
and the snapshots for Windows and OS X). Now I've extended the
functionality of aggregate(); the new stuff is not yet documented in
the Function Reference but I'd like to explain what we've got and
invite comments.
First, in the current doc, the "x" and "byvar" parameters must be
(single) series, but now they can be either single series or named
lists of series. If you give "x" as a series you get extra columns
to the right, holding the aggregated values of each of the members
of x. If you give "y" as a series you get a multi-level "by". The
number of rows in the output matrix is then the number of
combinations of the distinct values of the "byvar" variables.
Second, we've added a column to the output matrix (between the
"byvar" values and the aggregated "x" columns) showing the count of
observations associated with each (combination of) "byvar" values.
It seems to me this would be useful, but what do you think?
Third, when we allow more than one "by" variable a new policy
question comes up: what do we do about combinations of by-values
that are not actually found in the dataset (count = 0)? Either we
skip such combinations, or we explicitly show a count of zero, and
fill out the aggregated-x columns with NaN (not-a-number). I'm
inclined to think it might be helpful to show the zeros explicitly,
and so that's what we do right now, but again I'd like to hear what
people think.
Example script using the new functionality:
<hansl>
open pizza4.gdt
series ages = age<=20 ? 0 : (age>20 && age <=40) ? 1 : 2
list Y = female college ages
list X = pizza income
matrix m = aggregate(X, Y, mean)
print m
</hansl>
(BTW you can find pizza4.gdt via google)
Allin Cottrell
_______________________________________________
Gretl-users mailing list
Gretl-users(a)lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-users