On Sat, 30 May 2009, Sebastián Goinheix wrote:
OK. This is a great project, and I'm very happy to participate (at
least in the user list).
Thank you very much.
2009/5/30 Allin Cottrell <cottrell(a)wfu.edu>
> I'm in transit right now, but will try to offer an answer before
> long.
Meanwhile Jack Lucchetti has posted a possible solution. But I'll
go ahead and give mine too -- it's more complicated but I think
it may be more general.
I'm supposing you have a data set that is structurally similar to
this simple hypothetical example:
hhid y x
10004 100 1
10004 110 4
24532 90 4
24532 120 4
24532 100 2
39800 150 5
46541 100 4
46541 80 3
46541 90 6
where "hhid" records the household identifier for various
individuals, and y and x are the variables of interest. I'm
assuming you want to consolidate the data by household, either by
summing the values or possibly taking a household average. Here's
my solution:
<script>
# Suppose the above data are in hh.txt
open hh.txt
scalar n = $nobs
# how many households are there?
matrix hhvals = values(hhid)
scalar nhh = rows(hhvals)
printf "Found %d households\n", nhh
# how many variables are there? (excluding the constant)
scalar nv = $nvars - 1
printf "We have %d variables\n", nv
# create a matrix to hold the household data (with an extra
# column for the number of members)
matrix X = zeros(nhh, nv + 1)
# create list of variables (excluding hhid)
list vars = dataset
vars -= hhid
# scalars for accounting
scalar j, Xrow, Xcol
# form household-level variables in matrix X: here I'm just
# summing the values for the members of the household
loop i=1..n --quiet
    loop j=1..nhh --quiet
        if hhid[i] == hhvals[j]
            printf "obs %d belongs to household %d\n", i, hhvals[j]
            Xrow = j
            break
        endif
    endloop
    # column 1 holds the household ID
    X[Xrow,1] = hhid[i]
    Xcol = 2
    loop foreach k vars --quiet
        X[Xrow, Xcol] += $k[i]
        Xcol++
    endloop
    # in the last column of X, cumulate the number of members
    # in the given household
    X[Xrow,Xcol] += 1
endloop
# print HH data in matrix form to check
print X
# replace original dataset with household version (one could
# form household means here, if wanted)
loop i=1..nhh --quiet
    hhid[i] = X[i,1]
    Xcol = 2
    loop foreach k vars --quiet
        $k[i] = X[i, Xcol]
        Xcol++
    endloop
endloop
# restrict the sample to the number of households and save
smpl 1 nhh
series nmembers = X[, nv+1]
setinfo nmembers -d "Number of people in household"
print --byobs
store hh2.gdt
</script>
The outline is that we take the original data, cumulate it into a
matrix, then use the matrix to overwrite the first nhh rows of the
original dataset, then finally chop off the unwanted rows with
"smpl" and save under a new name. The household IDs don't have to
be consecutive, or 1-based, and the rows do not have to be
organized by household.
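If household means are wanted rather than sums, the cumulated matrix
can be rescaled before it is written back to the dataset. What follows
is only a minimal sketch, assuming the same layout as X above (ID in
column 1, sums in columns 2 to nv, member count in the last column).
The literal matrix is not produced by the script; it just holds the
sums for the hypothetical example data so that the fragment runs on
its own:

<script>
# Assumption: stand-in for the X built by the script above
# (columns: hhid, sum of y, sum of x, number of members)
matrix X = {10004, 210, 5, 2; 24532, 310, 10, 3; 39800, 150, 5, 1; 46541, 270, 13, 3}
scalar nhh = rows(X)
scalar nv = cols(X) - 1
scalar j
# divide each household sum by that household's member count
loop i=1..nhh --quiet
    loop j=2..nv --quiet
        X[i,j] = X[i,j] / X[i,nv+1]
    endloop
endloop
print X
</script>

In the full script the two nested loops would go just after the first
"print X", before the dataset is overwritten. For household 10004 this
gives a mean y of 105 and a mean x of 2.5.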
Allin Cottrell