Re: [Gretl-users] individuals to household

Sunday, 31 May 2009

On Sat, 30 May 2009, Sebastián Goinheix wrote:

...
 OK. This is a great project, and I'm very happy to participate (at
 least in the user list).
 Thank you very much.

 2009/5/30 Allin Cottrell <cottrell(a)wfu.edu>
 > I'm in transit right now, but will try to offer an answer before
 > long.

Meanwhile Jack Lucchetti has posted a possible solution.  But I'll
go ahead and give mine too -- it's more complicated but I think
it may be more general.

I'm supposing you have a data set that is structurally similar to
this simple hypothetical example:

hhid y x
10004 100 1
10004 110 4
24532 90 4
24532 120 4
24532 100 2
39800 150 5
46541 100 4
46541 80 3
46541 90 6

where "hhid" records the household identifier for various
individuals, and y and x are the variables of interest.  I'm
assuming you want to consolidate the data by household, either by
summing the values or possibly taking a household average.  Here's
my solution:

<script>
# Suppose the above data are in hh.txt
open hh.txt
scalar n = $nobs

# how many households are there?
matrix hhvals = values(hhid)
scalar nhh = rows(hhvals)
printf "Found %d households\n", nhh

# how many variables are there? (excluding the constant)
scalar nv = $nvars - 1
printf "We have %d variables\n", nv

# create a matrix to hold the household data (with an extra
# column for the number of members)
matrix X = zeros(nhh, nv + 1)

# create list of variables (excluding hhid)
list vars = dataset
vars -= hhid

# scalars for accounting
scalar j, Xrow, Xcol

# form household-level variables in matrix X: here I'm just
# summing the values for the members of the household
loop i=1..n --quiet
   loop j=1..nhh --quiet
      if hhid[i] == hhvals[j]
         printf "obs %d belongs to household %d\n", i, hhvals[j]
         Xrow = j
         break
      endif
   endloop
   # column 1 holds the household ID
   X[Xrow,1] = hhid[i]
   Xcol = 2
   loop foreach k vars --quiet
      X[Xrow, Xcol] += $k[i]
      Xcol++
   endloop
   # in the last column of X, cumulate the number of members
   # in the given household
   X[Xrow,Xcol] += 1
endloop

# print HH data in matrix form to check
print X

# replace original dataset with household version (one could
# form household means here, if wanted)
loop i=1..nhh --quiet
  hhid[i] = X[i,1]
  Xcol = 2
  loop foreach k vars --quiet
     $k[i] = X[i, Xcol]
     Xcol++
  endloop
endloop

# restrict the sample to the number of households and save
smpl 1 nhh
series nmembers = X[, nv+1]
setinfo nmembers -d "Number of people in household"
print --byobs
store hh2.gdt
</script>

The outline is that we take the original data, cumulate it into a
matrix, then use the matrix to overwrite the first nhh rows of the
original dataset, then finally chop off the unwanted rows with
"smpl" and save under a new name.  The household IDs don't have to
be consecutive, or 1-based, and the rows do not have to be
organized by household.
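
If household means are wanted rather than sums, one option is to
divide the summed columns of X by the member count in its last
column before writing X back to the dataset.  Here's a minimal
sketch, not part of the script above: it fills a stand-in X by hand
from the example data (the names M, nc and lastsum are just for
illustration), and it assumes a gretl version in which the "./"
operator accepts a conformable column vector on the right.

<script>
# stand-in for the X built above from the example data:
# columns are hhid, sum of y, sum of x, number of members
matrix X = {10004, 210, 5, 2; 24532, 310, 10, 3; \
            39800, 150, 5, 1; 46541, 270, 13, 3}
scalar nc = cols(X)
scalar lastsum = nc - 1

# divide each summed column by the member count in the last
# column, leaving the ID and the count themselves untouched
matrix M = X
M[, 2:lastsum] = X[, 2:lastsum] ./ X[, nc]
print M
</script>

For the example data this should give household 10004 a mean y of
105 and a mean x of 2.5.  In the full script the same division could
be applied to X itself (with nv in place of lastsum) just before the
loop that overwrites the dataset.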

Allin Cottrell
