On Fri, 13 Sep 2013, Artur T. wrote:
I am currently working on a rather large cross-sectional data set.
Overall, I have data comprising around half a million individuals, about
225000 households and 26 countries.
The command I use is quite standard, I guess:
<hansl>
join "@WD/hfile.csv" hhgr --ikey=country,hid
</hansl>
[ and join takes a long time ]
Can you tell us the dimensions of the two datasets -- the "left-hand" one
that you're joining onto and the "right-hand" one that's the source for
join? How many rows on the left, and how many rows and columns on the
right?
I'd like to try to simulate this and see what's going on.
Allin Cottrell