On Fri, 13 Sep 2013, Artur T. wrote:
> Hi all,
> I am currently working on a rather large cross-sectional data set.
> Overall I have data comprising around half a million individuals, about
> 225000 households and 26 countries.
> I tried to merge the different datasets after first dropping irrelevant
> variables. But merging households (identified by HID) and
> countries (country) by "join" really takes a lot of time. Actually,
> joining one variable takes 25 minutes on my Linux machine (2.6GHz). If I
> use Stata it may take a minute or so.
> Why does it take that long here? I am surprised because typically gretl
> operates pretty fast.
I find this very surprising. "join" is, normally, quite fast. Which
options are you using?
> Also, is there a way to merge all cross-sectional variables from the
> "outside" dataset with the "inner" one by a single command? At the
> moment one has to specify a join command for each variable separately,
> right? I am just asking out of curiosity as I am fine with the way it is
> currently implemented.
No, not in a single command; that's by design. However, you can use a foreach loop, as in
<hansl>
loop foreach i foo bar baz
    join outer.csv $i <... your options...>
end loop
</hansl>
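For concreteness, here is a minimal sketch of what such a loop might look like when matching on the household identifier. Only HID comes from the original question; the file names (individuals.gdt, households.csv) and the series names (hh_income, hh_size) are hypothetical placeholders, and the sketch assumes the key has the same name in both files, so that --ikey alone is enough.
<hansl>
# hypothetical inner dataset, one row per individual, containing HID
open individuals.gdt

# hh_income and hh_size are made-up names for series in the outer file
loop foreach i hh_income hh_size
    # match each individual to its household via HID; since no --okey
    # is given, the outer key is assumed to be called HID as well
    join households.csv $i --ikey=HID
endloop
</hansl>
If the key is named differently in the outer file, the --okey option can be added to name it explicitly.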
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------