On 13.09.2013 16:23, Riccardo (Jack) Lucchetti wrote:
On Fri, 13 Sep 2013, Artur T. wrote:
> Hi all,
>
> I am currently working on a rather large cross-sectional data set.
> Overall I have data comprising around half a million individuals, about
> 225000 households and 26 countries.
>
> I tried to merge the different datasets, after first dropping irrelevant
> variables. But merging households (identified by HID) and
> countries (country) by "join" really takes a lot of time. Actually,
> joining one variable takes 25 minutes on my Linux machine (2.6GHz). If I
> use Stata it may take a minute or so.
>
> Why does it take that long here? I am surprised because typically gretl
> operates pretty fast.
I find this very surprising. "join" is, normally, quite fast. Which
options are you using?
The command I use is quite standard, I guess:
<hansl>
join "@WD/hfile.csv" hhgr --ikey=country,hid
</hansl>
> Also, is there a way to merge all cross-sectional variables from the
> "outside" dataset with the "inner" one by a single command? At the
> moment one has to specify a join command for each variable separately,
> right? I am just asking out of curiosity as I am fine with the way it is
> currently implemented.
No, that's by design. However, you can use a foreach loop as in
<hansl>
loop foreach i foo bar baz
join outer.csv $i <... your options...>
end loop
</hansl>
That's also the way I've used it.
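For reference, a minimal sketch of that loop applied to the join command quoted above. Only hhgr, the file name and the two keys come from this thread; the path stored in WD and the series names hhinc and hhsize are hypothetical placeholders, and the inner dataset is assumed to be open already and to contain the series country and hid.
<hansl>
# hypothetical working directory holding hfile.csv
string WD = "/path/to/data"
# hhgr is the series from the command above;
# hhinc and hhsize are made-up names for illustration
loop foreach i hhgr hhinc hhsize
    # one join per series, matched on the (country, hid) key pair
    join "@WD/hfile.csv" $i --ikey=country,hid
endloop
</hansl>
Each pass runs a separate join, so the total run time grows with the number of series imported.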
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
Artur