Re: [Gretl-users] Speed of "join"

Saturday, 14 September 2013

On Fri, 13 Sep 2013, Artur T. wrote:

...
 Hi all,

 I am currently working on a rather large cross-sectional data set.
 In total, the data cover around half a million individuals, about
 225,000 households and 26 countries.

 I tried to merge the different datasets after first dropping irrelevant
 variables. But merging households (identified by HID) and countries
 (country) with "join" takes a very long time: joining a single variable
 takes 25 minutes on my Linux machine (2.6 GHz). In STATA the same thing
 takes a minute or so.

 Why does it take so long here? I am surprised, because gretl is usually
 pretty fast.

I find this very surprising. Normally, "join" is quite fast. Which
options are you using?

...
 Also, is there a way to merge all cross-sectional variables from the
 "outer" dataset into the "inner" one with a single command? At the
 moment one has to write a separate join command for each variable,
 right? I am just asking out of curiosity, as I am fine with the way it
 is currently implemented.

No, that's by design. However, you can use a foreach loop, as in

<hansl>
loop foreach i foo bar baz
     join outer.csv $i <... your options...>
end loop
</hansl>
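As an illustration only, here is one way the options placeholder might be filled for the household merge described in the question. This assumes that outer.csv has one row per household, that both the current (individual-level) dataset and outer.csv contain a key series named HID, and that foo, bar and baz are hypothetical household-level series in outer.csv:

```hansl
# Sketch: outer.csv is assumed to have one row per household, keyed by HID.
# The current dataset has one row per individual and also contains HID.
# foo, bar and baz are hypothetical household-level series in outer.csv.
loop foreach i foo bar baz
    join outer.csv $i --ikey=HID
end loop
```

If the key column has a different name in the outer file, it can be named separately with --okey. Country-level series could be attached in a second pass keyed on country.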

-------------------------------------------------------
   Riccardo (Jack) Lucchetti
   Dipartimento di Scienze Economiche e Sociali (DiSES)

   Università Politecnica delle Marche
   (formerly known as Università di Ancona)

   r.lucchetti(a)univpm.it
   http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
