Re: [Gretl-users] Speed of "join"

Friday, 13 September 2013

On Fri, 13 Sep 2013, Artur T. wrote:

...
 On 13.09.2013 18:47, Riccardo (Jack) Lucchetti wrote:
>
> The following script will generate two test files whose size depends on
> the two parameters ncountries and mean_n_hh (mean number of people per
> household). To get nearly the same size as your real data, you could
> set ncountries to 30 and mean_n_hh to roughly 7500. A "join" is then
> performed and timed.
>
> On my PC this takes about half a second with mean_n_hh=200, nearly a
> minute with mean_n_hh=4000 and about 8 minutes with mean_n_hh=10000.
> From some experimenting, it would seem that the time taken is
> approximately quadratic in mean_n_hh; I suppose we could try something
> to make it less convex (although I suspect it won't be easy to make it
> linear).
>

 The "mean_n_hh=10000" case takes around 20 minutes here. Interestingly,
 though, I ran the following in Stata 11 on the same "10000" case:

 <STATA>
 insheet using "outer.csv", clear
 sort cntry hid
 save "Z:\home\artur\gretl\outer.dta", replace
 insheet using "inner.csv", clear
 merge cntry hid using "outer.dta"	// Didn't work with csv
 </STATA>

 In Stata it takes only about 3 seconds.

Yeah. As Jack said, it seems quadratic, and that should not be the case.
I'm working on it.

Allin
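
For readers wondering why a keyed join can show quadratic timing, here is a minimal Python sketch. It is not gretl's code and makes no claim about how "join" is actually implemented; it only contrasts a rescan-per-row lookup with a one-pass index. The key columns cntry and hid come from the Stata snippet above; the function names and the "hh_income" payload column are hypothetical.

```python
def key_of(row, keys):
    """Build a hashable composite key, e.g. (cntry, hid), from a row dict."""
    return tuple(row[k] for k in keys)


def nested_loop_join(inner, outer, keys, payload):
    """Rescan the outer rows for every inner row.

    Cost grows like len(inner) * len(outer), i.e. quadratically when
    both files grow together.
    """
    result = []
    for row in inner:
        wanted = key_of(row, keys)
        value = None
        for cand in outer:
            if key_of(cand, keys) == wanted:
                value = cand[payload]
                break  # keep the first match only
        result.append({**row, payload: value})
    return result


def hash_join(inner, outer, keys, payload):
    """Index the outer rows once, then do one dict lookup per inner row.

    Cost grows like len(inner) + len(outer) on average.
    """
    index = {}
    for cand in outer:
        # setdefault keeps the first match, mirroring the loop above
        index.setdefault(key_of(cand, keys), cand[payload])
    return [{**row, payload: index.get(key_of(row, keys))} for row in inner]


# Tiny made-up data: persons (inner) looked up in households (outer).
outer_rows = [
    {"cntry": 1, "hid": 10, "hh_income": 2500.0},
    {"cntry": 1, "hid": 11, "hh_income": 1800.0},
    {"cntry": 2, "hid": 10, "hh_income": 3100.0},
]
inner_rows = [
    {"cntry": 1, "hid": 10, "pid": 1},
    {"cntry": 1, "hid": 10, "pid": 2},
    {"cntry": 2, "hid": 10, "pid": 1},
    {"cntry": 2, "hid": 99, "pid": 1},  # no matching household
]

slow = nested_loop_join(inner_rows, outer_rows, ["cntry", "hid"], "hh_income")
fast = hash_join(inner_rows, outer_rows, ["cntry", "hid"], "hh_income")
```

Both functions return the same rows; only the lookup strategy differs. Sorting both files on the keys and merging them in a single pass, which is what the "sort cntry hid" step in the Stata snippet prepares for, is another way to avoid the repeated rescan.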
