Re: [Gretl-users] Speed of "join"

Friday, 13 September 2013

Am 13.09.2013 18:47, schrieb Riccardo (Jack) Lucchetti:
...

 The following script will generate two test files whose size depends on
 the two parameters ncountries and mean_n_hh (mean number of people per
 household). In order to get nearly the same size as your real data, you
 could set ncountries to 30 and mean_n_hh to 7500 (roughly). Then, a
 "join" will be performed and the time taken.

 On my PC this takes about half a second with mean_n_hh=200, nearly a
 minute with mean_n_hh=4000 and about 8 minutes with mean_n_hh=10000.
 From some experimenting, it would seem that time is approximately
 quadratic; I suppose we could try something to make it less convex
 (although I suspect it won't be easy to make it linear).

The "mean_n_hh=10000" case takes around 20 minutes here. Interestingly,
though, I ran the following in Stata 11 on the same "10000" case:

<STATA>
insheet using "outer.csv", clear
sort cntry hid
save "Z:\home\artur\gretl\outer.dta", replace
insheet using "inner.csv", clear
merge cntry hid using "outer.dta"	// Didn't work with csv
</STATA>

In Stata it takes only about 3 seconds.
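
The Stata script above sorts the outer file on the keys before merging. As a rough illustration of why a sort-based match need not be quadratic, here is a minimal Python sketch of a sort-merge join on (cntry, hid). It is not how gretl or Stata actually implement their commands; the function name and the tuple layout of the rows are made up for the example, and it assumes each key appears at most once in the outer data.

```python
def sort_merge_join(inner, outer):
    """Attach the matching outer value to every inner row.

    inner: list of (cntry, hid, value) tuples; keys may repeat
    outer: list of (cntry, hid, value) tuples; keys assumed unique
    Returns a list of (cntry, hid, inner_value, outer_value_or_None),
    ordered by key.
    """
    # Sorting both sides costs O(n log n)
    inner_sorted = sorted(inner, key=lambda r: (r[0], r[1]))
    outer_sorted = sorted(outer, key=lambda r: (r[0], r[1]))

    result = []
    j = 0  # position in outer_sorted; it only ever moves forward
    for cntry, hid, val in inner_sorted:
        key = (cntry, hid)
        # Skip outer rows whose key is smaller than the current inner key
        while j < len(outer_sorted) and outer_sorted[j][:2] < key:
            j += 1
        if j < len(outer_sorted) and outer_sorted[j][:2] == key:
            matched = outer_sorted[j][2]
        else:
            matched = None  # no outer row for this key
        result.append((cntry, hid, val, matched))
    return result


if __name__ == "__main__":
    inner = [(2, 1, "a"), (1, 1, "b"), (1, 1, "c"), (1, 3, "d")]
    outer = [(1, 1, 10.0), (2, 1, 20.0), (1, 2, 30.0)]
    for row in sort_merge_join(inner, outer):
        print(row)
```

After the two sorts, the matching step is a single pass over both lists, so the total work grows roughly as n log n, whereas rescanning the outer data for every inner row grows as the product of the two sizes.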

Artur
