Based on the code Jack provided earlier, joining the outer.csv file takes
about 20 minutes on my machine in the case of mean_n_hh = 10000.
For comparison, I added some lines to Jack's original code that do the
same merge with R's merge function. That takes only 29 seconds
(including starting R from within gretl!). In Stata the merge takes
only about 10 seconds.
Thus, the differences are still very large.
Well, there is the Stata benchmark you mentioned which is a natural
competitor. I'd say if your machine is fast enough to do it with Stata
the goal should be that it also works reasonably well with gretl. So
how's that comparison now after the accelerations in gretl?
cheers,
sven
Artur
<hansl>
set echo off
set messages off
set seed 123456
ncountries = 30
mean_n_hh = 10000
n = 0
# generate the outer dataset
printf "generating outer file\n"
outfile "outer.csv" --write
printf "cntry, hid, x\n"
loop i=1..ncountries --quiet
    nind = ceil(randgen1(z,mean_n_hh,10))
    n += nind
    loop j=1..nind --quiet
        printf "%d,%d,%12.5f\n", i, j, randgen1(z,0,1)
    endloop
endloop
outfile "outer.csv" --close
open outer.csv --quiet --preserve
# generate the inner dataset
printf "generating inner file\n"
outfile "inner.csv" --write
printf "hid, iid, cntry, y\n"
loop i=1..$nobs --quiet
    nh = randgen1(i,1,4)
    loop j=1..nh --quiet
        printf "%d,%d,%d,%12.5f\n", hid[i], j, cntry[i], randgen1(z,0,1)
    endloop
endloop
outfile "inner.csv" --close
# do the join
open inner.csv --preserve
printf "performing join\n"
set stopwatch
join outer.csv x --ikey=hid,cntry
printf "individuals = %d (%d countries, %d households); time = %g seconds\n", \
  $nobs, ncountries, n, $stopwatch
smpl 1 30
print -o
smpl full
# Do it with R
set stopwatch
foreign language=R
inner <- read.table("/home/artur/gretl/inner.csv",sep=",",
header=TRUE)
outer <- read.table("/home/artur/gretl/outer.csv",sep=",",
header=TRUE)
m = merge(inner, outer,by=c("cntry","hid"))
summary(m)
end foreign
# $stopwatch is already measured in seconds
printf "R: %.2f sec.\n", $stopwatch
# Comparison of some descriptive stats
summary cntry hid iid y x
</hansl>
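To see how much of the R time goes into reading the CSV files rather than into the merge itself, the R block could time each step separately with R's system.time(). This is only a minimal sketch: it assumes the same file paths as in the script above and that inner.csv and outer.csv have already been generated. The names t_read and t_merge are just for illustration.

<hansl>
foreign language=R
    # time reading the two CSV files
    t_read <- system.time({
        inner <- read.table("/home/artur/gretl/inner.csv", sep=",", header=TRUE)
        outer <- read.table("/home/artur/gretl/outer.csv", sep=",", header=TRUE)
    })
    # time the merge alone
    t_merge <- system.time(
        m <- merge(inner, outer, by=c("cntry","hid"))
    )
    cat("read:", t_read[["elapsed"]], "sec; merge:", t_merge[["elapsed"]], "sec\n")
end foreign
</hansl>

The time to start R from gretl is not included in either figure, so comparing their sum with the gretl-side $stopwatch value gives a rough idea of that overhead.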