On Fri, 10 Sep 2021, Sven Schreiber wrote:
> [shows incomplete example from Artur, suggesting inefficiency in
> "join"]
>
> As some of you may have noticed, back in March a feature-request
> ticket was created and some discussion took place there
> (https://sourceforge.net/p/gretl/feature-requests/151/).
>
> I believe Allin has worked on fixing this in current git, but I
> haven't tested it myself yet - part of the reason was that Artur's
> script above was not self-contained.
Looking into the join code, I found at least one possible source of
inefficiency. That is, if you're importing n series via a single
invocation of the join command, we were calculating the matching of
inner and outer keys n times, when in principle we could do this
once, store the results and apply them to each of the imports --
since when you specify multiple series for joining, the keys (if
any) must be the same for all of them. But note that this is not an
entirely "free lunch": storing the matching results requires
allocation of extra memory that's not needed otherwise.
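
In schematic form the idea is something like the following (just an
illustrative sketch, not the actual join internals; it assumes
integer keys, a sorted outer key column, and aggregation via the
average of matching values):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* For each inner key, record the position of the first matching
   outer row plus the number of consecutive matches (the outer keys
   are assumed sorted, so matches are contiguous). This map is the
   extra memory that "calculate once and store" requires. */
typedef struct {
    int pos;    /* first matching outer row, or -1 */
    int count;  /* number of matching outer rows */
} keymatch;

static keymatch *match_keys (const int *ikey, int n_inner,
                             const int *okey, int n_outer)
{
    keymatch *map = malloc(n_inner * sizeof *map);
    int i, lo, hi, mid;

    for (i = 0; i < n_inner; i++) {
        map[i].pos = -1;
        map[i].count = 0;
        lo = 0;
        hi = n_outer - 1;
        while (lo <= hi) {
            mid = (lo + hi) / 2;
            if (okey[mid] < ikey[i]) {
                lo = mid + 1;
            } else if (okey[mid] > ikey[i]) {
                hi = mid - 1;
            } else {
                map[i].pos = mid;
                hi = mid - 1; /* keep going left: first match */
            }
        }
        for (mid = map[i].pos; mid >= 0 && mid < n_outer &&
             okey[mid] == ikey[i]; mid++) {
            map[i].count += 1;
        }
    }
    return map;
}

/* Apply the cached matching to one imported series, aggregating
   multiple matches via their mean. */
static void import_one (double *targ, const double *src,
                        const keymatch *map, int n_inner)
{
    int i, j;

    for (i = 0; i < n_inner; i++) {
        if (map[i].count == 0) {
            targ[i] = NAN;
        } else {
            targ[i] = 0.0;
            for (j = 0; j < map[i].count; j++) {
                targ[i] += src[map[i].pos + j];
            }
            targ[i] /= map[i].count;
        }
    }
}

int main (void)
{
    int ikey[] = {1, 2, 3};              /* inner keys */
    int okey[] = {1, 1, 3, 5};           /* sorted outer keys */
    double x1[] = {0.5, 0.7, 0.2, 0.9};  /* outer series 1 */
    double x2[] = {1.5, 1.7, 1.2, 1.9};  /* outer series 2 */
    const double *imports[] = {x1, x2};
    double targ[3];
    int s, i;

    /* match the keys once... */
    keymatch *map = match_keys(ikey, 3, okey, 4);

    /* ...then reuse the map for each of the n imports */
    for (s = 0; s < 2; s++) {
        import_one(targ, imports[s], map, 3);
        for (i = 0; i < 3; i++) {
            printf("series %d, obs %d: %g\n", s + 1, i + 1, targ[i]);
        }
    }

    free(map);
    return 0;
}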
In current git we employ the "calculate once and store" method for
the first key (not yet for the second, if present, but most of the
matching work concerns the first one).
I've tested an example where the dataset has 20000 observations and 
there are 20 series to import in one go, with a single matching key 
and aggregation via the average of the matching values. What I found 
was a speed-up of about 2 or 3 percent with the new method. So with 
these parameters it appears that the key-matching code actually 
takes a trivial proportion of the overall compute time, hardly worth 
bothering with.
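
The script was along these lines (a sketch from memory, with
made-up filenames, not the exact script):

<hansl>
# outer data: a key with duplicated values, plus 20 series
nulldata 40000
series key = 1 + int(20000 * uniform())
list X = null
loop i=1..20
    series x$i = normal()
    X += x$i
endloop
store outer.csv key X

# inner dataset: 20000 observations keyed 1,...,20000
nulldata 20000
genr index
series key = index

# import all 20 series via a single join, with --aggr=avg
string vnames = ""
loop i=1..20
    vnames = vnames ~ sprintf(" x%d", i)
endloop
set stopwatch
join outer.csv @vnames --ikey=key --aggr=avg
printf "single join of 20 series: %g seconds\n", $stopwatch
</hansl>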
At this point it would be good to have an example which exhibits a
substantial (supra-linear) slowdown when importing more series.
Maybe that's the case with Artur's example. Anyway, a minimal but
informative test case would be very useful; see the sketch below
for one way of probing the scaling.
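
As a starting point, one could time the same join while varying the
number of imported series, something like this (untested sketch,
reusing outer.csv and the inner dataset from the script above; on
the later iterations join will be overwriting the series it
imported earlier):

<hansl>
loop foreach j 2 5 10 20
    string vnames = ""
    loop m=1..$j
        vnames = vnames ~ sprintf(" x%d", m)
    endloop
    set stopwatch
    join outer.csv @vnames --ikey=key --aggr=avg
    printf "%2d series: %.3f seconds\n", $j, $stopwatch
endloop
</hansl>

If the per-series cost is roughly constant those timings should
scale linearly in the number of series; anything clearly worse than
that would point to the sort of slowdown mentioned above.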
Allin