Inefficiency in join command?
by atecon
Hi all,
I currently have to work with a large panel dataset (the left-hand side),
to which I would like to join a couple of series from a RHS dataset. The
correct mapping is done via two keys.
I ran some performance checks, and it seems that the current
implementation runs the sorting/mapping separately for each series
joined, even though a single sorting/mapping pass should be sufficient
(if I am not wrong).
In a first experiment I joined all series from the RHS dataset by means
of the wildcard operator:
<join "@NAME_RHS_DATA" * --ikey=datedim,unitdim>
which took about 5 sec. here.
Then I re-ran the experiment, successively increasing the number of
series to join:
<hansl>
loop i=1..nelem(RHS_SERIES_NAMES)
    printf "\nInfo: Start joining %d series.\n", $i
    flush
    strings tojoin = RHS_SERIES_NAMES[1:$i]
    set stopwatch
    join "@NAME_RHS_DATA" tojoin --ikey=datedim,unitdim
    printf "\nInfo: Joining took %.2f sec.\n", $stopwatch
    flush
    list New = dataset - Base
    delete New --force
endloop
</hansl>
The output is as follows:
<output>
Info: Joining all series took 4.91 sec.
Info: Start joining 1 series.
Info: Joining took 1.91 sec.
Info: Start joining 2 series.
Info: Joining took 2.88 sec.
Info: Start joining 3 series.
Info: Joining took 3.88 sec.
Info: Start joining 4 series.
Info: Joining took 4.84 sec.
Script done
</output>
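A rough way to read the incremental timings above is to fit a straight
line through them, splitting the elapsed time into a fixed part and a
per-series part. This is only a sketch: the four timings are copied from
the output above, the linear cost model is an assumption, and the matrix
names are made up for illustration.

```hansl
# timings copied from the output above (seconds),
# indexed by the number of series joined
matrix n = {1; 2; 3; 4}
matrix t = {1.91; 2.88; 3.88; 4.84}

# assumed model: t = fixed + per_series * n, fitted by OLS
matrix X = ones(4, 1) ~ n
matrix b = mols(t, X)

printf "fixed cost:      %.2f sec\n", b[1]
printf "cost per series: %.2f sec\n", b[2]
```

On these numbers the fit gives roughly 0.93 sec of fixed cost and 0.98
sec per additional series, so most of the time scales with the number of
series. That pattern is consistent with per-series work being repeated,
though the timings alone cannot say whether that work is the
sorting/mapping or something else.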
Do you agree that the sorting or mapping overhead can in principle be
reduced when joining multiple series at once?
Thanks,
Artur
3 years
Gretl conference 2021 (apologies for cross-posting)
by Riccardo (Jack) Lucchetti
Hi everyone,
I'm writing to you on behalf of the Gretl development team.
Since June 2009, we've had a Gretl conference every other year, starting
in Bilbao; the 2019 conference was in Naples. This year, it should have
been London but the pandemic forced us to change our plans. Therefore,
here's what's happening:
* the actual conference is probably going to take place during the winter,
if the COVID situation makes it possible.
* we're having an online event on the 3rd and 4th of June. Since the
majority of the gretl community lives on one side of the Atlantic or the
other, we decided that the event will take place between 1200 and 1800 UTC
(approximately) so that the inconvenience of the time zones is hopefully
minimised.
The only two fixed events at this stage are
1) a presentation by the project leader (Allin) on the 3rd
2) an invited speech by Mark Steel (Warwick University), Bayesian
extraordinaire, on the 4th.
For the rest of the time, we plan to have
- an open Q&A session in which the community can meet the development team
to ask questions, make proposals or simply tell us how much you love us;
- presentations by package authors to show the community some lesser-known
features;
- a few presentations of scientific work done using gretl, or perhaps done
by using some other package but explaining why it would have been
impossible to do in gretl, and what needs to be done to make it
possible;
- a virtual meeting of the newly-born gretl association (become a
member!);
- (possibly) the usual analysis by me on the trends of the download data
and future perspectives.
Marcin was kind enough to set up a website for the conference
(https://www.gretlconference.org/), so you can visit that page to get the
info as it becomes available.
If you'd like to present something, or have any other comments or
suggestions, feel free to contact me or any other member of the
development team.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
3 years, 3 months
The any() function
by Riccardo (Jack) Lucchetti
Folks,
from time to time I need to check whether a certain object belongs to a
set of integers. For example, suppose you have a panel and you want to
select only certain units, whose ids you list in a vector.
This can of course be generalised, so I wrote a little hansl function (I
called it any() because of my lack of imagination: alternative proposals
are very welcome):
<hansl>
set verbose off
set seed 999
function numeric any(numeric X, const matrix A)
    # checks if X belongs to the set A
    if typeof(X) == 1
        # scalar
        scalar ret = max(A .= X)
    elif typeof(X) == 2
        # series
        series valid = ok(X)
        series ret = NA
        smpl valid --dummy
        matrix tmp = { X } .= vec(A)'
        ret = sumr(tmp) .> 0
        smpl full
    elif typeof(X) == 3
        # matrix
        scalar r = rows(X)
        scalar c = cols(X)
        matrix ret = maxr(vec(X) .= vec(A)')
        ret = mshape(ret, r, c)
    endif
    return ret
end function
###
### usage example
###
nulldata 20
# scalar
foo = {1, 3, 11}
loop i = 1 .. 6
eval any(i, foo)
endloop
# matrix
matrix Z = mrandgen(i, 1, 6, 6, 3)
print Z
eval any(Z, foo)
# series
series x = randgen(i, 1, 6)
x[5] = NA
series y = any(x, foo)
print x y --byobs
</hansl>
If this is useful, we could either (a) add this to the extra package or
(b) make this a native libgretl function. Comments?
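The panel use case mentioned at the start (keeping only certain units,
whose ids are listed in a vector) might look like the sketch below. It
assumes the any() function from the script above has already been
defined; the toy panel, the ids in "keep" and the series names are made
up for illustration.

```hansl
# assumes the any() function from the script above is already defined

# hypothetical toy panel: 5 units observed over 4 periods
nulldata 20
setobs 4 1:1 --stacked-time-series

# ids of the units to keep (made up for illustration)
matrix keep = {2, 5}

series u = $unit            # panel unit index
series sel = any(u, keep)   # 1 where the unit id is in keep, else 0

# restrict the sample to the selected units
smpl sel --dummy
print u --byobs
```

Since the unit index has no missing values, sel is a plain 0/1 dummy and
can be passed straight to smpl.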
3 years, 4 months
buttons at bottom of settings dialog off-screen
by Sven Schreiber
Hi,
I remember there was a similar problem recently with a gnuplot dialog:
on a Linux Budgie desktop system with a 15-inch laptop screen (not very
high resolution), I could neither see the buttons of the main gretl
settings dialog nor move the window to bring them on screen. This is not
the latest gretl version (rather some post-2020e git state, I believe),
but I guess this hasn't changed recently (?). With otherwise standard
settings the window is simply too tall, and its top edge cannot be moved
up to make the bottom visible instead.
Not sure if it's a problem with Budgie's and/or GTK's window manager
logic, or something that falls within gretl's responsibility.
cheers
sven
3 years, 4 months
crash when opening broken gfn file for editing
by Sven Schreiber
Hi,
attached is a test gfn file which was messed up by gretl's own function
package editor, but that's a different story.
When I try to open this file with yesterday's snapshot on Windows, gretl
crashes.
thanks
sven
3 years, 4 months
very slight smart indentation glitch
by Sven Schreiber
Hi, with the April 14th snapshot I'm seeing a hiccup when a comment line
ends with a comma.
In gretl's script editor, type this line:
# indeed,
hit return and type:
# aha?
hit return again -> the second line is indented, where it shouldn't be.
thanks
sven
3 years, 4 months
doc/tex/refbody.tex broken - BDS test doc
by Sven Schreiber
Hi,
there are weird things in the mentioned file, e.g.: "considered close if
they lie within ε of each other."
This breaks building the pdf docs in current git for me. Are you also
seeing that?
thanks
sven
3 years, 5 months