Inefficiency in join command?
by atecon
Hi all,
I currently have to work with a large panel dataset (left-hand side) to
which I would like to join a couple of series from a RHS-dataset. The
correct mapping is done via two keys.
I did some performance checks, and it seems that the current
implementation runs the sorting/mapping separately for each series
joined, even though a single sorting/mapping should be sufficient
(if I am not wrong).
In a first experiment I join all series from the RHS dataset by means of
the wildcard operator:
<join "@NAME_RHS_DATA" * --ikey=datedim,unitdim>
which takes about 5 sec. here.
Then I re-ran the experiment, successively increasing the number of
series to join:
<hansl>
loop i=1..nelem(RHS_SERIES_NAMES)
    printf "\nInfo: Start joining %d series.\n", $i
    flush
    strings tojoin = RHS_SERIES_NAMES[1:$i]
    set stopwatch
    join "@NAME_RHS_DATA" tojoin --ikey=datedim,unitdim
    printf "\nInfo: Joining took %.2f sec.\n", $stopwatch
    flush
    list New = dataset - Base
    delete New --force
endloop
</hansl>
The output is as follows:
<output>
Info: Joining all series took 4.91 sec.
Info: Start joining 1 series.
Info: Joining took 1.91 sec.
Info: Start joining 2 series.
Info: Joining took 2.88 sec.
Info: Start joining 3 series.
Info: Joining took 3.88 sec.
Info: Start joining 4 series.
Info: Joining took 4.84 sec.
Script done
</output>
Do you agree that the sorting or mapping overhead can in principle be
reduced when joining multiple series at once?
Thanks,
Artur
3 years, 2 months
a small generalization of the replace() interface
by Sven Schreiber
Hi,
the replace() function takes as 2nd and 3rd arguments the mapping pairs
of values. What about enabling a little bit of syntactic sugar for the
case of a matrix with exactly 2 columns, standing in for the respective
vectors, making the 3rd argument optional in cases like the following
example:
matrix m = {3, 0.5; 2, 0.7} # arbitrary stuff for the example
series y = replace(x, m) # not working yet
which would be equivalent to:
series y = replace(x, m[,1], m[,2])
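Until something like this is built in, a small user-level wrapper could give the same effect. The following is only a minimal sketch: the function name replace_pairs and the test data are made up for illustration, and it relies on nothing beyond the existing three-argument replace().

<hansl>
# hypothetical helper: column 1 of m holds the values to find,
# column 2 the values to substitute
function series replace_pairs (series x, const matrix m)
    if cols(m) != 2
        funcerr "replace_pairs: m must have exactly two columns"
    endif
    return replace(x, m[,1], m[,2])
end function

# illustrative usage on an artificial dataset
nulldata 5
series x = index
matrix m = {3, 0.5; 2, 0.7}
series y = replace_pairs(x, m)
print x y --byobs
</hansl>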
BTW, thinking about this, there seems to be some similarity to the
strsub() function. What I mean is that replace() could be overloaded to
nest the usage of strsub(), based on the type of the first argument. And
strings arrays might be supported as arguments as well. This is just a
general observation, I don't have a concrete need right now, but I guess
there would be use cases, replacing some handwritten loops.
Does replace() actually work already on a string-valued series? If not,
I guess it should?
thanks
sven
3 years, 2 months
make it easier to specify panel dimensions for a new dataset in the GUI
by Sven Schreiber
Hi,
when creating a new panel dataset from the GUI, gretl wants to know N*T
first. Actually, for something like N = 827 and T = 48 this isn't so
easy to tell; one needs to grab pencil and paper or a calculator first.
Wouldn't it be possible for gretl to ask separately for N and T? The
default settings in the dialog could be 1 for both, and leaving one of
them at 1 would obviously yield a cross section or a time series. The
new dialog could also contain hints like "leave at 1 for time series" or
something like that.
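For comparison, in a script the product never has to be worked out by hand (827 * 48 = 39696). The sketch below shows the scripting route only, not the GUI dialog; it assumes that nulldata and setobs accept named scalars as their numeric arguments.

<hansl>
scalar N = 827    # number of cross-sectional units
scalar T = 48     # number of time periods per unit
scalar NT = N * T
nulldata NT
# declare the data as a panel, stacked by unit, with T periods each
setobs T 1:1 --stacked-time-series
printf "observations in dataset: %d\n", $nobs
</hansl>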
thanks
sven
3 years, 3 months
string vs. strings (array) type issues (and bundles)
by Sven Schreiber
Hi, I'm tripping over subtleties of 1-element arrays, combined with bundles.
First, indexing into a strings array appears to return a string type
instead of a 1-element array. (For example: eval defarray("A","B")[2]
gives the string "B".) I guess that's intended, OK.
Secondly, however, if I want to stuff such a single string into a bundle
as an array, I get an error:
<hansl>
bundle b = null
strings b.S = "A" # error: expected strings, got string
</hansl>
Note that doing this without the bundle works fine, as expected:
<hansl>
strings aS = "A"
print aS # gives 1-element strings array
</hansl>
Haven't checked whether other arrays (matrices...) have the same problem.
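A possible workaround in the meantime, sketched on the assumption that defarray() returns a strings array even when called with a single element, is to build the array explicitly before putting it into the bundle:

<hansl>
bundle b = null
# wrap the single string in an array instead of relying on
# automatic string-to-strings conversion
b.S = defarray("A")
eval nelem(b.S)    # number of elements in the stored array
</hansl>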
thanks
sven
3 years, 3 months
silent failure of string functions with non-UTF8 encoding (Windows codepage)
by Sven Schreiber
Hi,
with a July 7th snapshot I experienced messed-up results from working
with hansl string functions, where the textual input comes from a text
file with non-ASCII stuff (German Umlaute), and the file used the
Windows codepage.
I'm only saying that gretl should throw errors when it encounters weird
stuff, not saying that gretl should support that non-UTF8 encoding!
The messing-up means for example that "print" doesn't work anymore with
the string variable, doesn't show anything, although strlen reports a
positive value. Or the resulting array from using strsplit is supposed
to have 8 elements, but only the first 7 are printed out. (The 8th being
the one holding, among other characters, the Umlaut.) And so on.
Maybe the check should already be done at the readfile() stage.
The obvious workaround and solution is to save the file using UTF8.
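Another route that may be available is to have the text recoded on reading. The sketch below assumes that the installed gretl version supports the optional second (codeset) argument of readfile() and that the file really is in Windows codepage 1252; the path is a placeholder, not a real file.

<hansl>
string fname = "/path/to/input.txt"    # placeholder path
# assumption: the file is encoded in cp1252; recode to UTF-8 on reading
string s = readfile(fname, "cp1252")
printf "characters read: %d\n", strlen(s)
</hansl>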
thanks
sven
3 years, 3 months
failure with 1x1 matrix to scalar conversion
by Sven Schreiber
Hi,
consider this:
<hansl>
function scalar hey(void)
    matrix out = {3}
    return out
end function
eval hey() # yields NA
</hansl>
I would expect a scalar return value of 3. Or alternatively, if
matrix-to-scalar retyping isn't supported in this context, then an error
when executing the function.
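As a workaround sketch, extracting the element explicitly before returning avoids relying on any implicit matrix-to-scalar conversion in the return statement. The name hey_fixed is only for illustration.

<hansl>
function scalar hey_fixed (void)
    matrix out = {3}
    # pick out the single element as a genuine scalar
    scalar ret = out[1,1]
    return ret
end function

eval hey_fixed()    # expected to give 3
</hansl>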
thanks
sven
3 years, 3 months
gdtb format and compression
by Sven Schreiber
Hi,
I have a bit of a déjà vu feeling, but with a July snapshot I (still?)
see the behavior that when I save a dataset to the gdtb format, the
following happens:
- I'm being shown a compression setting, which doesn't seem to have any
effect, however. I kind of remember that compression is not intended for
gdtb, but that we discussed at some point that it should then be greyed out.
- I also get a choice about which binary gdtb format to use, and I
believe that was supposed to be only a transitory thing and should not
be offered anymore.
thanks
sven
3 years, 3 months
Files not found by join (or readfile)
by Sven Schreiber
Hi,
I'm struggling with the problem that the join command and also the
readfile() function claim my file is not there. In both cases I'm giving
the absolute path as input. I've checked "visually" that the file exists.
I'm wondering whether the fact that the path has a space in it (together
with Windows-backslashes) may be a reason for the problem. (The path is
in a string variable 'fname' which I then pass to the join cmd as
"@fname" to have it properly quoted, and plainly as readfile(fname) to
the function.)
I can do more testing and narrowing-down later, but maybe someone
already has an idea.
Thanks
sven
3 years, 3 months
recent tab key problem in editor
by Sven Schreiber
Hi,
I think it's a recent phenomenon (in Windows snapshots) that at the end
of a line hitting tab doesn't have an effect. I suspect that's due to
changes in the autocompletion setup. My setting is "automatic" (not "on
demand via tab"), but apparently this isn't working 100% correctly.
July 8th snapshot here.
thanks
sven
3 years, 4 months
uniq() on NA-vector
by atecon
Hi all,
I stumbled upon the following:
<hansl>
matrix m = {NA; NA}
eval uniq(m)
? eval uniq(m)
Data error
</hansl>
The doc is silent about the case when all entries of the column are NA.
I am a bit surprised that this leads to a data error and hence full
stop. What about returning an empty matrix in this case?
Actually, the same "error" applies to the values() function.
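In the meantime, a guard at the script level can produce the empty-matrix behaviour suggested above. This is a minimal sketch: the name uniq_safe is hypothetical, the input is assumed to be a column vector, and NAs are filtered out before uniq() is called.

<hansl>
# hypothetical helper: returns an empty matrix if m holds no valid values
function matrix uniq_safe (const matrix m)
    matrix ret = {}
    # ok() flags the non-missing elements with 1
    scalar nvalid = sumc(ok(m))
    if nvalid > 0
        # keep only the valid rows, then take the unique values
        ret = uniq(selifr(m, ok(m)))
    endif
    return ret
end function

matrix m = {NA; NA}
matrix u = uniq_safe(m)
printf "rows in result: %d\n", rows(u)
</hansl>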
Thanks,
Artur
3 years, 4 months