On 11.11.2024 at 03:04, Cottrell, Allin wrote:
On Sun, Nov 10, 2024 at 1:27 PM Sven Schreiber
<sven.schreiber(a)fu-berlin.de> wrote:
> I'm currently looking at ch. 21 of the guide, "Cheat sheet". I'd propose
> the following cleanups (which I could apply if people agree):
Following up on this earlier list, here's an update:
(and see below for a crash...)
> section 21.1:
>
> - Time averaging of panel datasets: I think it would be nice to use a
> real-world dataset such as grunfeld.gdt instead of having the slightly
> distracting code for creation of artificial data.
Here's a tested variant of what I meant:
<hansl>
open grunfeld
# how many periods (here: years) to average
newfreq = 4
# a dummy for endpoints
series endpoint = (time % newfreq == 0) # 'time' already in dataset
list X = invest value kstock # time-varying variables
# compute averages
loop foreach i X
    series $i = movavg($i, newfreq)
endloop
# drop extra observations
smpl endpoint --dummy --permanent
# restore panel structure
setobs firm year --panel-vars
print firm year X -o
</hansl>
Is it OK with you guys to replace the old example, which uses artificial
data, with this one?
> section 21.2:
>
> - Generating a dummy variable for a specific observation: Instead of
> t=="Italy" one can also write obs=="Italy", which may be more intuitive
> for cross-sectional data.
Already done (by Allin).
>
> - Generating a “subset of values” dummy: Nowadays one could use the
> contains() function I think, which would be more readable.
Here's an artificial but also tested example of what I mean:
<hansl>
nulldata 10
series src = {1,2,3,12,13,14,22,23,24,25}
matrix sel = {2,13,14,25}
series D1 = contains(src, sel)
</hansl>
So I think that the long-ish paragraph about the "clever solution" could
be deleted. Also, I'm not sure that what is then labeled as the "proper
solution" using the replace() function is actually "more proper" than
the one I gave using contains(). Opinions?
>
> section 21.3:
>
> - Interaction dummies (p. 194 of the A4 guide version from October):
> remove the old string-substitution-based code that pre-dates the
> interaction operator (^; which is also already mentioned there).
Again, is the old solution (starting with "But back in my day...")
really still needed?
>
> - Realized volatility: Is this example even consistent? It starts by
> talking about minutes and hours, but then switches over to seconds and
> minutes. Maybe that's part of the clever trick, I don't know... Apart
> from that, it seems that another trick in the cheat sheet could be
> re-used here, namely "Moving functions for time series".
OK, so here's something much more straightforward IMHO to calculate a
per-hour volatility, using the aggregate function:
<hansl>
nulldata 720
setobs 60 1:1 --time-series # 60 minutes per hour
series x = normal()
matrix v = aggregate(x, $obsmajor, var) # $obsmajor means hour here
print v
dataset compact 1 # yields an error!
series rv = v[,end]
</hansl>
HOWEVER, for the "dataset compact 1" line gretl tells me "not
supported", and I don't understand why. Shouldn't it be quite easy to
compact from any periodicity down to 1?
Plus, when I try the compaction in the GUI on this artificially created
dataset, the program crashes (disappears). This is 2024d on Windows.
>
> - Looping over two paired lists: Can't this one be generalized, by using
> Lx[i] and Ly[i] instead of y$i and x$i ?
Already done.
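For reference, here is a minimal sketch of the indexed variant as I
understand it. The data are artificial and the series names (y1, y2, x1,
x2) are just placeholders, so this is not necessarily the exact code that
ended up in the guide:
<hansl>
nulldata 50
# artificial series, names are placeholders for illustration only
series y1 = normal()
series y2 = normal()
series x1 = normal()
series x2 = normal()
list Ly = y1 y2
list Lx = x1 x2
# regress the i-th member of Ly on the i-th member of Lx
loop i = 1..nelem(Ly)
    ols Ly[i] const Lx[i]
endloop
</hansl>
This assumes that both lists have the same number of members; nothing in
the loop checks that.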
>
> - Cross-validation: Could it be that using some feature of the regls
> apparatus or a contributed package (by Artur?) would be more practical
> nowadays?
It mentions the leverage command; it could be that this already answers
my previous remark, but I'm not sure.
>
> - Is my matrix result broken? - One could now use sum() instead of
> sumc(sumr()).
Already done.
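Just to illustrate the equivalence on a small made-up matrix (a sketch
only, not the guide's actual example):
<hansl>
matrix A = {1, 2; 3, 4}
scalar s_old = sumc(sumr(A)) # row sums first, then sum of that column
scalar s_new = sum(A)        # sum over all elements in one call
printf "%g %g\n", s_old, s_new
</hansl>
Both scalars should come out as 10 here.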
These are all good points. Let's see if we can address them.
As indicated item by item above, some of it was already addressed,
thanks for that.
cheers
sven