On Fri, 12 Jan 2018, Sven Schreiber wrote:
On 11.01.2018 at 22:44, Allin Cottrell wrote:
> On Thu, 11 Jan 2018, Riccardo (Jack) Lucchetti wrote:
>> On Thu, 11 Jan 2018, Sven Schreiber wrote:
>>> or perhaps the 'append' command is better suited than join here? So
>
>> I vote for join. IMO, the join command is kind of a relatively unknown gem
>> that we have and I think your proposal, if implemented, will make it quite
>> popular among people who work with time series datasets.
>
> I agree that it would be nice to take the opportunity to raise the profile
> of "join".
> 1) How do we actually implement this?
> The other option would be to extend the "join" code itself to
> handle compaction on the fly. In a sense this would be cleaner but
> I'm pretty sure it would be more complicated; "join" is already
> quite complicated.
Not sure if I understand all the ramifications; in principle I had
thought that the compaction would always come at the very end of
what join does, which would then seem relatively straightforward.
EXCEPT: AFAIK join may read its inputs line by line instead of
variable by variable, which of course makes it complex. Is this the
reason for a difficult implementation?
The difficulty I perceived is this: join is set up nicely to do
matching by time on left and right, but in its basic form this assumes
that the time-series frequency is the same on both sides. From that
point of view it would be "too late" to do compaction as the last
step; it would be easier to do compaction first, then match. And that
works fine, as in:
<hansl>
open hamilton.gdt -q
dataset compact 4 spread
store midastemp.gdt
open AWM.gdt -q
join midastemp.gdt PC6IT_*
remove("midastemp.gdt")
list PCIT = PC6IT_*
setinfo PCIT --midas
</hansl>
Support for wildcards is new in git, and is confined to the case of
importation from a gdt file, for now. Otherwise one would do:
  join midastemp.gdt PC6IT_m3 PC6IT_m2 PC6IT_m1
In addition, however, join works pretty nicely without the help of a
temporary datafile, provided the high-frequency data start in the
first sub-period (e.g. the first month of a quarter or year): in
that case the "seq" parameter to the "aggr" option does the job:
<hansl>
open AWM.gdt -q
list PCIT = zeros(1,3)
loop i=1..3 -q
  vname = sprintf("PC6IT_m%d", i)
  join hamilton.gdt @vname --data=PC6IT --aggr=seq:$i
  PCIT[4-i] = @vname
endloop
setinfo PCIT --midas
</hansl>
I'll update the gretl-midas doc with one or both of these examples.
I believe that any real solution would also have to cover the
real-time data aspect, because forecasting in real time is going to
be an important application of mixed-frequency data. At first glance
this would suggest join, since there is an entire chapter on it for
real-time data.
That makes sense, yes.
Allin