I sent a version of the following to the gretl-devel list, but with
a typo in the subject line: "jon" for "join". Reposting here for
wider dissemination.
Thanks to Sven for getting me going on this. The "join" command now
supports the "spreading" of high-frequency series (as wanted by our
MIDAS apparatus) in a single operation. This requires use of the
--aggr option with parameter "spread". There are two acceptable
forms of usage, illustrated below. (AWM is a quarterly dataset, and
hamilton monthly.)
<hansl>
open AWM.gdt -q
join hamilton.gdt PC6IT --aggr=spread

open AWM.gdt -q
join hamilton.gdt PCI --data=PC6IT --aggr=spread
</hansl>
In the first case MIDAS series PC6IT_m3, PC6IT_m2 and PC6IT_m1 are
added. In the second case "PCI" is used as the base name for the
imported series, giving PCI_m3, PCI_m2 and PCI_m1.
Only one high-frequency series can be imported per "join"
invocation with --aggr=spread, since a single such invocation
already writes multiple series to the lower-frequency dataset.
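So if you want two MIDAS lists you just issue two join commands. Here's a sketch, with "OTHER" standing in for whatever second monthly series the outer file may offer:
<hansl>
open AWM.gdt -q
# one spread-import per join invocation; "OTHER" is a
# hypothetical second monthly series in hamilton.gdt
join hamilton.gdt PC6IT --aggr=spread
join hamilton.gdt OTHER --aggr=spread
</hansl>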
An important point to note: this "--aggr=spread" thing (where we map
from one higher-frequency series to a set of lower-frequency ones)
relies on finding a known, reliable time-series structure in the
"outer" data file. Native gretl data files (gdt, gdtb) will be OK,
and also well-formed gretl-friendly CSV files, but not arbitrary CSV
files.
One of the pertinent features of "join" is that in general it
assumes almost nothing about the structure of the outer data file.
It just crawls across the rows of the file looking for matches, so
it can extract time-series data from a file that looks nothing like
"proper" time series. Here's a case in point:
$ cat thing.csv
thing1,thing2,Z
-1,0,1
999,999,2
1981,1,3
1980,1,4
1982,1,5
3556,14,6
$ cat join-thing.inp
open data9-7 # quarterly data
series yr = $obsmajor
series qtr = $obsminor
join thing.csv Z --ikey=yr,qtr --okey=thing1,thing2
print QNC Z -o
Running join-thing.inp works fine: join plods through the nonsense
in thing.csv, finds the three matching rows (despite their scrambled
order) and puts the "Z" data in the right places in the working
dataset.
If you have difficulty importing data MIDAS-style from a given CSV
file using --aggr=spread, you might want to fall back on a more
agnostic, piece-wise approach (agnostic in the sense of relying
less on gretl's ability to detect any time-series structure that
might be present). Here's an example of what I mean:
<hansl>
open hamilton.gdt
# create month-of-quarter series for filtering
series mofq = ($obsminor - 1) % 3 + 1
# write example CSV file: the first column holds dates such as "1973M01"
store test.csv PC6IT mofq
open AWM.gdt -q
# import monthly components one at a time, using a filter
join test.csv PCI_m3 --data=PC6IT --tkey=",%YM%m" --filter="mofq==3"
join test.csv PCI_m2 --data=PC6IT --tkey=",%YM%m" --filter="mofq==2"
join test.csv PCI_m1 --data=PC6IT --tkey=",%YM%m" --filter="mofq==1"
list PCI = PCI_*
setinfo PCI --midas
print PCI_m* -o
</hansl>
The example is artificial in that a time-series CSV file written by
gretl itself should work OK without special treatment. In fact this
will work fine:
<hansl>
open hamilton.gdt
store hamilton.csv
open AWM.gdt -q
join hamilton.csv PC6IT --aggr=spread
</hansl>
But you may have to add "helper" columns to a third-party CSV file
to enable a piece-wise MIDAS join via filtering.
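If the third-party file is one that gretl can at least open as a monthly dataset, one way to add such a helper column is to let gretl do it. A sketch, where "extern.csv" is a hypothetical third-party monthly file:
<hansl>
# hypothetical third-party monthly CSV file
open extern.csv
# add a month-of-quarter helper series (values 1, 2, 3)
series mofq = ($obsminor - 1) % 3 + 1
# rewrite the file with the helper column included,
# ready for filtered joins as shown above
store extern-plus.csv
</hansl>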
Allin Cottrell