Hello all panel-interested people,
while using gretl for teaching with panel data (which I hadn't done much
before) I noticed the following, let's say, interface nuisances compared
to the usual luxury gretl offers for time series:
1: The sample and/or range in the main window (bottom) are given as pure
index numbers, even if "panel data marker strings" (cf. user guide p.23)
are defined. At least for the time dimension it would be useful to show
the sample periods in a human-readable form (through the markers). Also,
I noticed that the period numbers shown do not always coincide with the
values of the "time" index variable, if subsampling is in effect. (Seen
in the CEL.gdt dataset after applying the sample restriction year>1970.)
1b: A slightly more general suggestion, also for non-panel data: The
active sample restriction criterion could be shown next to the resulting
active sample in the main window. (At least for simple restrictions,
maybe not for complex, multiple ones.)
2: Menu Sample -> Set range: Only the group range can be chosen, not the
periods. Actually, given the often arbitrary ordering of groups, this is
really the less useful dimension to choose a contiguous range from. (I
know I can use "set sample based on criterion" for periods, but that's
not the point.)
3: About pshrink(): A version that returns a full panel series (with
repeated values like pmean() etc.) could be useful -- practical example:
in growth regressions one needs the initial value of output-per-worker
as a regressor. Also maybe it should be called "pfirst()" or something
4: Time-constant variables: I'm not sure how to create variables that
only vary along the cross-section, like it is done with the built-in
pmean() etc. functions. Or how to append them (like the user guide p.114
"adding a time series", but along the other panel dimension).
5: Constant in a fixed-effects regression: I don't understand what gretl
reports as the global constant term in a fixed-effects model, and it
doesn't seem to be defined in the guide. It's also confusing that gretl
complains if one wants to discard the constant in the specification
dialog (when fixed effects are selected). (But obviously gretl
estimates the right thing as the comparison with explicit LSDV
regression shows, just the constant is mysterious -- even if it's the
average of the fixed effects it's not clear where the standard errors
6: Lags not showing in model spec dialog when sample is restricted to a
single period: If I restrict the CEL.gdt data with year==1985, I cannot
include any previously created lags (of y for example) in the
regression, because they don't show up in the variable selector. Because
the subsampled dataset is now treated as "undated", there's also no
"lags..." button in the dialog. -- Actually I don't understand why gretl
"temporarily forgets" the panel structure of the dataset when a single
period is active. It would seem less problematic to treat even a T=1
sample as a special case of panel data if the underlying dataset has a
panel structure; especially in conjunction with point 1 above about
showing the selected periods in the sample.
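
As an illustration of point 3, here is a minimal sketch of what a "pfirst()"-like function could return: for stacked panel data, each unit's first non-missing value is repeated over all rows of that unit. It is written in Python rather than hansl so that it makes no claims about gretl's API. The name pfirst, the argument layout and the choice to skip leading missing values are all assumptions made for illustration only.

```python
import math


def pfirst(unit_ids, values):
    """Repeat each unit's first non-missing value over all of its rows.

    unit_ids and values are equally long sequences in stacked (long) panel
    form; missing values are represented as float('nan').
    Hypothetical helper, not part of gretl.
    """
    if len(unit_ids) != len(values):
        raise ValueError("unit_ids and values must have the same length")
    first = {}
    for u, v in zip(unit_ids, values):
        is_missing = isinstance(v, float) and math.isnan(v)
        if u not in first and not is_missing:
            first[u] = v  # remember the initial valid observation
    # Units without any valid observation get NaN throughout.
    return [first.get(u, float("nan")) for u in unit_ids]
```

In a growth regression the result would serve directly as the initial output-per-worker regressor, with the same length as every other panel series.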
Ok, that was a long post, sorry, but still necessary I think.
the two following constructs should be equivalent IMHO, but the second
one with the conditional assignment gives an error:
# 'check' really doesn't exist
if isnull(check)
  scalar heyho = 0
else
  scalar heyho = check
endif
scalar heyho = isnull(check) ? 0 : check
And on an unrelated further note, the inbundle() function seems to
require a quoted string, as in inbundle(mybundle,"keystring"). I don't
have a big problem with that (although it seems to go a little against
gretl style), but it would be nice if it could be documented in the
function reference. Also, inbundle() currently doesn't appear in the
bundle chapter of the manual. (I know, could add it myself...)
I have extended the coverage of my Alfred-related functions to perform
an adjustment for the quite frequently occurring baseperiod changes (in
the case of index numbers which have arbitrary starting values).
These adjustments are only necessary when mixing data with different
publication dates, like for example considering the first published
release of each obs. The adjustments (must) assume that the recorded
change of the respective publication for the new base period is entirely
due to the rescaling, not due to further "substantive" revisions. This
is an identifying assumption which should be relatively harmless,
because when the base change happens in real time, the new base periods
are a couple of years ago already.
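
The identifying assumption can be made concrete with a minimal sketch. It is written in Python and is not the actual code behind addAlfBrevfactors(); the function names and the dictionary-based data layout are hypothetical. The factor of a base change is taken as the ratio of the new to the old published value for the new base period, and every value published before that change is multiplied by it.

```python
def rescale_factor(old_vintage, new_vintage, base_period):
    """Ratio by which a base change rescaled the series.

    old_vintage / new_vintage map period labels to the values published
    just before / at the base change. Under the identifying assumption the
    whole change at base_period is due to rescaling.
    """
    old = old_vintage[base_period]
    if old == 0:
        raise ValueError("old value at the base period must be non-zero")
    return new_vintage[base_period] / old


def to_latest_base(value, pub_date, change_dates, factors):
    """Express a value published on pub_date in the most recent base.

    change_dates are ISO date strings (YYYY-MM-DD), so plain string
    comparison orders them correctly; factors[i] belongs to change_dates[i].
    """
    adjusted = value
    for change_date, factor in zip(change_dates, factors):
        if pub_date < change_date:  # published under an older base
            adjusted *= factor
    return adjusted
```

Values published before several base changes pick up the product of all later factors, which is why the base revision dates are needed in full.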
Also, I have drastically changed the interface, using a bundle now to
store and pass around the re-used meta parameters. So a typical usage
now looks like this:
# required meta information:
bundle bAlf = null
bAlf.fpath = "@workdir\temp.csv" # platform-specific pathsep
bAlf.vname = vname # e.g. "INDPRO"
# some examples:
series @vname_latest = getAlf_ithPub(bAlf,0)
series @vname_19830302 = getAlfVintage(bAlf,"1983-03-02")
series @vname_after3 = getAlf_nthPer(bAlf,3)
# Base revision information from the INDPRO readme file from Alfred:
bAlf.brevdates = "1943-09-22 1953-12-01 1960-01-15 1963-11-15 \
1971-08-16 1985-07-18 1990-04-17 1997-01-27 2002-12-05 2005-11-07 \
2010-06-25"
bAlf.brevperiods = "1935/1939 1947/1949 1957 1957/1959 1967 1977 \
1987 1992 1997 2002 2007"
## now adjust extracted series to the base changes:
addAlfBrevfactors(&bAlf) # must be called before the next lines
series pubdates_after2 = 0 # requ'd for adjustment
# (to be passed in pointer form)
series @vname_after2 = getAlf_nthPer(bAlf,2,&pubdates_after2)
series @vname_after2adj = \
# same for after-3-periods series...
series pubdates_after3 = NA
series @vname_after3 = getAlf_nthPer(bAlf,3,&pubdates_after3)
series @vname_after3adj = \
# ... to calculate a revision (log-) diff:
series @vname_relrev23 = log(@vname_after3adj/@vname_after2adj)
Reminder: This still works only on a preprocessed csv file, not on the
files as they come directly from Alfred.
Other useful stuff for the future:
- Adjusting series like real GDP for changes of base periods, where
however the new base values are not constants like 1 or 100, but are
themselves functions of the data.
- Adjusting series for structural definition shifts at certain times,
when the shifts can be reasonably assumed as additive.
this could actually be a question for the user instead of the devel
list, but since it is a pretty recent feature, I'll ask here -- maybe we
even have discussed this before.
How does it work to specify instruments in a multi-equ system that is
specified at runtime, with the 'equations' (plural) apparatus? The
list-of-lists on the RHS already contains semicolons, which would seem
to conflict with the semicolons normally used for specifying instruments.
well, not Alfred himself, but "his" data -- I'm talking about the
real-time database in St. Louis, of course.
I'm working on some functions that would allow accessing the Alfred
data directly from the (tab-delimited text) files that you can download
there, and of course I'm running into problems (which is not
unexpected). For example, consider the following attempt to get the
latest vintage/publication of each datapoint from the attached partial
nulldata 500 --preserve
setobs 12 1960:01 --time-series # monthly =12 in this example
join INDPRO_excerpt.txt INDPRO --tkey=observation_date
There are at least two issues here already:
1) I had to transform the "observation_date" column to match the monthly
frequency in the gretl workfile: Alfred uses the first day of the month
to indicate that month, like "1980-02-01" for February 1980. So I
chopped off the trailing "-01" part of that column to get
gretl-compatible monthly date strings.
In the future it would be nice if the join command could do that
automatically/internally for the column indicated by the "--tkey"
option, in order to support Alfred.
(For completeness, for quarterly data it's the same, Alfred writes e.g.
"1980-04-01" for 1980Q2.)
2) The above command still produces an error, namely:
"join: missing string in filtering"
I thought it had to do with the dot "." (which indicates NA in Alfred's
text file output; in Alfred's XLS files it's "#NV"), which perhaps wasn't
recognized by gretl/join as a string constant. So I replaced these dots
with explicit "na" in the text file, changed the --filter option
accordingly, but still the same error.
This is where I could use some ideas, really! -- Note that the
"realtime_end_date" column also holds daily date strings apart from the
dots, so maybe gretl gets confused there?
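
The date transformation described under 1) can be sketched as follows. This is a minimal Python illustration of the mapping only, not a proposal for how join should implement it. The function name is hypothetical, and the output uses the colon notation of the setobs line above, which is an assumption about the desired target format.

```python
def alfred_to_gretl_period(date_str, freq):
    """Map an Alfred first-day-of-period date to a period label.

    date_str: ISO date such as "1980-02-01"
    freq: 12 for monthly data, 4 for quarterly data
    """
    year_s, month_s, day_s = date_str.split("-")
    year, month, day = int(year_s), int(month_s), int(day_s)
    if day != 1:
        raise ValueError("expected the first day of the period: " + date_str)
    if freq == 12:
        return f"{year}:{month:02d}"
    if freq == 4:
        if (month - 1) % 3 != 0:
            raise ValueError("not the first month of a quarter: " + date_str)
        return f"{year}:{(month - 1) // 3 + 1}"
    raise ValueError("unsupported frequency: " + str(freq))
```

For example, alfred_to_gretl_period("1980-02-01", 12) gives "1980:02" and alfred_to_gretl_period("1980-04-01", 4) gives "1980:2".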
Hi, I encountered the following problem:
setobs 12 1960:1
string label = "1960:1"
string two = "1960:1 1970:1"
string labelintwo = strsplit(two,1)
print labelintwo # gives 1960:1
c1 = obsnum(1960:1)
print c1 # gives 1
c2 = obsnum(label)
print c2 # gives 1
c3 = obsnum(strsplit(two,1))
print c3 # gives NA
Of course easy to work around, but since strsplit() is supposed to return
a string, shouldn't the last variant also work? And if it doesn't work,
shouldn't it throw an error instead?
it seems that sscanf() doesn't distinguish between \t and \n:
sprintf sc "one \t two \t three \n nextline"
string scheck = ""
Looks like a bug to me, no?
just a small fallout from my recent use of Python within gretl:
Wrapping foreign/Python code in functions doesn't work, because the
indentation is not preserved in the resulting gretltmp.py file.
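
A minimal sketch of why lost indentation is fatal for the Python side, assuming (this has not been checked against gretl's source) that the leading whitespace of each line is simply stripped before gretltmp.py is written:

```python
def compiles(source):
    """Return True if the Python source text is syntactically valid."""
    try:
        compile(source, "<gretltmp>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# A loop body as it would be written inside a foreign block.
original = "for i in range(3):\n    print(i)\n"

# The same code after every line has lost its leading whitespace.
stripped = "\n".join(line.lstrip() for line in original.splitlines()) + "\n"
```

Here compiles(original) is True while compiles(stripped) is False, so any loop, conditional or function definition inside the wrapped block would fail in the same way.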