On Mon, 26 May 2014, Sven Schreiber wrote:
> today I hit the limit in the GUI that the earliest year can be set
> to 1500.

OK, that ought to be made more flexible.

> But I was looking at the really historic time series from here:
> http://www.ggdc.net/maddison/maddison-project/orihome.htm, which
> actually starts at 1 A.D. It worked ok via script, but I think it also
> should work via the dialog window.
>
> Now let's see if I manage to load that gappy data into the
> workfile... no, there are problems, and I think some of them are
> bugs. (This is 1.9.90 on Win7.)
>
> When I start with an empty annual dataset from 1 to 2100 and try
> to append the Maddison data from an Excel worksheet (where I have
> named the year column "date"), the rows/years are not
> properly matched against the inner years ("inner" in the sense
> from 'join'). That's because of the (huge) gaps in the source
> file. Strangely, when I use "obs" instead of "date", gretl
> says that I must not use this as a variable name.

We're talking about (I think)
http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx

Well, "obs" is a reserved word in gretl while "date" is not.
Normally that wouldn't matter, but in this case it does because the
hugely gappy Maddison data are not recognized by gretl as any sort
of time series, so gretl is apparently trying to read the first
column as data.

> I also have to rename many, many variables in the xls file before
> gretl accepts them, and I think this is really not the optimal way
> to handle this because it's very time-consuming and dull; there
> should be some automagic "mangling" of the names by gretl [...]

I see that Stata has infinite patience with silly-buggers column
headings and in CVS I've made a move in that direction. But I need
to rant just a little: I suppose the data are carefully done, but
the column headings look as if they've been added by a total
computer illiterate with the attitude "I don't care; if anyone
really wants to make sense of this they'll find a way".

> Then I tried to treat the whole thing as a (country) panel
> structure -- but I'm noticing (for the first time although it must
> have been there for ages) that when I choose "new dataset" from
> the menu, the dialog forces on me the detour to specify the
> overall number of obs (anybody got a calculator ready?)

Yes, on the gretl toolbar ;-)

> and only afterwards can I impose the panel structure [...]
>
> Another suggestion: why not allow the use of a time index variable
> for time series the same way that index variables are allowed for
> panels?

Maybe, but this is not needed for any ordinary time series.

> I haven't succeeded so far with the import; the only solution I
> can think of right now is to add hundreds of empty rows to the
> source file to remove the gaps.

Importing these data is not trivial since, although they have a time
dimension, they do not resemble any ordinary time series. I have to
wonder what the advantage is in having them recognized as a "time
series" of sorts; I can't imagine that any time-series methods could
be applied to them.

Anyway, you can get them to appear as annual data in gretl as
follows. I began by opening the file in gnumeric and exporting it as
text, with all fields quoted. (One could instead use ssconvert,
gnumeric's command-line converter.) You then have to fix a few
variable names (and add a couple of missing ones) -- or use current
CVS to have the varnames handled automatically. In addition I gave
the first column a heading of "myear" so that gretl wouldn't try to
interpret it as time. Then,
<hansl>
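open maddison.txt -q
# keep rows with missing values when converting the dataset to a matrix
set skip_missing off
matrix X = {dataset}

# collect the series names (excluding the constant) in one string
string varnames = ""
nv = $nvars - 1
loop j=1..nv -q
    varnames += varname(j)
    varnames += " "
endloop

# new empty dataset; --preserve keeps X and varnames
nulldata 2010 --preserve
# annual, starting in 1 AD
setobs 1 1 --time-series

# recreate the series, initially empty
list L = null
loop j=1..nv -q
    string vname = strsplit(varnames, j)
    series @vname
    L += @vname
endloop

# copy each source row to the observation for its year
loop s=1..rows(X) -q
    # year is in first col of X
    scalar t = X[s,1]
    loop foreach j L -q
        $j[t] = X[s,j]
    endloop
endloop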
</hansl>
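
The key step in the script above is the final double loop, which writes
each source row into the observation whose index equals its year. A toy
sketch of just that step may make it easier to follow. The three
year/value pairs and the series name "y" are made up purely for
illustration; they are not Maddison figures.

<hansl>
# toy version of the "place each row at its year" step
nulldata 10
setobs 1 1 --time-series
# made-up data: column 1 = year, column 2 = value
matrix X = {1, 100; 4, 130; 9, 170}
# start with all observations missing
series y = NA
loop s=1..rows(X) -q
    # the year doubles as the observation index since the data start in year 1
    scalar t = X[s,1]
    y[t] = X[s,2]
endloop
print y --byobs
</hansl>

Years 1, 4 and 9 should then hold the three values, and every other
year should remain missing.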
It's a bit complicated, but our importers are just not designed to
handle this sort of thing. (For anyone who hasn't seen these data,
they are "sort of" annual time series but with gaps of many
centuries between the earlier observations.)
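
Since 'join' came up: for a single column it might also be possible to
match on the year directly, using an inner key series. The following is
only a sketch and has not been checked against the Maddison file. It
assumes the text export described above, with "myear" as the heading of
the year column; the column name "England" and the inner series name
"year" are hypothetical.

<hansl>
# empty annual dataset covering years 1 to 2010
nulldata 2010
setobs 1 1 --time-series
# inner key: the calendar year of each observation (1, 2, ..., 2010)
genr time
series year = time
# "England" is a hypothetical column name in the exported file
join maddison.txt England --ikey=year --okey=myear
</hansl>

Years that have no matching row in the source file would simply be
left as missing.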
Allin