On 27.09.2018 at 02:00, Allin Cottrell wrote:
> On Wed, 26 Sep 2018, Sven Schreiber wrote:
>> OK, sorry for the delay. I'm attaching an attempt to read in the data
>> from such a file (as you posted, imf-gdp.csv) and convert it to a
>> gretl matrix right away. This string processing is probably not very
>> efficient, but OTOH it wouldn't be done often, so top speed is not
>> really an issue, I guess.
>>
>> What still has to be done is to shuffle around the various blocks of
>> the matrix to match the desired dataset layout, but that shouldn't be
>> so difficult.
>
> Nice work! My only question is: how sure are we that the IMF data
> files are consistent enough in their presentation that your
> function will work on any example? Also: are there other data
> sources that provide panel data in a similar manner (time series
> running horizontally), and if so, would they also be amenable to your
> approach?

The orientation of the data is a matter for the to-be-written second (or
top-level) function. Probably the user would have to specify whether
time runs along the rows or the columns, and so on.
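As a rough illustration of what such a user-specified orientation option might do, here is a small Python sketch. The name to_time_by_unit and its time_in_rows flag are hypothetical, not an existing gretl option; the idea is simply to bring the block into a T x N shape (time in rows, units in columns), transposing when time runs horizontally as in the IMF file:

```python
def to_time_by_unit(block, time_in_rows):
    """Return the block as T rows (periods) by N columns (units).

    Hypothetical helper: if time already runs down the rows, return a
    copy unchanged; if time runs horizontally (one unit per row, as in
    the IMF file), transpose the block.
    """
    if time_in_rows:
        return [list(row) for row in block]
    return [list(col) for col in zip(*block)]


# Two countries (rows) observed in three years (columns), IMF-style.
imf_style = [[1.0, 2.0, 3.0],
             [4.0, 5.0, 6.0]]
print(to_time_by_unit(imf_style, time_in_rows=False))
```

Keeping this step separate from the reading step means the low-level reader never has to know about time or units at all.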

This present function just reads a rectangular data array (matrix)
from a text file, automatically discarding rows whose length, for
whatever reason, differs from that of the rest. (These could be empty
rows, descriptive text at the bottom, or rows with just the variable
names in between.)
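A minimal Python sketch of the row-filtering idea (not the attached gretl function, whose code isn't shown here). The helper name read_rectangular, and the rule that the "right" row length is the most common field count, are assumptions for illustration:

```python
import csv
import io
from collections import Counter


def read_rectangular(text, delimiter=","):
    """Keep only the rows whose field count equals the most common count.

    Hypothetical helper: empty lines, trailing notes and other rows of a
    different length are dropped.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Ignore completely empty lines when deciding on the "right" width.
    lengths = Counter(len(r) for r in rows if r)
    if not lengths:
        return []
    width, _ = lengths.most_common(1)[0]
    return [r for r in rows if len(r) == width]


sample = """country,1990,1991,1992
Austria,1.0,2.0,3.0

Belgium,4.0,5.0,6.0
Source: IMF
"""
for row in read_rectangular(sample):
    print(row)
```

Note that a header row with the same number of fields as the data rows survives this filter, so it would still have to be recognized (or turned into missing values) when converting the cells to numbers.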

This should even cover missing data (unbalanced panels, for example),
as long as the layout is consistent, i.e. missing values are either
(a) given by some string code or (b) simply left empty, but still
delimited by the correct number of column separators. This hasn't been
tested yet, though.
So right now I'm somewhat optimistic that it would work.
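The two missing-data conventions can be illustrated with another small Python sketch. parse_cell is a hypothetical helper, and treating any non-numeric field as missing is an assumption (a real data source may use specific codes):

```python
import math


def parse_cell(cell):
    """Convert one text field to a float; empty or non-numeric -> NaN."""
    try:
        return float(cell.strip())
    except ValueError:
        # Covers both conventions: empty fields and string codes like "n/a".
        return math.nan


# An empty entry is still delimited, so the row keeps its full width (4).
fields = "1.5,,n/a,4".split(",")
values = [parse_cell(f) for f in fields]
print(len(fields), values)

# A row that simply omits the separator is shorter (2 fields) and would
# therefore be dropped by a same-width rule instead of being misaligned.
print(len("1.5,4".split(",")))
```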

> One other thought: IMO the most desirable replacement for the existing
> stack() should probably also be able to handle (I suppose, via some
> option) the (simpler) case where we have a dataset with one or more
> blocks of N time series of length T, where time goes vertically but
> the individuals are arrayed horizontally, in separate series, and we'd
> like to stack the N series into one panel series.
>
> For example, we open a CSV file which has N columns holding GDP
> 1990-2017 for N countries, and we want to panelize GDP. This may be a
> "join" task; I haven't thought it through yet, but I suspect that even
> if it's doable via join there may be an advantage in automating it via
> some pstack() variant.

If I understand correctly, the resulting matrix from my present function
would be TxN in this case, and since gretl's native panel format is
stacked time series (right?), doing vec(M) should then be enough, as in
"series GDP = vec(M)", possibly preceded by "nulldata nelem(M)
--preserve" and "setobs 1 1:1 --stacked-time-series" or something.

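A Python sketch (not hansl) of why column-major stacking of a TxN matrix gives the stacked-time-series order. The vec() below mimics gretl's vec(), which stacks the columns of a matrix; panel_index is a hypothetical helper showing which unit and period each stacked observation belongs to:

```python
def vec(matrix):
    """Stack the columns of a T x N matrix (given as a list of rows).

    Column-major order: all T observations of unit 1, then all T
    observations of unit 2, and so on.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    return [matrix[t][i] for i in range(n_cols) for t in range(n_rows)]


def panel_index(k, T):
    """Map a 0-based stacked position k to a 1-based (unit, period) pair."""
    return k // T + 1, k % T + 1


# T = 3 years (rows) for N = 2 countries (columns).
M = [[1.0, 4.0],
     [2.0, 5.0],
     [3.0, 6.0]]
gdp = vec(M)
print(gdp)
print([panel_index(k, len(M)) for k in range(len(gdp))])
```

Because each column (one country) ends up as a contiguous run of T values, the stacked vector has the unit-major layout that a stacked-time-series panel expects.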
cheers,
sven