On Wed, 26 Sep 2018, Sven Schreiber wrote:
> On 13.09.2018 at 19:42, Allin Cottrell wrote:
>> On Thu, 13 Sep 2018, Sven Schreiber wrote:
>>
>>> I have started some string-processing work to handle this, but I'm having
>>> doubts -- what are we really targeting? It would be helpful to have a
>>> real-world example.
>>
>> I'd be surprised if it's possible to automate the pre-processing with any
>> useful generality. But as to what we're targeting, I'm attaching an example
>> from the IMF (see https://www.imf.org/en/Data , under
>> "IMF Data Mapper"). IIRC, such files can sometimes have extra lines of
>> metadata above or below the data block.
>
> OK, sorry for the delay. I'm attaching an attempt to read in the data from
> such a file (as you posted, imf-gdp.csv) and convert it to a gretl matrix
> right away. This string processing is probably not very efficient, but OTOH
> it wouldn't be done often, so top speed is not really an issue I guess.
> What still has to be done is to shuffle around the various blocks of the
> matrix to match the wanted dataset layout, but that shouldn't be so
> difficult.

Nice work! My only question is: how sure are we that the IMF data
files are consistent in their presentation, to the extent that your
function will work on any example? Well, also: are there other data
sources where panel data are provided in a similar manner (time
series running horizontally) and if so, would they also be amenable
to your approach?
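
To make the consistency question concrete, here is a rough sketch (in
Python, not hansl, and not Sven's attached function) of the assumptions
a reader for such "horizontal" files has to make. The layout it expects
is itself an assumption: a header row whose cells after the first are
all 4-digit years, data rows of the same width directly below it, and
metadata lines of some other width above or below. The names
read_wide_block, to_float and to_long are hypothetical.

```python
import csv
import io


def to_float(cell):
    """Convert a cell to float; anything non-numeric (e.g. 'no data') becomes None."""
    try:
        return float(cell)
    except ValueError:
        return None


def read_wide_block(text):
    """Return (years, {unit: [values]}) from CSV text whose time axis runs horizontally."""
    rows = list(csv.reader(io.StringIO(text)))

    def is_year(s):
        s = s.strip()
        return len(s) == 4 and s.isdigit()

    # Assumption: the header is the first row whose cells after the first
    # are all 4-digit years; everything above it is treated as metadata.
    header_idx = None
    for i, row in enumerate(rows):
        if len(row) > 1 and all(is_year(c) for c in row[1:]):
            header_idx = i
            break
    if header_idx is None:
        raise ValueError("no header row of years found")

    years = [int(c) for c in rows[header_idx][1:]]
    data = {}
    for row in rows[header_idx + 1:]:
        # Assumption: the data block ends at the first row of a different width
        # (a blank line or trailing metadata).
        if len(row) != len(years) + 1:
            break
        data[row[0].strip()] = [to_float(c) for c in row[1:]]
    return years, data


def to_long(years, data):
    """Flatten to (unit, year, value) triples, all years for one unit first."""
    return [(unit, y, v) for unit, vals in data.items() for y, v in zip(years, vals)]
```

A file that breaks any of these assumptions (years labelled otherwise,
a metadata line that happens to have the same width as the data rows)
would need different detection rules, which is exactly the generality
concern.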

One other thought: IMO the most desirable replacement for the
existing stack() should probably also be able to handle (I suppose,
via some option) the (simpler) case where we have a dataset with one
or more blocks of N time-series of length T, where time goes
vertically but the individuals are arrayed horizontally, in separate
series, and we'd like to stack the N series into one panel series.
For example, we open a CSV file which has N columns holding GDP
1990-2017 for N countries, and we want to panelize GDP. This may be
a "join" task; I haven't thought it through yet, but I suspect that
even if it's doable via join, there may be an advantage in automating
it via some pstack() variant.
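
For illustration only, the reshaping described above can be sketched as
follows. This is Python rather than hansl, it says nothing about how a
pstack() variant would actually be implemented or called, and the name
stack_blocks and its arguments are hypothetical. It assumes each block
maps a variable name to N series of equal length T, and it produces the
rows in "stacked time series" order (all T periods for the first unit,
then the second, and so on).

```python
def stack_blocks(blocks, units, periods):
    """Stack wide blocks into panel rows.

    blocks:  {varname: {unit: [T values]}}, one entry per block of N series
    units:   the N unit labels, in the desired panel order
    periods: the T time labels shared by every series
    Returns a list of N*T dicts with keys 'unit', 'period' and one key per variable.
    """
    # Check every series up front so a length mismatch fails before any stacking.
    for var, by_unit in blocks.items():
        for unit in units:
            if len(by_unit[unit]) != len(periods):
                raise ValueError(f"{var}/{unit}: expected {len(periods)} observations")

    rows = []
    for unit in units:                      # outer loop: individuals
        for i, period in enumerate(periods):  # inner loop: time
            row = {"unit": unit, "period": period}
            for var, by_unit in blocks.items():
                row[var] = by_unit[unit][i]
            rows.append(row)
    return rows
```

With a single block this covers the GDP case (N country columns become
one panel series); with several blocks, each one becomes its own panel
series sharing the same unit and period index.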
Allin