Current gretl CVS includes several improvements in respect of data
importation. Here are the main points.
1) Sven raised a few issues concerning a file such as
http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx
in
http://lists.wfu.edu/pipermail/gretl-devel/2014-May/005091.html
This is an historical data file: not a time series in the usual
sense, but it does have a time dimension. In addition, several column
headings are far from being valid gretl variable names (e.g. they
start with numbers or punctuation) and two of them are missing
altogether. It took a fair amount of work to get this file to open at
all in gretl.
Now you can open such a file directly, with a row offset of 2 to
skip the header:
  open mpd_2013-01.xlsx --rowoffset=2
The column headings are automatically purged of junk and the missing
ones are filled in with v<number>. Gretl does not treat the dataset
as a time series, but it does import the years in the first column as
observation markers. If you want to treat the data as an annual time
series (with many more gaps than data-years), you can now do
this with
  nulldata 2010
  setobs 1 1 --time-series
  append mpd_2013-01.xlsx --rowoffset=2
Here we force the issue by creating an annual time series running
from year 1 to 2010, then appending the Maddison data, whose
observation markers (years) are compatible with the annual dataset
structure.
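The heading clean-up described under point 1 might look roughly like the Python sketch below. It is not gretl's actual code, and its rules are assumptions made for illustration: a name must start with an ASCII letter and contain only letters, digits and underscores; each run of other characters becomes a single underscore; names are capped at 31 characters; and an empty or missing heading becomes v<column number>, with columns counted from 1.

```python
import re

MAX_LEN = 31  # assumed cap on the length of a series name


def purge_heading(raw, colnum):
    """Turn one spreadsheet heading into a usable series name (sketch).

    raw    -- the heading text, or None if the cell is empty
    colnum -- 1-based column number, used for the v<number> fallback
    """
    text = (raw or "").strip()
    # Collapse each run of disallowed characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", text)
    # A name must begin with a letter: drop everything before the first one.
    name = re.sub(r"^[^A-Za-z]+", "", name)
    # Enforce the length cap, then tidy any trailing underscores.
    name = name[:MAX_LEN].rstrip("_")
    return name if name else "v%d" % colnum


def purge_headings(headings):
    """Clean a whole row of headings, numbering columns from 1."""
    return [purge_heading(h, i) for i, h in enumerate(headings, start=1)]
```

Under these assumed rules a heading such as "12 W. Europe" would come out as W_Europe, and a blank heading in column 3 as v3.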
2) I recently visited FRED and downloaded an xls file containing
daily data on Treasury Bill rates. I noticed that there were a
couple of issues with such files.
i) The daily dates in the first column were not recognized by
gretl as such, because they don't use a built-in Excel date format.
Now, however, we guess that a custom numerical format in column 1
probably indicates dates.
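The date guess in point i) can be illustrated with a small Python sketch. It assumes the workbook uses Excel's default 1900 date system (XLS files may instead use the 1904 system), and that the importer can already tell whether column 1 carries a built-in date format or a custom numeric format; the function names are hypothetical.

```python
from datetime import date, timedelta

# Excel's 1900 date system numbers 1900-01-01 as day 1 but wrongly treats
# 1900 as a leap year. From serial 61 (1900-03-01) onward, counting days
# from 1899-12-30 therefore gives the right calendar date.
EPOCH_1900 = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel (1900-system) serial day number to a date."""
    if serial < 61:
        raise ValueError("serials before 1900-03-01 are not handled here")
    # Any fractional part (time of day) is dropped.
    return EPOCH_1900 + timedelta(days=int(serial))


def column1_as_dates(values, builtin_date_fmt, custom_fmt):
    """Hypothetical heuristic: read column 1 as dates if its number format
    is a built-in Excel date format or, failing that, a custom one.

    Returns a list of ISO 8601 date strings, or None if the column does
    not look like dates.
    """
    if not (builtin_date_fmt or custom_fmt):
        return None
    return [excel_serial_to_date(v).isoformat() for v in values]
```

The point of the heuristic is that the stored cell values are plain serial numbers either way; only the format attached to the column says whether they were meant as dates.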
ii) Missing values came into gretl as zeros. This is because FRED
records NAs using the Excel formula NA(). Logical enough, but when
gretl encounters an Excel formula it reads the result that's stored
along with the formula, and in XLS the result stored by NA() is 0.
Nice, not! So now, when a formula yields a result of 0, we check
whether the formula is in fact NA() and, if so, treat the value as
missing.
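The NA() check in point ii) amounts to something like the following hypothetical Python sketch. For readability the formula is given as text; an XLS file actually stores formulas in tokenized form, so a real implementation would presumably inspect those tokens instead.

```python
def formula_cell_value(cached, formula):
    """Value to import for a formula cell (hypothetical sketch).

    cached  -- the numeric result stored along with the formula
    formula -- the formula as text, e.g. "NA()" or "=NA()"
    Returns None to signal a missing value.
    """
    # Only a stored 0 can be a disguised NA, so only then inspect the formula.
    if cached == 0:
        normalized = formula.replace(" ", "").lstrip("=").upper()
        if normalized == "NA()":
            return None
    return cached
```

Testing the formula only when the stored result is 0 confines the extra work to the cells that could actually be affected.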
There's also a relatively minor third issue: as the xls importer
stood, it could produce garbage in place of the name of an xls
worksheet if the name involved "rich text" and/or "extended
characters". Handling of sheet names in seriously non-ASCII cases is
now better, but by no means perfect.
Allin