data importer improvements

Sunday, 1 June 2014

Current gretl CVS includes several improvements to data import. 
Here are the main points.

1) In relation to a file such as 
http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx 
there were a few issues raised by Sven in 
http://lists.wfu.edu/pipermail/gretl-devel/2014-May/005091.html

This is a historical data file: not a time series in the usual 
sense, but with a time dimension. In addition, several column 
headings are far from being valid gretl variable names (e.g. they 
start with numbers or punctuation) and two of them are missing 
altogether. It was a fair amount of work to get this to open at all 
in gretl.

Now you can open such a file directly, with a row offset of 2 to 
skip the header:

open mpd_2013-01.xlsx --rowoffset=2

The column headings are automatically purged of junk and the missing 
ones are filled in with v<number>. Gretl does not treat the dataset 
as a time series, but it does import the years in the first column as 
observation markers. If you want to treat the data as annual time 
series (with many more gaps than data-years) you can now achieve 
this with

nulldata 2010
setobs 1 1 --time-series
append mpd_2013-01.xlsx --rowoffset=2

Here we force the issue by creating an annual time series running 
from the year 1 to 2010, then importing the Maddison data, whose 
observation markers are compatible with the annual dataset 
structure.
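
To make the heading cleanup mentioned above more concrete, here is a 
rough sketch in Python. It is not gretl's actual code (which is in C) 
and the exact rules are assumptions: the function name clean_headings 
is hypothetical, the 31-character limit is assumed, and v<number> is 
taken to mean the column position.

```python
import re

MAX_NAME_LEN = 31  # assumed limit on variable-name length


def clean_headings(headings):
    """Turn raw column headings into identifier-style names (sketch only)."""
    names = []
    for col, raw in enumerate(headings, start=1):
        # Keep only ASCII letters, digits and underscores.
        name = re.sub(r"[^A-Za-z0-9_]", "", raw or "")
        # Drop leading characters until the name starts with a letter.
        name = re.sub(r"^[^A-Za-z]+", "", name)
        name = name[:MAX_NAME_LEN]
        # Nothing usable left: fall back on v<column number>.
        if not name:
            name = f"v{col}"
        names.append(name)
    return names
```

With this sketch, clean_headings(["1. Austria", "", "(Total)"]) gives 
['Austria', 'v2', 'Total']. It makes no attempt to resolve duplicate 
names, which a real importer would also have to handle.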

2) I recently visited FRED and downloaded an xls file containing 
daily data on Treasury Bill rates. I noticed that there were a 
couple of issues with such files.

i) The daily dates in the first column were not being recognized by 
gretl as such, because they don't use a built-in Excel date format. 
However, we now guess that if a custom numerical format is used in 
column 1 this probably implies dates.
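
The guess described here might be sketched as follows, in Python 
rather than gretl's C. The function name is hypothetical, and the 
format numbers rely on the commonly documented Excel convention that 
indices 14 to 22 and 45 to 47 are built-in date/time formats while 
user-defined formats start at 164. Treat those numbers as assumptions 
for the purpose of illustration.

```python
# Assumed Excel conventions: built-in date/time format indices,
# and the first index available for custom (user-defined) formats.
BUILTIN_DATE_FORMATS = set(range(14, 23)) | {45, 46, 47}
FIRST_CUSTOM_FORMAT = 164


def cell_is_probably_date(col, fmt_id):
    """Guess whether a numeric cell holds a date (col is 0-based)."""
    if fmt_id in BUILTIN_DATE_FORMATS:
        return True
    # Heuristic: a custom numeric format in the first column
    # probably means dates.
    return col == 0 and fmt_id >= FIRST_CUSTOM_FORMAT
```

Restricting the guess to the first column keeps ordinary data columns 
with custom formats (percentages, currencies and so on) from being 
misread as dates.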

ii) Missing values came into gretl as zeros. This is because FRED 
records NAs using the Excel formula NA(). Logical enough, but when 
gretl encounters an Excel formula it reads the result that's stored 
along with the formula, and in XLS the result stored by NA() is 0. 
Nice, not! So now when we get a 0 result from a formula we check to 
see if the formula is in fact NA().
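
In outline, the check looks something like the Python sketch below. 
This is only an illustration: the function name is hypothetical, and 
the formula is represented as text here for simplicity, whereas an 
XLS file stores formulas in tokenized binary form.

```python
import math


def formula_cell_value(formula, cached_result):
    """Return the value to import for a formula cell (sketch only)."""
    # Normalize the formula text: drop spaces and any leading "=".
    text = formula.replace(" ", "").lstrip("=").upper()
    # A stored result of 0 from NA() really means "missing".
    if cached_result == 0 and text == "NA()":
        return math.nan
    # Otherwise trust the result stored alongside the formula.
    return cached_result
```

A formula that genuinely evaluates to zero, such as =A1-A1, still 
comes through as 0; only NA() is converted to a missing value.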

There's also a relatively minor third issue: as the xls importer 
stood it could produce garbage in place of the name of an xls 
worksheet if the name involved "rich text" and/or "extended 
characters". Handling of sheet names in seriously non-ASCII cases is 
now better but by no means perfect.

Allin
