On Wed, 31 Oct 2007, Sven Schreiber wrote:
> If MS changed the xls format so gretl's unable to read the
> newer files,
> there are a few things we may do: in the short run, we should advise
> Excel users to save their files using an older format. When we have time
> (presumably, after 1.6.6), we should IMO provide support for it.
While I guess I agree, this goes against your previously stated
preference for gretl to focus on econometrics and let other programs do
the data management. I'll remind you of it next time I have to argue
with you ;-)
Touché.
Apart from that, I think Office 2003 should be the last supported
format; I would be against MS "openxml" support for political reasons
(not even counting that it probably would be a lot of work). Which
brings me to the issue of ODF support, which gretl should have ASAP IMHO.
> From a quick web search, it looks like the xls 2003 format isn't officially
> documented anywhere, but apparently it has been reverse-engineered by
> the Openoffice.org people. I _think_ gnumeric should handle it too: if
> it does, that's great, because the way gnumeric is written is much more
> similar to gretl than openoffice, and it should be much easier to rip
> code from them.
Unfortunately,
http://www.gnome.org/projects/gnumeric/features.shtml
only talks about formats up to Office XP, which predates Office 2003.
For Python there seem to exist some libraries (pyExcelerator, xlrd and
xlwt), so *possibly* I might be able to write a py4gretl function to
read Excel 2003 files. But of course native support would be much nicer.
I found someone who's got Office 2003 and I did an experiment. I created a
file containing
x y
1 3.2
2 3.2
3 4.1
4 4.1
and saved it in Office2003 format. Both CVS gretl and gnumeric 1.6.3 read
it perfectly, so it must be a subtler issue. Possibly, there are some record
types we can't handle properly. It'd be very useful if someone could post
an example file which gretl can't open.
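To narrow down which record types might be involved, something along these lines could help. It is only a rough sketch: it assumes the workbook stream has already been extracted from the OLE2 container (not shown here), and the KNOWN table is a hypothetical subset of record ids, not the list gretl actually handles. It relies on the BIFF layout in which each record is a 2-byte id and a 2-byte length (both little-endian) followed by the payload.

```python
import struct
from collections import Counter

# Hypothetical subset of BIFF record ids a reader might support;
# this is NOT taken from the gretl or gnumeric sources.
KNOWN = {
    0x0809: "BOF",
    0x000A: "EOF",
    0x0203: "NUMBER",
    0x00FD: "LABELSST",
    0x027E: "RK",
}


def iter_records(stream):
    """Yield (record_id, payload) pairs from a raw workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: 2-byte id, 2-byte payload length, little-endian.
        rec_id, length = struct.unpack_from("<HH", stream, pos)
        yield rec_id, stream[pos + 4:pos + 4 + length]
        pos += 4 + length


def unknown_records(stream):
    """Count the record ids in the stream that are not listed in KNOWN."""
    counts = Counter(rec_id for rec_id, _ in iter_records(stream))
    return {rid: n for rid, n in counts.items() if rid not in KNOWN}
```

Running unknown_records() on a file that opens fine and on one that doesn't, and comparing the two results, would at least show which record ids differ.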
IMO it all boils down to comparing how likely a user is to have data in
xls2003 format, how difficult it would be to convert it to another format
outside gretl and how much effort it would take to provide native support.
As for openoffice, ODF support would be nice. Besides, it's a
well-documented xml format. However, this is the theory; in theory,
there's no difference between theory and practice, but in practice there
is. The gnumeric C file which contains the algorithm for reading odf
spreadsheets is a 115K gorilla. True, we don't need most of the stuff they
do (charts, styles and so on), but even ripping what we need from their
implementation would be quite a big task.
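To give an idea of what the bare minimum looks like, here is a rough Python sketch (not what gnumeric does, and not a proposal for the actual gretl code). It assumes the .ods file is a zip archive with the sheet data in content.xml, that the rows sit directly under the first table element, and it ignores repeated cells, dates, formulas and everything else; read_first_sheet is just a name made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
VALUE_TYPE = "{%s}value-type" % NS["office"]
VALUE = "{%s}value" % NS["office"]


def read_first_sheet(ods_file):
    """Return the first sheet as a list of rows of floats or strings."""
    with zipfile.ZipFile(ods_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    sheet = root.find(".//table:table", NS)
    rows = []
    if sheet is None:
        return rows
    # Only rows that are direct children of the table are visited.
    for row in sheet.iterfind("table:table-row", NS):
        cells = []
        for cell in row.iterfind("table:table-cell", NS):
            if cell.get(VALUE_TYPE) == "float":
                # Numeric cells carry their value in an attribute.
                cells.append(float(cell.get(VALUE)))
            else:
                # Anything else is taken as the text of its paragraphs.
                cells.append("".join(p.text or ""
                                     for p in cell.iterfind("text:p", NS)))
        if cells:
            rows.append(cells)
    return rows
```

Even this toy version shows where the work is: the easy part is pulling numbers out of the xml, the hard part is everything a real spreadsheet adds on top.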
Now that we have a feature request tracker, I will add two issues there
about xls 2003 and ods support!
Excellent.
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
r.lucchetti(a)univpm.it
http://www.econ.univpm.it/lucchetti