On Thu, 16 Jan 2014, Allin Cottrell wrote:
[...]
5. Discussion: When I first introduced this idea, Sven and Jack
remarked that it would be desirable to use an extension other than
".gdt" for the XML component of a metadata/binary pair of files on the
new pattern, so as to avoid potential confusion. I can see the case
for this, but I'm not sure it's a good idea. I explain my misgivings
below.
Internally, a gdt file is just a gdt file, regardless of whether it
has a binary companion file: it's XML conforming to a common DTD. The
functions to read and write such files are in common. There are many
places in the gretl code where it's assumed that a native data file
has the ".gdt" extension and it would be a pain to go through all of
those and adjust for the possibility of another extension. In other
words, there's no internal rationale for a different extension; this
would be purely for users' convenience.
Ok.
But would it in fact be convenient for users? So far as GUI use is
concerned, I don't see any reason why users should care. The format is
mostly "hidden": all you have to bother with is (say) marking a check
box saying "Use binary format" if you have a huge data file and
write/read speed is an issue. (And I'm thinking this box might not be
shown for datasets smaller than some reasonable threshold.) It would
seem "fussy" to have a drop-down selector for different extensions in
file dialogs pertaining to native gretl data files.
This is ok as long as you're _saving_ data; some problems may arise when
you try to _read_ data. See below.
It's true, there is some possibility of confusion in CLI use. The main
issue I see is that someone might save a dataset as binary, then later
decide to send it to a colleague or move it to another directory: in
that case she has to know to send/move the bdt file as well as the gdt.
Of course, she'd have to know to do that even if the XML component
were named differently, but there would be some visual clue if it
had, say, an ".mdt" suffix. On the other hand, it would be easy enough
to check the size of the gdt file: if you've stored tens or hundreds
of MB of data and the gdt file is 3K, there's your clue. We could also
provide a little command-line helper program that tests a gdt file and
tells you some stuff about it, including whether it has a binary
companion (this could also be provided as a GUI menu item).
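
To make the helper idea concrete, here is a rough sketch in Python of what such a check might report. It is only an illustration under stated assumptions: the function name describe_gdt is made up, and it assumes the binary companion sits in the same directory with the same base name and a ".bdt" suffix, which the thread does not actually specify.

```python
import tempfile
from pathlib import Path


def describe_gdt(gdt_path):
    """Report the size of a .gdt file and whether a companion .bdt exists.

    Assumption (not confirmed anywhere in this thread): the companion is
    a sibling file with the same base name and a ".bdt" suffix.
    """
    gdt = Path(gdt_path)
    companion = gdt.with_suffix(".bdt")
    return {
        "gdt_bytes": gdt.stat().st_size,
        "has_companion": companion.is_file(),
        "companion": str(companion),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        gdt = Path(tmp) / "ABCdata.gdt"
        # Placeholder content, not a real gdt document.
        gdt.write_text("<placeholder/>")
        print(describe_gdt(gdt))  # no companion yet
        gdt.with_suffix(".bdt").write_bytes(b"\x00" * 16)
        print(describe_gdt(gdt))  # companion now present
```

Under this naming assumption, renaming only the gdt file would make the check report no companion, so a check of this kind flags a broken pair but cannot repair it.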
The scenario you're imagining could be extended to GUI use. For example,
allow me to introduce to you coauthors Alice, Bob and Carla. Alice saves
data in non-binary format on a shared directory (or a Dropbox folder,
whatever). Later that day, Bob opens the file and adds lots of variables,
so he thinks "this is getting big, I'd better save this in binary format;
(note to self: tell Alice I modified the file)"; an hour later, Alice
wants to send the data to Carla, so she renames the gdt file to
"ABCdata_new.gdt" and emails it to Carla; while she's at it, she also zaps
all the rest of the directory away, to clean up some cruft. Carla receives
the gdt file and she can't open it, but in the meantime the binary file
has gone forever. Note that no character has ever intentionally deleted
what they knew was a data file.
This little story (which admits several variations on the same theme) is
perhaps a bit stretched here and there; and sure, there is some
carelessness involved, but losing all the data for a project because of one or
two silly moves is unacceptable. IMO, there should be some mechanism by
which the potential for confusion is kept to a minimum. The OpenOffice
format, for example, is a zip file containing several xml files. How about
something similar? "Foo.bdt" may be a zip file containing two pre-defined
files: "data.gdt" (perhaps containing some additional metadata) and
"data.bin" (the binary blob); we could use a "light" compression level
so as to preserve decent speed when reading and writing.
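
Just to show the shape of the idea, here is a minimal sketch in Python using the standard zipfile module (the compresslevel argument needs Python 3.7 or later). The function names write_container and read_container are hypothetical, the XML string is a placeholder rather than real gdt content, and compression level 1 is only a guess at what "light" would mean in practice.

```python
import io
import zipfile

XML_MEMBER = "data.gdt"  # member names as proposed above
BIN_MEMBER = "data.bin"


def write_container(target, xml_text, blob):
    """Write XML metadata and a binary blob into a single zip container.

    `target` may be a file name or a writable binary file object.
    compresslevel=1 is the fastest deflate setting (an assumed choice).
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=1) as zf:
        zf.writestr(XML_MEMBER, xml_text)
        zf.writestr(BIN_MEMBER, blob)


def read_container(source):
    """Return (xml_text, blob) read back from the container."""
    with zipfile.ZipFile(source) as zf:
        xml_text = zf.read(XML_MEMBER).decode("utf-8")
        blob = zf.read(BIN_MEMBER)
    return xml_text, blob


if __name__ == "__main__":
    buf = io.BytesIO()
    write_container(buf, "<placeholder/>", bytes(range(256)) * 4)
    xml_text, blob = read_container(buf)
    print(xml_text, len(blob))
```

Since both parts live in one file, renaming or emailing that file cannot separate the metadata from the data, which is the failure in the Alice/Bob/Carla story.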
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------