On Tue, 21 Aug 2012, Sven Schreiber wrote:
On 21.08.2012 01:14, Riccardo (Jack) Lucchetti wrote:
> On Mon, 20 Aug 2012, Sven Schreiber wrote:
>
>>
>> I'm not sure, but isn't Stata's .dta format a binary (non-text) file
>> format? If so, then I guess many big micro datasets are distributed as
>> binary. (I'm thinking of the German SOEP for example.)
>
> True. I'm currently working on the SOEP database myself and, if I had to
> start from scratch now that we have "join" in gretl, I think I'd use
> Stata just to turn the whole thing into csv. Instead we had to use this
> diabolical Stata add-on called "PanelWhiz". Brrrr.
But I think that's my point -- it would be good if 'join' worked on some
binary format, gretl's own formats being the obvious premier choice.
Being able to process Stata's .dta would also be nice of course, but
that's probably a luxury.
Specifically I believe that there may be a good chance to get the SOEP
team to distribute their data also in some gretl format in the medium
term, once the equivalent functionality of Stata's merge exists (and is
tested). I'm not an insider there in any way, but that's my educated guess.
I have the feeling that anything different from csv may be quite a
technical challenge, in that I can't see a way to extract a given column
from a dta file (or a gdt file, for that matter) without reading it into
memory in its entirety, but I'm no authority on this.
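To illustrate why csv is the easy case: a text file can be scanned one row at a time, keeping only the wanted column. Below is a minimal Python sketch, assuming a comma-separated file with a header row. The function name `read_column` and the sample data are made up for illustration; this is not how gretl does it internally.

```python
import csv
import io

def read_column(stream, name):
    """Yield the values of column `name`, one row at a time."""
    reader = csv.reader(stream)
    header = next(reader)
    idx = header.index(name)  # raises ValueError if the column is absent
    for row in reader:
        # Only the current row is held in memory, not the whole file.
        yield row[idx]

# Hypothetical sample data standing in for a large file on disk.
sample = io.StringIO("pid,year,income\n1,2010,30000\n2,2010,41000\n")
incomes = list(read_column(sample, "income"))
print(incomes)
```

Whether something similar is feasible for a dta or gdt file depends on the layout of those formats, which this sketch does not address.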
>>>>> * I find the '--data' option naming unintuitive or too
>>>>> generic; why not call it '--name' if it's about renaming?
>>>>
>>>> Jack originally suggested that this option should be called
>>>> "payload". Maybe that's better than "data".
>>>
>>> Well, IMO "name" is just as generic as "data". I don't mind either. I
>>> originally found "payload" mildly amusing. Anybody else out there with
>>> a strong preference?
>>>
>>
>> Well I'm not anybody else in this discussion's context, but I don't get
>> the pun with payload, I must confess.
>
> There's no pun. I just enjoyed the idea of likening the join command to
> the space shuttle or something like that, skillfully carrying something
> precious across. Besides, the "payload" is a well-established term in
> the computer virus jargon, too.
I'm lost here. Maybe I didn't understand what --data actually does.
Well, it just tells join what data from the right-hand file you want to
bring into the left-hand dataset.
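
In rough terms, the idea can be sketched in Python as follows. This is not gretl code: the function `join_column`, the key column `pid` and the sample data are all hypothetical, and the sketch assumes at most one matching right-hand row per key.

```python
import csv
import io

def join_column(left_rows, right_stream, key, data):
    """Add column `data` from the right-hand CSV to each left-hand row,
    matching rows on the shared column `key`. Unmatched rows get None."""
    lookup = {row[key]: row[data] for row in csv.DictReader(right_stream)}
    return [dict(row, **{data: lookup.get(row[key])}) for row in left_rows]

# Hypothetical left-hand dataset and right-hand file.
left = [
    {"pid": "1", "age": "34"},
    {"pid": "2", "age": "51"},
    {"pid": "3", "age": "27"},
]
right = io.StringIO("pid,income,region\n1,30000,N\n2,41000,S\n")

# Only "income" is carried across; "region" stays behind in the right-hand file.
joined = join_column(left, right, key="pid", data="income")
```

Here `data` plays the role of the option under discussion: it names the one series to carry across.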
--------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
--------------------------------------------------