On 08/20/2012 04:46 PM, Riccardo (Jack) Lucchetti wrote:
On Mon, 13 Aug 2012, Allin Cottrell wrote:
> On Mon, 13 Aug 2012, Sven Schreiber wrote:
...
>>
>> * Terminology: in relational database theory, there are "inner joins"
>> and "outer joins" AFAIR. In your docs, "inner" and "outer" seem to have
>> a different meaning. Maybe this can be separated somehow. ("Incumbent"
>> and "incoming" perhaps? Or simply "first" and "second"?) This would
>> probably also affect the naming of --ikey and --okey.
>>
>> * To push this argument a little further, since gretl's join seems to
>> work on single series only (which is fine!), the whole thing seems
>> rather different from a database/SQL join, and the name could therefore
>> be misleading. Maybe call it "importseries" or somesuch instead?
>
>
> Maybe. I guess the force of this comment depends on how wedded
> are potential users of this command to database/SQL
> terminology. I'll await Jack's reaction when he gets back
> online.
The idea of using the word "join" was primarily inspired by the "join"
unix command, rather than the SQL JOIN statement. I admit SQL users may
find the terms "inner" and "outer" confusing at first (but then, the
same goes for "left" and "right"); but how many gretl users are so adept
at SQL syntax as to find the terminological short-circuit problematic?
Ok, maybe I misunderstood the intended (non-) relation to the
functionality of relational databases.
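For illustration only, the single-series, key-based lookup discussed above can be sketched in Python. This is not gretl code and not how gretl implements "join"; the function name join_series, the field names and the sample values are all hypothetical, and keeping the first match per key is an assumption made just for the sketch.

```python
def join_series(inner_keys, outer_rows, key, payload, missing=None):
    """Return one value of `payload` per entry of `inner_keys`.

    inner_keys: key values of the dataset already in memory.
    outer_rows: rows (dicts) of the file the new series is pulled from.
    Keys with no match in outer_rows get `missing`.
    """
    lookup = {}
    for row in outer_rows:
        # Assumption: keep the first match per key. A real tool would
        # need an explicit rule for duplicate keys.
        lookup.setdefault(row[key], row[payload])
    return [lookup.get(k, missing) for k in inner_keys]


# Hypothetical sample data.
inner_ids = [1, 2, 3, 4]
outer = [
    {"id": 2, "income": 30.5},
    {"id": 4, "income": 41.0},
    {"id": 9, "income": 12.0},  # no counterpart in inner_ids, so ignored
]
result = join_series(inner_ids, outer, "id", "income")
print(result)
```

The length of the result always equals the length of the in-memory dataset, which is the main difference from an SQL join that returns a new table.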
>> * Datafile format: you note the connection to large datasets. Yet so far
>> only text format files are supported. At the risk of stating the obvious
>> ("breaking into open doors" as we say in German), for large datasets
>> some binary format is probably wanted -- or do you include gzipped text
>> files when saying text files?
>
> We could read gzipped CSV without too much difficulty, though
> we don't at present. We could also apply the "join" apparatus
> to native gretl binary databases. However, our focus so far
> has been on processing big "third party" data sources, and
> these mostly seem to be in delimited text format.
Or perhaps, fixed-format, though I haven't seen one in years.
I'm not sure, but isn't Stata's .dta format a binary (non-text) file
format? If so, then I guess many big micro datasets are distributed as
binary. (I'm thinking of the German SOEP for example.)
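As a rough illustration of why gzipped delimited text is cheap to support, here is a minimal Python sketch using only the standard library. It says nothing about how gretl would do it; the helper name read_gzipped_csv and the demo file contents are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path, delimiter=","):
    """Read a gzip-compressed delimited text file into a list of dicts."""
    # Mode "rt" decompresses on the fly and yields text, so the csv
    # module sees an ordinary text stream.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))


# Write a tiny hypothetical file so the example is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.csv.gz")
with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
    fh.write("id,income\n2,30.5\n4,41.0\n")

rows = read_gzipped_csv(path)
print(rows)
```

The reader only differs from a plain-text reader in the call that opens the file, which is the sense in which the compressed case is "without too much difficulty".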
>> * You don't seem to mention the decimal separator issue, what is allowed
>> in this context?
>
> Yes, that should be mentioned in the doc. In fact, the
> handling of the decimal separator is exactly the same as for
> regular CSV reading via "open" (i.e. the decimal comma is
> supported).
I'll be a good boy and I'll pretend I never read this, ok? ;-)
Well the consensus rule for gretl was to enforce decimal points only in
hansl scripts, wasn't it?
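To make the distinction concrete: accepting a decimal comma in a data file is a separate matter from the syntax of the script that reads it. A small Python sketch follows, with a hypothetical helper parse_number and made-up sample data; it does not describe gretl's CSV reader.

```python
import csv
import io


def parse_number(text, decimal=","):
    """Convert a numeric string that uses `decimal` as its separator."""
    return float(text.strip().replace(decimal, "."))


# Hypothetical data file content: semicolon-delimited, decimal comma.
raw = "id;income\n2;30,5\n4;41,0\n"
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
incomes = [parse_number(row["income"]) for row in reader]
print(incomes)
```

Note that the field delimiter must differ from the decimal character, which is why decimal-comma files are usually semicolon- or tab-delimited.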
>> * I find the '--data' option naming unintuitive or too
>> generic; why not call it '--name' if it's about renaming?
>
> Jack originally suggested that this option should be called
> "payload". Maybe that's better than "data".
Well, IMO "name" is just as generic as "data". I don't mind either. I
originally found "payload" mildly amusing. Anybody else out there with
a strong preference?
Well I'm not anybody else in this discussion's context, but I don't get
the pun with payload, I must confess.
cheers,
sven