Re: [Gretl-devel] [Gretl-users] new command, "join"

Tuesday, 21 August 2012

On Mon, 13 Aug 2012, Allin Cottrell wrote:

...
 On Mon, 13 Aug 2012, Sven Schreiber wrote:

> [reply probably more suitable for the devel list, thus switching]
>
> On 08/09/2012 10:43 PM, Allin Cottrell wrote:
>>
>> Last month in Ancona, Jack Lucchetti, Claudia Pigini and I spent an
>> intensive week cooking up a new command for gretl. It's called
>> "join", its job is to pull together data from two or more sources
>> with the help of keys and/or filters, and -- casting modesty aside
>> -- we think it's a killer! Stata has a deservedly good reputation
>> for this sort of thing but we think that in some respects "join" may
>> put gretl ahead.
>>
>> It's in CVS and snapshots and we invite you to try it out and give
>> us your comments. Full documentation with examples of use is
>> available at:
>>
>> http://ricardo.ecn.wfu.edu/~cottrell/tmp/join.pdf (US letter)
>> http://ricardo.ecn.wfu.edu/~cottrell/tmp/join-a4.pdf (A4)
>
> Yes this looks like a "great leap forward"! Allow me some more or less
> ad-hoc reactions while browsing the documentation:
>
> * Terminology: in relational database theory, there are "inner joins"
> and "outer joins" AFAIR. In your docs, "inner" and "outer" seem to have
> a different meaning. Maybe this can be separated somehow. ("Incumbent"
> and "incoming" perhaps? Or simply "first" and "second"?) This would
> probably also affect the naming of --ikey and --okey.
>
> * To push this argument a little further, since gretl's join seems to
> work on single series only (which is fine!), the whole thing seems
> rather different from a database/SQL join, and the name could therefore
> be misleading. Maybe call it "importseries" or somesuch instead?

 Maybe. I guess the force of this comment depends on how wedded
 potential users of this command are to database/SQL
 terminology. I'll await Jack's reaction when he gets back
 online.

The idea of using the word "join" was primarily inspired by the "join"
Unix command, rather than the SQL JOIN statement. I admit SQL users
may find the terms "inner" and "outer" confusing at first (but then, the
same goes for "left" and "right"); but how many gretl users are so adept
at SQL syntax as to find the terminological short-circuit problematic?
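
To make the distinction concrete, here is a minimal Python sketch (not gretl code) of a key-based import of a single series, in the spirit of what is described in this thread. The function name import_series, the sample rows, the "first match wins" rule and the use of None for unmatched keys are all assumptions made for illustration; they say nothing about how gretl actually implements "join".

```python
def import_series(inner_rows, outer_rows, ikey, okey, data):
    """Return one value per inner row, looked up in outer_rows by key."""
    # Build a lookup from outer key to the requested value.
    # Assumption: if an outer key repeats, the first occurrence wins.
    lookup = {}
    for row in outer_rows:
        lookup.setdefault(row[okey], row[data])
    # Inner rows are never added or dropped; unmatched keys give None.
    return [lookup.get(row[ikey]) for row in inner_rows]


# Hypothetical data: the dataset already in memory, and an incoming file.
inner = [{"id": 1}, {"id": 2}, {"id": 3}]
outer = [{"pid": 3, "income": 41.5}, {"pid": 1, "income": 30.0}]

income = import_series(inner, outer, ikey="id", okey="pid", data="income")
print(income)
```

In this sketch the existing dataset keeps exactly its own rows and gains one column, which is closer to a lookup than to an SQL inner or outer join, where the set of rows in the result can differ from that of either input.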

...
> * Datafile format: you note the connection to large datasets. Yet so far
> only text format files are supported. At the risk of stating the obvious
> ("breaking into open doors" as we say in German), for large datasets
> some binary format is probably wanted -- or do you include gzipped text
> files when saying text files?

 We could read gzipped CSV without too much difficulty, though
 we don't at present. We could also apply the "join" apparatus
 to native gretl binary databases. However, our focus so far
 has been on processing big "third party" data sources, and
 these mostly seem to be in delimited text format.

Or perhaps fixed-format, though I haven't seen one in years.

...
> * You don't seem to mention the decimal separator issue, what is allowed
> in this context?

 Yes, that should be mentioned in the doc. In fact, the
 handling of the decimal separator is exactly the same as for
 regular CSV reading via "open" (i.e. the decimal comma is
 supported).

I'll be a good boy and I'll pretend I never read this, ok? ;-)

...
> * I find the '--data' option naming unintuitive or too
> generic; why not call it '--name' if it's about renaming?

 Jack originally suggested that this option should be called
 "payload". Maybe that's better than "data".

Well, IMO "name" is just as generic as "data". I don't mind either.
I originally found "payload" mildly amusing. Anybody else out there with
a strong preference?

--------------------------------------------------
  Riccardo (Jack) Lucchetti
  Dipartimento di Economia

  Università Politecnica delle Marche
  (formerly known as Università di Ancona)

  r.lucchetti(a)univpm.it
  http://www2.econ.univpm.it/servizi/hpp/lucchetti
--------------------------------------------------
