On Sat, 17 Oct 2020, Riccardo (Jack) Lucchetti wrote:
On Sat, 17 Oct 2020, Artur Tarassow wrote:
> Am 16.10.20 um 23:53 schrieb Allin Cottrell:
>> I'm not documenting this yet because it needs more testing, but in current
>> git I've enabled a new operation on string-valued series: logical product.
>> (Snapshots to follow.)
>>
>> If sv1 and sv2 are string-valued series, then
>>
>> series sv3 = sv1 * sv2
>>
>> now yields another string-valued series with value si_sj at observations
>> where sv1 has value si and sv2 has value sj.
>>
>> A simple example is afforded by the R dataset warpbreaks (csv version
>> attached). This includes two "factor" series, wool (with values "A" and
>> "B") and tension (with values "L", "M" and "H").
>> If you multiply them together you get a series with 6 distinct values,
>> "A_L" to "B_H".
>
> This is pretty useful, Allin! I often face the "problem" that I have two or
> more string-valued series which need to be concatenated for creating a
> unique identifier before one can set a panel structure.

I concur: this is a very nice idea, very useful, and handled in a very
elegant way. I only have two remarks: a suggestion and a proposal.

(a) we already use the "^" operator for performing what I see as a very
similar operation on lists. It would seem to me more consistent if we used
"^" instead of "*".

Good point. I wasn't quite sure which operator symbol to borrow for
this purpose but consistency with lists makes a good argument for
'^'. I'll make that change.

(b) why not extend this syntax to string arrays, or even vectors? Imagine
how cool it would be to do something like
<pseudo-hansl>
s = defarray("a", "b")
ss = s ^ seq(1,3)
</pseudo-hansl>
and have ss be a 6-element string array containing "a_1", "a_2" and so on.
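
For reference, the result described in (b) can be spelled out with an explicit double loop. This is only a sketch of the intended output under the "_" bridge used so far, not a proposed implementation of "^" for arrays; the variable names follow the pseudo-hansl above and the matrix v is an assumption standing in for seq(1,3).

```hansl
# Sketch only: build "a_1", "a_2", ..., "b_3" by hand
strings s = defarray("a", "b")
matrix v = seq(1, 3)
strings ss = array(nelem(s) * cols(v))
scalar k = 1
loop i=1..nelem(s)
    loop j=1..cols(v)
        # join the i-th string and the j-th number with "_"
        ss[k] = sprintf("%s_%d", s[i], v[j])
        printf "%s\n", ss[k]
        k += 1
    endloop
endloop
```

The first index varies slowest here, so the elements come out as a_1, a_2, a_3, b_1, b_2, b_3.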

Would be cool, yes. But to my mind the next priority for work on
string-valued data would be arranging for string values to be used
in plots. For example, in a plot with the --dummy option we should
clearly be setting x-tics with strings, not numbers, if the discrete
x variable is string-valued.

> Also, let me go even a step further. A more flexible version may be a
> function with three arguments: "sv1", "bridge" and "sv2" where "bridge"
> (not the most ideal parameter name) may be optional allowing for a
> user-defined bridging string such as underscore ("_") in the current
> implementation.

Or perhaps, the "bridge" character could be a libset variable.

Yes, could be. To make the bridge an argument we'd have to implement
this via a function rather than an operator.

And/or, a fairly simple generalization that could help here is to
make strsub() and regsub() apply to string-valued series (and arrays
of strings) as well as plain strings.
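
To make these two options concrete, here is a minimal hansl sketch, with the caveat that it is not how gretl implements anything. It works on arrays of strings rather than string-valued series, and the function cross_labels() with its parameter names is hypothetical, made up for illustration. The first part takes the bridge as a function argument; the second part swaps the bridge afterwards by applying the existing single-string strsub() one element at a time, which is the loop that a generalized strsub() would make unnecessary.

```hansl
# Hypothetical helper (not part of gretl): every pairing of the
# labels in a and b, joined by a caller-supplied bridge string
function strings cross_labels (const strings a, const strings b, const string bridge)
    strings ret = array(nelem(a) * nelem(b))
    scalar k = 1
    loop i=1..nelem(a)
        loop j=1..nelem(b)
            ret[k] = sprintf("%s%s%s", a[i], bridge, b[j])
            k += 1
        endloop
    endloop
    return ret
end function

# The warpbreaks factor levels mentioned earlier in the thread
strings wool = defarray("A", "B")
strings tension = defarray("L", "M", "H")
strings labs = cross_labels(wool, tension, "_")  # "A_L", "A_M", ..., "B_H"

# Swap the bridge afterwards, one string at a time, via strsub()
loop i=1..nelem(labs)
    labs[i] = strsub(labs[i], "_", ".")
endloop
```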

One other thought: R uses "." as bridge. At first I thought we
couldn't do that since "." can't be used in a gretl identifier. But
that was a confusion: the result is not supposed to be an
identifier! So in the first instance I'm inclined to switch to dot
as default bridge character, unless anyone sees a good reason not
to.

Allin