On 16.10.20 at 23:53, Allin Cottrell wrote:
I'm not documenting this yet because it needs more testing, but
in current git I've enabled a new operation on string-valued series:
logical product. (Snapshots to follow.)
If sv1 and sv2 are string-valued series, then
series sv3 = sv1 * sv2
now yields another string-valued series with value si_sj at observations
where sv1 has value si and sv2 has value sj.
A simple example is afforded by the R dataset warpbreaks (csv version
attached). This includes two "factor" series, wool (with values "A"
and "B") and tension (with values "L", "M" and "H").
If you multiply them together you get a series with 6 distinct values,
"A_L" to "B_H".
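
To make the semantics concrete, here is a minimal plain-Python sketch of
the logical product described above. It is only a model of the behavior,
not gretl code: the helper name logical_product, the "sep" parameter and
the toy observations are assumptions for illustration (the values are the
factor levels named above, not the actual warpbreaks rows).

```python
# Plain-Python model of the "logical product" of two string-valued series.
# logical_product is a hypothetical helper name, not part of gretl.
def logical_product(sv1, sv2, sep="_"):
    """Pair values observation by observation and join them with sep."""
    if len(sv1) != len(sv2):
        raise ValueError("series must have the same length")
    return [f"{a}{sep}{b}" for a, b in zip(sv1, sv2)]


# Toy observations built from the factor levels mentioned above.
wool = ["A", "A", "A", "B", "B", "B"]
tension = ["L", "M", "H", "L", "M", "H"]

wt = logical_product(wool, tension)
print(wt)            # ['A_L', 'A_M', 'A_H', 'B_L', 'B_M', 'B_H']
print(len(set(wt)))  # 6
```

With 2 wool levels and 3 tension levels, the product has at most 2 x 3 = 6
distinct values, which matches the count given above.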
This is pretty useful, Allin! I often face the "problem" that I have two
or more string-valued series that need to be concatenated to create
a unique identifier before a panel structure can be set.
I also ran the script below on a slightly modified data set in which
the column "tension_w_nan" includes an NA value. In that case the
resulting series is NA as well, which makes some sense, as multiplying a valid
number with NA yields NA. I guess that's as expected given the current
series wt = wool * tension
series wt2 = wool * tension_w_nan
series tw2 = tension_w_nan * wool
print wt tension_w_nan wt2 tw2 -o
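
The NA behavior reported for the script above can be modeled in plain
Python as follows. This is a sketch under stated assumptions, not gretl's
implementation: NA is represented by None, product_with_na is a
hypothetical helper name, and the three observations are made up.

```python
# NA is modeled as None; product_with_na is a hypothetical name.
def product_with_na(sv1, sv2, sep="_"):
    """Logical product where NA on either side yields NA in the result."""
    return [
        None if a is None or b is None else f"{a}{sep}{b}"
        for a, b in zip(sv1, sv2)
    ]


wool = ["A", "A", "B"]
tension_w_nan = ["L", None, "H"]  # second observation is missing

wt2 = product_with_na(wool, tension_w_nan)
tw2 = product_with_na(tension_w_nan, wool)
print(wt2)  # ['A_L', None, 'B_H']
print(tw2)  # ['L_A', None, 'H_B']
```

The missing observation propagates regardless of operand order, in the
same way that a numeric product with NA is NA.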
However, one comment on the current syntax/implementation using the
asterisk symbol: in Python/Pandas (we don't have to mimic that behavior
though!) string concatenation is usually done with the plus sign:
<series sv3 = sv1 + sv2> -- at least that's my experience.
In case one of the RHS variables is NA, let's say sv1[i]=NA, you still
get sv3[i]=NA+sv2[i], which equals the string value of sv2[i].
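
The "+" rule described above, where a missing value contributes nothing
to the result, could be sketched like this in plain Python. The rule is
spelled out by hand because, to my knowledge, pandas itself propagates a
missing value under "+" unless it is filled first. The name
concat_skip_na is hypothetical, NA is modeled as None, and returning NA
when both sides are missing is an assumption the message does not cover.

```python
# NA is modeled as None; concat_skip_na is a hypothetical name.
def concat_skip_na(sv1, sv2):
    """Concatenate pairwise; a missing value on one side contributes nothing."""
    out = []
    for a, b in zip(sv1, sv2):
        if a is None and b is None:
            out.append(None)  # assumption: nothing to concatenate, stay NA
        else:
            out.append((a or "") + (b or ""))
    return out


sv1 = ["A", None, "B"]
sv2 = ["L", "M", None]

sv3 = concat_skip_na(sv1, sv2)
print(sv3)  # ['AL', 'M', 'B']
```

The trade-off against the NA-propagating product is that a result such as
"M" no longer reveals that one of its components was missing.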
Also, let me go even a step further. A more flexible version may be a
function with three arguments: "sv1", "bridge" and "sv2", where "bridge"
(not the most ideal parameter name) may be optional, allowing for a
user-defined bridging string such as underscore ("_") in the current