On Wed, 15 Jul 2020, Allin Cottrell wrote:
> On Wed, 15 Jul 2020, Artur Tarassow wrote:
>> But what about the case when adding the "--permanent" flag?
> I can see a case for shrinking the strings array when the
> --permanent option is given, though it's not totally clear-cut.
Here's a follow-up. You could think of this as a prototype of what
we might do internally with a string-valued series on permanent
sub-sampling.
First, a little CSV file:
<file name="strfoo.csv">
x,s
1,"aaa"
2,"bbb"
3,"ddd"
4,"ccc"
5,"aaa"
6,"ddd"
7,"eee"
8,"bbb"
9,"aaa"
</file>
Then a script that sub-samples it permanently and revises the range
and coding of the string variable appropriately:
<hansl>
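function void string_recode (series *y, series s)
    # distinct numeric codes that occur in the current sample
    matrix snums = uniq(s)
    # keep only the strings attached to those codes
    strings S = strvals(s)[snums]
    # map the old codes onto 1, 2, ..., n
    y = replace(s, snums, seq(1, nelem(snums)))
    # attach the reduced string table to the recoded series
    stringify(y, S)
end function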
open strfoo.csv -q
series scodes = s # get the numeric codes
print -o
Ss = strvals(s)
print Ss
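# keep the even-x rows; --permanent discards the others for good
smpl x % 2 == 0 --restrict --permanent
series y # empty series to receive the recoded values
string_recode(&y, s)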
series ycodes = y
print -o
Sy = strvals(y)
print Sy
</hansl>
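A possible variant, offered only as a sketch: uniq() returns the distinct codes in their order of appearance within the sample, so the new codes 1, 2, ... follow that order. If one preferred the new codes to preserve the relative order of the old ones, values(), which returns the distinct values sorted, could be swapped in. The name string_recode_sorted is hypothetical.

<hansl>
# hypothetical variant: same logic, but sorted codes via values()
function void string_recode_sorted (series *y, series s)
    # distinct codes present in the sample, in ascending order
    matrix snums = values(s)
    # keep only the strings attached to those codes
    strings S = strvals(s)[snums]
    # map the old codes onto 1, 2, ..., n
    y = replace(s, snums, seq(1, nelem(snums)))
    stringify(y, S)
end function
</hansl>

It would be called just like the original, e.g. string_recode_sorted(&y, s).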
In the main script (but not in a function) one could finish the job
with:
<hansl>
delete s
rename y s
</hansl>
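To confirm that the renamed series no longer carries unused strings, one might append a quick check along these lines (a sketch; the names Snew and codes are illustrative). If the recoding worked, the two counts printed should agree.

<hansl>
# hypothetical check, run after "rename y s"
strings Snew = strvals(s)   # string table now attached to s
matrix codes = values(s)    # distinct numeric codes actually in use
printf "number of strings: %d\n", nelem(Snew)
printf "distinct codes in use: %d\n", nelem(codes)
print Snew
</hansl>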
Allin