[Gretl-devel] String-valued series encoding

Monday, 30 September 2019

Hi all,

I've had a recurring problem with string-valued series, and I'm struggling 
to find a solution.

Suppose you have two or more string-valued series that you get from a csv
or Stata file, and that they represent comparable variables, so they
contain the same strings. Currently, we encode string-valued series by
creating string arrays that get filled in order of first occurrence; this,
however, implies that there is no guarantee that the correspondence
between internal numerical values and strings is the same across the
different series. This makes the output from commands such as freq or
xtab awkward to read (in xtab, for example, the rows and the columns come
out in different orders).
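
A minimal illustration of the problem, assuming the by-occurrence rule
just described (the file demo.csv and the series x and y are made up for
the purpose): the same two labels, appearing in a different order, should
end up with swapped codes.

<hansl>
# hypothetical two-column file: same labels, different order
outfile "@dotdir/demo.csv"
    printf "x,y\na,b\nb,a\n"
end outfile
open "@dotdir/demo.csv" --quiet

# strvals() gives the string table: the string behind code 1, 2, ...
strings sx = strvals(x)
strings sy = strvals(y)
# with by-occurrence encoding, "a" is code 1 in x but code 2 in y
printf "x: 1 = %s, 2 = %s\n", sx[1], sx[2]
printf "y: 1 = %s, 2 = %s\n", sy[1], sy[2]
</hansl>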

Writing a script to correct for that has proven quite difficult, and what 
I was able to come up with is VERY far from elegant. An example script 
follows, and suggestions are much appreciated.

<hansl>
set verbose off

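# Recode x so that its numeric codes follow the order of the strings
# in "new"; strings of x that are not found in "new" become NA
function series string_reorder(strings new, series x)
     strings ss = strvals(x)    # current string table of x
     n = nelem(ss)
     m = nelem(new)
     series tmp = NA
     loop i = 1 .. n --quiet
         si = ss[i]
         # look up the position of si in the reference array
         k = 0
         loop j = 1 .. m --quiet
             if si == new[j]
                 k = j
                 break
             endif
         endloop

         if k>0 # found: give these observations code k
             tmp = (x == si) ? k : tmp
         endif
     endloop
     return tmp
end function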

clear
set verbose off
outfile "@dotdir/tmp.csv"
     printf "var1,var2,var3\n"
     printf "a,b,c\na,c,b\nb,b,b\nc,a,b\na,b,c\na,c,c"
end outfile

open "@dotdir/tmp.csv" --quiet

print var1 var2 --byobs
xtab var1 var2 # no good

# record encoding for var1
ss = strvals(var1)

# note: you can't just assign to var2
var2new = string_reorder(ss, var2)
stringify(var2new, ss)
delete var2
rename var2new var2

# values are the same, but the encoding is reordered
print var1 var2 --byobs

xtab var1 var2 # better
</hansl>
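
The same recipe should carry over to a series that uses only some of the
strings. A sketch, meant to be appended to the script above (it reuses
string_reorder() and the array ss), aligning var3, which only contains
"b" and "c", with the coding of var1:

<hansl>
# align var3 to the coding of var1 too; var3 has no "a",
# so its recoded values are only 2 ("b") and 3 ("c")
var3new = string_reorder(ss, var3)
# stringify() accepts a string array longer than the largest code
stringify(var3new, ss)
delete var3
rename var3new var3

print var1 var3 --byobs
xtab var1 var3
</hansl>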

-------------------------------------------------------
   Riccardo (Jack) Lucchetti
   Dipartimento di Scienze Economiche e Sociali (DiSES)

   Università Politecnica delle Marche
   (formerly known as Università di Ancona)

   r.lucchetti(a)univpm.it
   http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------
