On Sun, 11 Mar 2018, Sven Schreiber wrote:
On 11.03.2018 at 01:02, Allin Cottrell wrote:
> On Sat, 10 Mar 2018, Sven Schreiber wrote:
>> Of course, for bundles, arrays, and matrices we have nelem(). I was also
>> wondering why nelem() isn't working for strings. I'm aware of strlen,
>> and it's no big deal, but in principle in the hansl logic I don't see
>> why nelem("") couldn't give 0 and nelem("abc") couldn't give 3, for
>> example.
>
> Alright, no reason why nelem() on strings shouldn't be (in effect) an
> alias for strlen() -- unless, that is, nelem() were to count UTF-8
> code-points rather than bytes! (In UTF-8, non-ASCII characters are coded
> in two or more bytes.)
It wasn't my intention to count bytes in any case. For a high-level
language such as hansl I don't see the need for that. So in my view
nelem("a") and nelem("ä") [a-umlaut, if somebody doesn't get this
displayed properly] should give the same result.
If there's a need to count bytes I'd suggest an optional switch to nelem
which would only apply to strings and then trigger that behavior. (Or
alternatively add that option to strlen.)
On checking the current code, I'm reminded that strlen() does in
fact count UTF-8 code points, so "a" and "<a-umlaut>" will both
give 1.
If we enable nelem() for strings, it would add value if it counted
bytes instead (internally, use the C library's strlen rather than
g_utf8_strlen). One thing we do internally when trying to get
certain strings to line up correctly under translation is compare
strlen and g_utf8_strlen. It's conceivable that package writers
(or at least addon writers) might also want to do this.
Allin
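
To make the byte versus code-point distinction in this thread concrete,
here is a minimal Python sketch. It is neither hansl nor gretl's C code.
The function names are hypothetical: count_code_points stands in for
what g_utf8_strlen reports, count_utf8_bytes for what the C library's
strlen reports on UTF-8 data, and utf8_extra_bytes is one assumed way of
comparing the two, not the comparison gretl actually performs.

```python
def count_code_points(s: str) -> int:
    """Number of Unicode code points (the g_utf8_strlen-style count)."""
    return len(s)


def count_utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding (the C strlen-style count)."""
    return len(s.encode("utf-8"))


def utf8_extra_bytes(s: str) -> int:
    """Bytes beyond one per code point; zero for pure-ASCII strings."""
    return count_utf8_bytes(s) - count_code_points(s)


if __name__ == "__main__":
    # "\u00e4" is a-umlaut as a single precomposed code point.
    for s in ["", "abc", "a", "\u00e4"]:
        print(repr(s), count_code_points(s), count_utf8_bytes(s),
              utf8_extra_bytes(s))
```

For "a" both counts are 1, while for a-umlaut the code-point count is 1
and the byte count is 2, so the difference is nonzero only when a string
contains non-ASCII characters.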