On Sun, 11 Mar 2018, Sven Schreiber wrote:
On 11.03.18 at 15:42, Allin Cottrell wrote:
> On Sun, 11 Mar 2018, Sven Schreiber wrote:
> On checking the current code, I'm reminding myself that strlen()
> does in fact count UTF-8 code points, so "a" and "<a-umlaut>" will
> both give 1.
>
> If we enable nelem() for strings, it would add value if it counted
> bytes instead (internally, use the C library's strlen rather than
> g_utf8_strlen). One thing we do internally when trying to get
> certain strings to line up correctly under translation is compare
> strlen and g_utf8_strlen. It's conceivable that package writers
> (or at least addon writers) might also want to do
Well, I hope that nelem("") wouldn’t return 1 then because of a
weird C-style \0 byte?
Count of bytes before the terminating NUL is the convention. Strings
have to end with (at least one) NUL byte in all programming
languages that I know of: it's how they tell that the end was
reached.
I can see the case for added value, but wouldn’t it be strange if
strlen in hansl did _not_ do what strlen in C does?
It's debatable, but it seemed to me that in the case of non-ASCII
text it would be more intuitive in the hansl context to return the
number of "characters" (code points) rather than the number of bytes.
(With ASCII text there's no difference.)
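To make the byte/code-point distinction concrete, here is a minimal C sketch, assuming valid UTF-8 input. The helper name utf8_code_points is hypothetical: it stands in for GLib's g_utf8_strlen so that the example needs only the standard C library, and it is not how gretl implements anything. It counts every byte that is not a UTF-8 continuation byte, whereas the C library's strlen counts all bytes before the terminating NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not gretl or GLib code): count UTF-8 code points
   in a NUL-terminated string by counting every byte that is not a
   continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++; /* lead byte or plain ASCII byte: starts a code point */
        }
    }
    return n;
}
```

For "a" both counts are 1. For the two-byte UTF-8 encoding of a-umlaut, written "\xC3\xA4" in C, strlen gives 2 and utf8_code_points gives 1. For the empty string both give 0, since the terminating NUL is never counted.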
I’d be in favor of a strlen variant (optional switch...) to mimic
C’s strlen with byte counting.
That could be done.
Allin