Re: [Gretl-devel] slight inconsistency with strstr truth value

Sunday, 11 March 2018

On Sun, 11 Mar 2018, Sven Schreiber wrote:

...
 On 11.03.18 at 15:42, Allin Cottrell wrote:

> On Sun, 11 Mar 2018, Sven Schreiber wrote:

> On checking the current code, I'm reminding myself that strlen()
> does in fact count UTF-8 code points, so "a" and "<a-umlaut>" will
> both give 1.
> 
> If we enable nelem() for strings, it would add value if it counted 
> bytes instead (internally, use the C library's strlen rather than 
> g_utf8_strlen). One thing we do internally when trying to get 
> certain strings to line up correctly under translation is compare
> strlen and g_utf8_strlen. It's conceivable that package writers
> (or at least addon writers) might also want to do
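To make the byte/code-point distinction concrete, here is a minimal C sketch. It counts bytes with the C library's strlen() and counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx). For valid UTF-8 this count should agree with GLib's g_utf8_strlen(s, -1), but GLib is not used, so the sketch builds with a plain C compiler. The helper names utf8_count and pad_width are hypothetical, and the padding use case is an assumption about why such a comparison helps with alignment, not a copy of gretl's internal code.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points: every byte that is not a continuation
   byte (bit pattern 10xxxxxx) starts a new code point.
   Assumes valid UTF-8. Hypothetical stand-in for g_utf8_strlen(s, -1). */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* printf's "%-*s" field width counts bytes, not characters, so a
   string containing multibyte characters comes out too narrow.
   Widening the field by (bytes - code points) compensates.
   Hypothetical helper, for illustration only. */
int pad_width(const char *s, int width)
{
    return width + (int) (strlen(s) - utf8_count(s));
}
```

With "a", strlen() and utf8_count() both give 1. With "\xc3\xa4" (a-umlaut, two bytes in UTF-8), strlen() gives 2 while utf8_count() gives 1, so printf("%-*s|", pad_width(s, 5), s) should line both strings up in a five-column field.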

 Well, I hope that nelem(“”) wouldn’t return 1 then because of a
 weird C-style \0 byte?
The convention is to count the bytes before the terminating NUL, so
an empty string gives 0. Strings have to end with (at least one) NUL
byte in all programming languages that I know of: it's how they tell
that the end was reached.
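A small C illustration of this convention: strlen() stops at the first NUL and does not count it, so the empty string has length 0 even though its storage occupies one byte. The array names below are hypothetical and exist only to make the two measurements easy to compare.

```c
#include <stddef.h>
#include <string.h>

/* An empty string literal occupies one byte: just the terminating NUL.
   strlen(empty) is 0, while sizeof empty is 1. */
static const char empty[] = "";

/* A literal with an embedded NUL: strlen() stops at the first NUL,
   so strlen(embedded) is 2, while sizeof embedded is 6
   ('a', 'b', '\0', 'c', 'd', plus the final '\0'). */
static const char embedded[] = "ab\0cd";
```

The storage size includes the terminator; the length reported by strlen() never does.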

...
 I can see the case for value added, but wouldn’t it be strange if 
 strlen in hansl did exactly _not_ what strlen in C does? 
It's debatable, but it seemed to me that in case of non-ASCII text 
it would be more intuitive in the hansl context to return the number 
of "characters" (code points) rather than the number of bytes. (With 
ASCII text there's no difference.)

...
 I’d be in favor of a strlen variant (optional switch...) to mimic
 C’s strlen with byte counting.
That could be done.
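One way such a switch might look, sketched in C rather than hansl: a single function whose flag chooses between counting code points (the hansl behaviour described above) and counting bytes as C's strlen() does. The function name str_length and its bytes flag are hypothetical illustrations, not an existing gretl API, and the code-point count assumes valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: length of s in UTF-8 code points (bytes == 0)
   or in bytes, as C's strlen() counts them (bytes != 0). */
size_t str_length(const char *s, int bytes)
{
    size_t n = 0;

    if (bytes) {
        return strlen(s);
    }
    /* Count only bytes that begin a code point, i.e. skip
       continuation bytes of the form 10xxxxxx. */
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For ASCII text both modes agree ("abc" gives 3 either way); for "\xc3\xa4" (a-umlaut) the default mode gives 1 and the byte mode gives 2; the empty string gives 0 in both modes.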

Allin
