Re: [Gretl-devel] slight inconsistency with strstr truth value

Sunday, 11 March 2018

On Sun, 11 Mar 2018, Sven Schreiber wrote:

...
 On 11.03.2018 at 01:02, Allin Cottrell wrote:
> On Sat, 10 Mar 2018, Sven Schreiber wrote:

>> Of course, for bundles, arrays, and matrices we have nelem(). I was also
>> wondering why nelem() isn't working for strings. I'm aware of strlen,
>> and it's no big deal, but in principle in the hansl logic I don't see
>> why nelem("") couldn't give 0 and nelem("abc") couldn't give 3, for
>> example.
> 
> Alright, no reason why nelem() on strings shouldn't be (in effect) an 
> alias for strlen() -- unless, that is, nelem() were to count UTF-8 
> code-points rather than bytes! (In UTF-8, non-ASCII characters are coded 
> in two or more bytes.)

 It wasn't my intention to count bytes in any case. For a high-level 
 language such as hansl I don't see the need for that. So in my view 
 nelem("a") and nelem("ä") [a-umlaut, if somebody doesn't get this
 displayed properly] should give the same result.
 If there's a need to count bytes I'd suggest an optional switch to nelem 
 which would only apply to strings and then trigger that behavior. (Or 
 alternatively add that option to strlen.) 

On checking the current code, I'm reminded that hansl's strlen()
does in fact count UTF-8 code points, so "a" and "<a-umlaut>" will
both give 1.

If we enable nelem() for strings, it would add value if it counted 
bytes instead (internally, use the C library's strlen rather than 
g_utf8_strlen). One thing we do internally when trying to get 
certain strings to line up correctly under translation is compare
strlen and g_utf8_strlen. It's conceivable that package writers
(or at least addon writers) might also want to do this.
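
To make the byte versus code-point distinction concrete, here is a
minimal C sketch. It is not gretl's actual code: utf8_codepoints() is a
hypothetical stand-in for GLib's g_utf8_strlen(), utf8_extra_bytes() is
likewise an illustrative name, and the input is assumed to be valid,
NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for g_utf8_strlen(): count the code points in
   a valid UTF-8 string by skipping continuation bytes, which all have
   the bit pattern 10xxxxxx. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char) *s & 0xC0) != 0x80) {
            n++; /* this byte starts a new code point */
        }
    }
    return n;
}

/* Illustrative helper: the number of bytes beyond one per code point.
   A printf-style field width for %s is measured in bytes, so a width
   would have to be increased by this amount for a translated string
   to occupy the intended number of character positions. */
static size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_codepoints(s);
}
```

For a pure-ASCII string the two counts agree and the difference is 0;
for a-umlaut (bytes 0xC3 0xA4) the byte count is 2 and the code-point
count is 1.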

Allin
