[Gretl-devel] Re: special syntax for "end of matrix"

Wednesday, 15 September 2021

On Tue, Sep 14, 2021 at 4:35 PM Allin Cottrell <cottrell(a)wfu.edu> wrote:
...

 On Tue, 14 Sep 2021, Riccardo (Jack) Lucchetti wrote:

 > [W]e currently don't have a way to indicate the element of a
 > matrix that is second-from-last, third-from-last, etc.
 >
 > At present we have to use constructs like
 >
 > <hansl>
 > b = a[3:rows(a)-1]
 > </hansl>
 >
 > In matlab, you can use the "end" keyword, as in
 >
 > <matlab>
 > b = a(3:end-1)
 > </matlab>
 >
 > I was wondering if we could have the same syntax (and ideally extend it to
 > other multidimensional objects, such as arrays).

 In current git there's a first pass at implementing this, for
 matrices. It's a bit of a hack, but a more rigorous version would be
 quite complicated. [...] 
Here's a description of what we have now. Comments welcome.

In the current, experimental implementation of indexation from the
end, the keyword 'end' is defined as the maximum signed 32-bit integer
(INT_MAX = 2147483647).  If we come across 'end-k' we replace it with
INT_MAX-k (and similarly for end/2, 0.25*end or whatever).

Usually when we come to use an index i, we check that 1 <= i <= n,
where n is the relevant dimension, and flag an error if that's not the
case. With 'end' in play, however, i is almost surely > n.  If i > n
and i > 2000000000 we defer the error and calculate

i2 = i + n - INT_MAX

If 1 <= i2 <= n, i2 will be the correct 'end'-based index.  For
example, let n = 100 and let the index be end-10. Then

i = 2147483637 and i2 = 2147483637 + 100 - 2147483647 = 90

Let's call this the 'i2 heuristic'.
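
As a rough illustration of the i2 heuristic described above, here is a
minimal Python sketch. It is not gretl's actual code (gretl is written
in C); the names resolve_index and END_THRESHOLD are invented for this
example, and the cutoff is simply the 2000000000 figure quoted above.

```python
INT_MAX = 2**31 - 1            # 2147483647, the value standing in for 'end'
END_THRESHOLD = 2_000_000_000  # cutoff above which i is treated as end-based


def resolve_index(i, n):
    """Return a valid 1-based index into a dimension of size n.

    Raises IndexError if i is neither an ordinary in-range index
    nor a plausible 'end'-based one.
    """
    if 1 <= i <= n:
        return i                  # ordinary index, used as-is
    if i > n and i > END_THRESHOLD:
        i2 = i + n - INT_MAX      # the 'i2 heuristic'
        if 1 <= i2 <= n:
            return i2
    raise IndexError(f"index {i} out of bounds for dimension {n}")


print(resolve_index(INT_MAX - 10, 100))  # end-10 with n = 100 -> 90
print(resolve_index(INT_MAX, 100))       # plain 'end' -> 100
print(resolve_index(7, 100))             # ordinary index -> 7
```

Python integers do not overflow, so i + n - INT_MAX can be computed as
written. A version using 32-bit signed integers would have to reorder
the arithmetic (for example i - INT_MAX + n) or use a wider type.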

Let's see, what could go wrong?

* Could the i2 heuristic block a valid indexation expression that
doesn't in fact employ 'end'? No, because it's conditional on i > n.

* Could the i2 heuristic mask an indexation error?  Only if the user
gives an ordinary i value greater than 2 billion.  Suppose n =
2000000000 and the user gives i = 2000000001. Then we'll try

i2 = 2000000001 + 2000000000 - 2147483647 = 1852516354

and i2 will 'work' as an index, spuriously.

* Could this apparatus flag as erroneous an end-based index that ought
to work? This would require that either (a) i <= 2 billion or (b) i <=
n (since these are the two cases in which the i2 heuristic will
fail to kick in).

Case (a) requires an 'end-k' specification with k of at least

INT_MAX - 2 billion = 147483647

This could be valid only for a matrix of size greater than 1125 MB.

Case (b) requires n > INT_MAX/2. Such a matrix would be of size
greater than 8 GB. Explanation: we can figure the maximum k for any
given n: kmax(n) = n-1. That's the greatest number of steps back from
the end that you can take. Then the minimum i for given n is INT_MAX -
kmax(n). It's fairly easy to see that this will be greater than n
unless n > INT_MAX/2.
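
The arithmetic behind these failure modes can be checked with a short
Python sketch. The names are invented for illustration, and the size
figures assume matrix elements are stored as 8-byte doubles, with MB
and GB read as binary units.

```python
INT_MAX = 2**31 - 1
END_THRESHOLD = 2_000_000_000
BYTES_PER_DOUBLE = 8  # assumption: 8-byte floating-point elements

# Masked error: an ordinary out-of-range i above 2 billion.
n, i = 2_000_000_000, 2_000_000_001
spurious_i2 = i + n - INT_MAX
print(spurious_i2)  # 1852516354, which lies in 1..n and so 'works'

# Case (a): the smallest k for which INT_MAX - k is not > 2 billion.
k_min = INT_MAX - END_THRESHOLD
size_a_mib = k_min * BYTES_PER_DOUBLE / 2**20
print(k_min)  # 147483647; a vector that long is about 1125 MiB


# Case (b): the smallest end-based index for a given n uses k = n - 1.
def min_end_index(n):
    return INT_MAX - (n - 1)


n_b = (INT_MAX + 1) // 2          # first n for which min_end_index(n) <= n
size_b_gib = n_b * BYTES_PER_DOUBLE / 2**30
print(min_end_index(n_b - 1) > n_b - 1)  # True: heuristic still applies
print(min_end_index(n_b) <= n_b)         # True: heuristic no longer kicks in
print(size_b_gib)                        # 8.0
```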

Reality check: it's unlikely that matrices of a size sufficient to
provoke the possible failure modes of the current 'end' implementation
could even be constructed on most computers. Apart from memory
limitations, all our matrix indexation at present is based on 32-bit
signed integers. That means that while, in principle, you could create
a vector of length 2-billion-plus, you could not create a matrix with
one dimension of that size and the other dimension greater than 1 --
or if you could create it, you couldn't index into it.

Allin
