passing bundles to python (foreign block)
by Sven Schreiber
Hi,
I thought it would be nice if more object types than just series and matrix
could be re-used in a Python foreign block. So I'm attaching a first
attempt at a Python function (gretl_bdlimport) that takes an XML text
file representing a gretl bundle (as written by gretl's bwrite) and
copies (some of) its contents to a Python dictionary, with the same keys
as in the original bundle.
So far only the following data types are handled and copied, but this
could be extended:
- scalar (taken as a generic float; post-process if you want an explicit
integer)
- string
- strings array (copied into a Python list of strings)
This will only work under Python 3, and it requires xml.etree.ElementTree;
I'm not sure what the minimum version requirement for that module is.
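To make the intended workflow concrete, here is a minimal usage sketch,
assuming the function were available inside a foreign block as proposed
(the bundle contents and file name are placeholders, and the path handling
may need adjusting):

<hansl>
# write a bundle to XML, then re-use it on the Python side
bundle b = defbundle("alpha", 0.05, "label", "test run")
scalar err = bwrite(b, "b.xml")
foreign language=python
    # gretl_bdlimport is the proposed function, not yet part of gretl_io.py;
    # 'b.xml' may need to be the full path to the file in gretl's workdir
    d = gretl_bdlimport('b.xml')
    print(d['alpha'], d['label'])
end foreign
</hansl>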
My proposal is to put this into gretl's gretl_io.py eventually.
Comments and testing welcome,
Sven
6 years, 1 month
matrix "division"
by Allin Cottrell
As you probably know, gretl supports matrix "left division" (A \ B)
and "right division" (A / B). I'll take left division as my example
here but everything applies to right division mutatis mutandis.
If A is square, A \ B gives the matrix X that solves AX = B, but if A
has more rows than columns it gives the least squares solution. So if
you just want the coefficients from an OLS regression you can do
betahat = X \ y
rather than using mols(), if you wish. The result has good accuracy;
it uses QR decomposition via lapack.
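As a quick sanity check of that claim, here is a minimal sketch (dimensions
arbitrary) comparing the two routes on a random full-rank design; the
printed value should be close to machine precision:

<hansl>
# left-division should agree with mols() when X has full rank
matrix X = ones(30,1) ~ mnormal(30,3)
matrix y = mnormal(30,1)
matrix b1 = X \ y
matrix b2 = mols(y, X)
eval maxc(abs(b1 - b2))   # maximum absolute discrepancy
</hansl>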
So far so good, but I recently noticed that it doesn't play very
nicely if A (m x n, m > n) happens to be of less than full rank: you
get an answer, but in general it's _not_ actually the least-squares
solution.
In git, therefore, I've now replaced the lapack function we were
calling before (dgels) with dgelsy, which uses column pivoting and
handles rank deficiency correctly.
Here's a little example that illustrates how things now work.
<hansl>
set verbose off
scalar T = 50
scalar k = 4
matrix X = ones(T,1) ~ mnormal(T,k)
matrix y = mnormal(T,1)
# regular matrix OLS
b = mols(y, X)
uh = y - X*b
SSR = uh'uh
s2 = SSR/(T-cols(X))
se = sqrt(diag(s2 * inv(X'X)))
printf "regular OLS (coeff, se):\n\n%12.6g\n", b~se
printf "SSR = %g, s2 = %g\n\n", SSR, s2
# add a perfectly collinear column and use
# left-division (QR with column pivoting)
X ~= ones(T,1)
b = X \ y
uh = y - X*b
SSR = uh'uh
s2 = SSR/(T-rank(X))
se = sqrt(diag(s2 * ginv(X'X)))
printf "Left-division, rank-deficient (coeff, se):\n\n%12.6g\n", b~se
printf "SSR = %g, s2 = %g\n\n", SSR, s2
</hansl>
The SSR will now be the same in the two cases: mols() with an X matrix
of full rank, and left division with a redundant column added to X. In
the latter case you get an extra coefficient, but X*b produces the
same result.
Allin
6 years, 1 month
Access series attributes
by Logan Kelly
Hello,
A feature that would be very handy is a way to access the attributes of a series from hansl. In particular, it would be great to be able to access the graph name. This could be a function like
GraphName = getinfo(graph-name, series)
Or even cooler
GraphName = series.graph-name
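For context, the graph name can already be set from a script via setinfo;
what's missing is a way to read it back. Here is a purely hypothetical
sketch of how the requested accessor might be used (the getinfo() call
mimics the proposed syntax and does not exist in gretl):

<hansl>
open data4-10
# setting the attribute already works
setinfo ENROLL --graph-name="Enrollment ratio"
# hypothetical read-back, following the proposal above
string gname = getinfo("graph-name", ENROLL)
printf "graph name: %s\n", gname
</hansl>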
I am posting to the development list because I think this is a feature request, but please forgive me if I am on the wrong list.
Still love gretl,
Logan
6 years, 1 month
Mentioning libsvm support in the manual
by Artur T.
Dear Jack and Allin,
I just noticed that support for libsvm is not mentioned at all in
the manual yet. Maybe a reference to the existing PDF on the
libsvm library could be added.
Best,
Artur
6 years, 1 month
Standards for function package help
by Sven Schreiber
Hi,
let me start a new thread here on the devel list about the function
package help business.
First, thanks to Allin the situation is already much improved. My hope
is that the better online accessibility also leads to higher visibility
among users.
Then there are two or three pending questions, I think.
1: Addons are not covered in the list. I think that's perfectly fine,
but for outsiders the difference between contributed packages and addons
is not so clear. And now the core gretl docs as well as the regular package
docs are online, but the addon docs are not (or are very well hidden).
Two suggestions here: first, add a remark at the top of the SHOW_FUNCS
output that the addons are documented elsewhere. Second, perhaps add the
PDF help files of the addons to the same place that
http://sourceforge.net/projects/gretl/files/manual/ links to, because
this latter link is also on the homepage under
http://gretl.sourceforge.net/#man.
2: Another gap, already mentioned, is the zip packages that come without
a PDF file. No immediate action is necessary (Jack has already said he
wants to amend the dhurdle.zip package), but my proposal would be to make
a PDF file mandatory for zip packages.
3: Then there's the "markdown" business for plain (or not so plain) text
help docs. (For the beginning of the discussion see the 'New "frontier"
package Gretl-users Digest, Vol 138, Issue 14' thread on the users
list.) I still think it's a good idea. The line-length problem seemed to
be just the result of a faulty implementation. I wouldn't make the use of
markdown mandatory, in order to keep the entry bar low, but we could
encourage it.
My first question is whether we agree on that.
The second question is: I guess package authors should use an indicator
tag in their help text if they are indeed using markdown? Something like
"<markdown> ... </markdown>", or how would that work? Also, maybe we
only want to support or encourage a subset of markdown?
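To make that concrete, a help text using the suggested indicator tag might
look something like this (purely an illustration; the tag name and the
supported markdown subset are not settled):

<markdown>
## mypkg 1.0

Estimates *something useful* by a method of your choice.

Parameters:

- `y`: dependent variable (series)
- `X`: list of regressors

Returns a bundle containing the estimates.
</markdown>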
Thanks for reading this long post,
Sven
6 years, 1 month
redefining "NA"
by Allin Cottrell
I posted about a possible redefinition of the internal value
corresponding to "NA" or missing in
http://lists.wfu.edu/pipermail/gretl-devel/2018-July/008926.html
(Reminder: there's a short PDF attachment which explains the
associated issues.)
I've now checked the new scheme (NA = NaN internally, rather than NA =
DBL_MAX) on all of my test scripts, all current function packages and
all the official "addons", and everything seems to be OK. But there's
one point that's maybe worth discussing before I "flip the switch" in
git and snapshots.
There's a potential compatibility issue with the gdtb binary data file
format. Backward compatibility will be preserved: there's code in
place to ensure that "new gretl" will handle old gdtb files OK
(converting missing values from DBL_MAX to NaN on reading). But
"forward compatibility" is likely to be at least partially broken.
That is, gdtb files written by "new gretl" (and containing NAs) will
not be handled correctly by "old gretl". Two comments on this:
1. It's easy enough to convert a dataset read by old gretl from a new
gdtb file to old-style NAs. The following script will do the job:
<hansl>
function void na_conv (series *y)
    loop i=1..$nobs -q
        y[i] = y[i]
    endloop
end function

open new.gdtb
list L = dataset
loop foreach i L -q
    na_conv(&$i)
endloop
</hansl>
This may look funny but it works fine, based on the way that old gretl
automatically converts non-finite values to NA on assignment to
series. A simpler variant of the na_conv function would also work:
function void na_conv (series *y)
    y = y
end function
but this version would lose the descriptive labels on the series.
2. We could preserve forward as well as backward compatibility if we
were to convert new-style NAs to DBL_MAX on writing gdtb files. But
one of the benefits of the gdtb format is very fast input-output for
big datasets, and that would be compromised if we had to perform
NA conversion on all writes and reads.
My feeling is that it's worth paying the incompatibility price,
particularly since there's a workaround available.
Allin
6 years, 1 month
Compiling error
by Marcin Błażejowski
Hi,
when I was trying to compile the current git I got the following message:
---
../lib/src/uservar.c: In function ‘serialize_scalar_value’:
../lib/src/uservar.c:1492:14: error: ‘prn’ undeclared (first use in this
function)
fputs("NA", prn);
^~~
../lib/src/uservar.c:1492:14: note: each undeclared identifier is
reported only once for each function it appears in
---
Marcin
--
Marcin Błażejowski
6 years, 1 month
revision of index loop?
by Allin Cottrell
There's something I'd like to adjust about "index loops", that is,
constructs of the form
loop i=start..stop
...
endloop
At present such loops strictly increment the index by 1 at each
iteration (and hence do not run at all if stop < start). I'd like to
support decrementing by 1 when stop < start, so that "loop i=10..1"
runs through 10, 9, 8, ..., 1.
You can create such a "backwards" loop at present, either using the
more flexible (but more cumbersome) "for" construct or by adding an
auxiliary index, but it would be convenient to have the decrement case
handled automatically. And it turns out that's easy to do.
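For reference, here is a minimal sketch of the two existing workarounds,
counting down from 10 to 1:

<hansl>
# (a) the more flexible "for" construct
loop for (i=10; i>=1; i-=1) -q
    printf "%d\n", i
endloop
# (b) an ordinary index loop plus an auxiliary index
loop k=1..10 -q
    scalar i = 11 - k
    printf "%d\n", i
endloop
</hansl>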
However, by itself this change would be backward-incompatible under
some conditions. For example, if some hansl code is designed to run
through combinations of elements of a vector or list of length n its
skeleton might look like this:
loop i=1..n -q
    loop j=i+1..n -q
        printf "(i,j) = (%d,%d)\n", i, j
        # access elements i and j
    endloop
endloop
When i reaches n the inner loop is not executed but under the proposed
change it would be executed, giving j a value of n+1 and provoking an
out-of-bounds error. One could say that the code above is buggy (the
outer loop should really run from 1 to n-1) but it works OK at
present.
I'm therefore thinking that if we support the proposed variant perhaps
it should require a --decrement option.
Thoughts?
Allin
6 years, 1 month
Next release: unmarked translations?
by Henrique Andrade
Dear Gretl Team,
I'm not sure if all the following strings have to be marked for
translation, but please take a look at them:
"variable 8 (explicativas): non-numeric values = 200 (100,00 percent)"
"allocating string table"
"LHS vector should be of length 146, is 145" (it appears when using gretlcli)
"nls_genr_setup failed" (it appears when using gretlcli)
"array of strings, length 384" (it appears inside a created bundle;
I'm not sure if we should mark this for translation)
"matrix: 384 x 4" (it appears inside a created bundle; I'm not sure
if we should mark this for translation)
"kalman: obsymat is 6 x 6, should be 1 x 6" (it appears in a Kalman
filter context)
"Kalman input matrices" (it appears inside a Kalman bundle)
"Kalman output matrices" (it appears inside a Kalman bundle)
"Kalman scalars" (it appears inside a Kalman bundle)
To get a better understanding of the real impact of some of these
untranslated strings, take a look at the output of the following "open"
command when using gretl in a pt_BR environment:
open http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv
We get some more untranslated strings:
interpretando C:\Users\Henrique\AppData\Roaming\gretl\Sacramentorealestatetransactions.csv...
usando o delimitador ','
maior linha: 127 caracteres
primeiro campo: 'street'
número de colunas = 12
número de variáveis: 12
número de linhas não-vazias: 986
procurando nomes de variáveis...
linha: street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude
procurando rótulos de linhas e dados…
variable 1 (street): non-numeric values = 985 (100.00 percent)
variable 2 (city): non-numeric values = 985 (100.00 percent)
variable 4 (state): non-numeric values = 985 (100.00 percent)
variable 8 (type): non-numeric values = 985 (100.00 percent)
variable 9 (sale_date): non-numeric values = 985 (100.00 percent)
allocating string table
tratando estes como sendo dados sem data
Best regards,
Henrique Andrade
6 years, 1 month