The mysterious stack() function
by Sven Schreiber
Hi,
a colleague pointed me to the stack() function for handling a specific
case of panel-data import in section 4.5 of the guide. I must admit I
had never been aware of that function, and I have one or two questions
here.
First, it does not appear in the function index (Gretl Command Reference
/ Functions proper, or in the built-in function documentation). Is this
an oversight, or is there a deeper reason?
Secondly, it is well documented in guide section 4.5, but it appears
to be a strange beast: it is not a gretl command (it takes function
arguments in parentheses, for example), and yet there are double-dash
options such as --offset or --length. I don't remember having seen
anything like this in gretl (or hansl) before.
I guess the story here is some path dependence from the early days, but I
wonder if this area could be cleaned up somehow?
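For context, the usage documented in section 4.5 looks schematically like this (the series names and option values here are invented, so treat it as a sketch rather than a tested call):

<hansl>
# schematic only: stack the columns x1..x5 into one long series,
# taking 10 observations from each block, skipping the first 4
series longvar = stack(x1..x5) --length=10 --offset=4
</hansl>

which is exactly the hybrid form in question: function-style arguments combined with command-style double-dash options.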
thanks,
sven
5 years, 10 months
a very strange case of arima (poor) convergence
by oleg_komashko@ukr.net
Dear all,
The script below illustrates the problem.
Findings: an extremely bad lnl in comparison to --x-12-arima;
zero values of the last two parameters
and of their gradients at all iterations;
a strangely large scaling factor.
Note that --x-12-arima gives
nice polynomial roots and an excellent Ljung-Box Q'.
Also note that modtest --autocorr with the default lag
order ($pd) obviously fails because of zero df.
<hansl>
open bad_data.gdt # attached
smpl 1 194
# note strange zeros for b[y_one] and b[y_two]
arima 3 0 0; 1 0 0; diff_series const y_one y_two
lnl1 = $lnl
modtest --autocorr 5
# compare
arima 3 0 0; 1 0 0; diff_series const y_one y_two --x-12-arima
lnl2 = $lnl
modtest --autocorr 5
eval lnl2 - lnl1
arima 3 0 0; 1 0 0; diff_series const y_one y_two --verbose
# Scaling y by 2.18989e+018 !!!
</hansl>
/*
Iteration 1: loglikelihood = -8292.74884180
Parameters: -2.8152e+016 0.65496 -0.090036 -0.16211 -0.44328 0.00000
0.00000
Gradients: 4.7608e-018 4.0881 1.2538 3.3817 -1.2644 0.00000
0.00000 (norm 7.59e-001)
Iteration 2: loglikelihood = -8292.71644305 (steplength = 0.0016)
Parameters: -2.8152e+016 0.66150 -0.088030 -0.15670 -0.44531 0.00000
0.00000
Gradients: 4.5624e-018 1.3507 -1.5003 1.0847 -1.4772 0.00000
0.00000 (norm 5.32e-001)
*/
# etc., etc.
Oleh
6 years
global series and package functions with the same name
by oleg_komashko@ukr.net
Dear all,
In the "shadowing" thread, Allin (correctly) noticed
that there is a situation where it is difficult to
distinguish between series_name(int_lag) and the function call series_name(int_argument).
1) My suggested answer is incorrect, since we can
have
function series name (int i)
and
function list name (int i)
In this case it is impossible to distinguish
between name(int_lag) and name(int_arg).
Worse (note that I obtained the same on Windows 10
with 2018c and on Ubuntu 18.04 with today's Git):
<hansl>
include lp-mfx.gfn
open keane.gdt -q
# the following line simulates the
# situation when a data file
# already has an 'mlogit_mfx' series
series mlogit_mfx = normal()
smpl (year==87) --restrict
logit status 0 educ exper expersq black --multinomial -q
bundle b = mlogit_mfx(status, $xlist, $coeff, $vcv, $sample)
lp_mfx_print(&b)
catch list z = mlogit_mfx(-1 to -2)
ser = mlogit_mfx
eval mean(ser(-1))
</hansl>
########### output
? open keane.gdt -q
Read datafile /usr/local/share/gretl/data/misc/keane.gdt
? series mlogit_mfx = normal()
Generated series mlogit_mfx (ID 19)
? smpl (year==87) --restrict
Full data set: 12723 observations
Current sample: 1738 observations
? logit status 0 educ exper expersq black --multinomial -q
? bundle b = mlogit_mfx(status, $xlist, $coeff, $vcv, $sample)
? lp_mfx_print(&b)
Multinomial logit marginal effects
(evaluated at means of regressors)
note: dp/dx based on discrete change for black
Outcome 1: (status = 1, Pr = 0.0304)
dp/dx s.e. z pval xbar
educ 0.010826 0.0018373 5.8924 3.8069e-09 12.549
exper -0.020829 0.0054490 -3.8225 0.00013211 3.4403
expersq 0.0019936 0.00073506 2.7122 0.0066841 17.199
black -0.011001 0.0076838 -1.4318 0.15221 0.37973
Outcome 2: (status = 2, Pr = 0.1434)
dp/dx s.e. z pval xbar
educ -0.045462 0.0039535 -11.499 1.3298e-30 12.549
exper -0.11360 0.012550 -9.0517 1.4080e-19 3.4403
expersq 0.0076209 0.0016081 4.7392 2.1456e-06 17.199
black 0.065872 0.019682 3.3468 0.00081740 0.37973
Outcome 3: (status = 3, Pr = 0.8263)
dp/dx s.e. z pval xbar
educ 0.034636 0.0042808 8.0911 5.9137e-16 12.549
exper 0.13443 0.013542 9.9268 3.1829e-23 3.4403
expersq -0.0096146 0.0017275 -5.5654 2.6150e-08 17.199
black -0.054870 0.020909 -2.6243 0.0086834 0.37973
? catch list z = mlogit_mfx(-1 to -2)
> z = mlogit_mfx(-1 to
The symbol 'to' is undefined
? ser = mlogit_mfx
Generated series ser (ID 20)
? eval mean(ser(-1))
-0.017113513
We see that both calling the package functions
and 'ser = mlogit_mfx' worked OK.
But in 'list z = mlogit_mfx(-1 to -2)',
'mlogit_mfx' is interpreted as a function.
In my opinion this case can easily be
handled, since no function can have '-1 to -2' as
an argument.
Much worse is mlogit_mfx(-1).
A possible solution is deprecating mlogit_mfx(-1)
in favor of mlogit_mfx(-1 to -1).
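As a side note, with current syntax one can already sidestep the ambiguity in this particular case: the lags() function takes the series as a plain argument, where it cannot be mistaken for a function call (a sketch, untested here):

<hansl>
# unambiguous alternative: generate lags 1 and 2 of the series
list z = lags(2, mlogit_mfx)
</hansl>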
By contrast:
<hansl>
open keane.gdt -q
series lp_mfx_print = normal()
include lp-mfx.gfn
smpl (year==87) --restrict
logit status 0 educ exper expersq black --multinomial -q
bundle b = mlogit_mfx(status, $xlist, $coeff, $vcv, $sample)
lp_mfx_print(&b)
</hansl>
Note the difference between the two scripts:
in the first, a series has the same name as mlogit_mfx();
in the second, a series has the same name as lp_mfx_print().
The rest is identical.
Also note that the confusingly named series is not used anywhere.
Here I have
? open keane.gdt -q
Read datafile /usr/local/share/gretl/data/misc/keane.gdt
? series lp_mfx_print = normal()
Generated series lp_mfx_print (ID 19)
? include lp-mfx.gfn
/home/oleh/.gretl/functions/lp-mfx.gfn
lp-mfx 0.4, 2016-11-10 (Allin Cottrell)
? smpl (year==87) --restrict
Full data set: 12723 observations
Current sample: 1738 observations
? logit status 0 educ exper expersq black --multinomial -q
? bundle b = mlogit_mfx(status, $xlist, $coeff, $vcv, $sample)
? lp_mfx_print(&b)
> lp_mfx_print(&b)
Incomplete expression
Syntax error
Error executing script: halting
> lp_mfx_print(&b)
Maybe it's the pointer argument?
So we need at least two corrections:
1) some package functions stop working;
2) something should be done about series_name(int_lag).
Oleh
6 years
again processor-detection-dependent behavior in arima
by oleg_komashko@ukr.net
Dear all,
1) script
eval $sysinfo
open bad_data.gdt #attached
smpl 1 194
series sty=diff_series/sd(diff_series)
list zli = y_one y_two
y = sty+6.48
arima 3 0 0; 1 0 0; y 0 zli --verbose
Note: on the same PC and OS, the 64-bit build reports blascore = Prescott while the 32-bit build reports blascore = Atom.
2) pc info
Motherboard:
CPU Type QuadCore Intel Pentium N3540, 2666 MHz (32 x 83)
Motherboard Name Lenovo B50-10
Motherboard Chipset Intel Bay Trail-M
System Memory 3978 MB
DIMM1: SK hynix HMT451S6BFR8A-PB 4 GB DDR3-1600 DDR3 SDRAM (11-11-11-28 @ 800 MHz) (10-10-10-27 @ 761 MHz) (9-9-9-24 @ 685 MHz) (8-8-8-22 @ 609 MHz) (7-7-7-19 @ 533 MHz) (6-6-6-16 @ 457 MHz) (5-5-5-14 @ 380 MHz)
BIOS Type Unknown (04/14/2015)
Communication Port Serial port (COM1)
3) Output 1: the system-installed version
# Output 1, 2018d-git, wordlen = 64
gretl version 2018d-git
Current session: 2018-10-26 14:48
? eval $sysinfo
bundle anonymous:
nproc = 4
blascore = Prescott
hostname = DESKTOP-DE5ESQO
os = windows
mpi = 0
blas = openblas
omp_num_threads = 4
omp = 1
blas_parallel = OpenMP
mpimax = 4
wordlen = 64
? open bad_data.gdt
Read datafile C:\Users\Lenovo\Documents\gretl\bad_data.gdt
periodicity: 4, maxobs: 204
observations range: 1950:1 to 2000:4
Listing 4 variables:
0) const 1) diff_series 2) y_one 3) y_two
? smpl 1 194
Full data range: 1950:1 - 2000:4 (n = 204)
Current sample: 1950:1 - 1998:2 (n = 194)
? series sty=diff_series/sd(diff_series)
Generated series sty (ID 4)
? list zli = y_one y_two
Generated list zli
? y = sty+6.48
Generated series y (ID 5)
? arima 3 0 0; 1 0 0; y 0 zli --verbose
NLS: failed to converge after 1605 iterations
Error executing script: halting
> arima 3 0 0; 1 0 0; y 0 zli --verbose
4) Output 2: same PC and OS, 2018c portable
gretl version 2018c
Current session: 2018-10-26 14:50
? eval $sysinfo
bundle anonymous:
nproc = 4
blascore = Atom
hostname = DESKTOP-DE5ESQO
os = windows
mpi = 0
blas = openblas
omp_num_threads = 4
omp = 1
blas_parallel = OpenMP
mpimax = 4
wordlen = 32
? open bad_data.gdt
Read datafile C:\Users\Lenovo\Documents\gretl\bad_data.gdt
periodicity: 4, maxobs: 204
observations range: 1950:1 to 2000:4
Listing 4 variables:
0) const 1) diff_series 2) y_one 3) y_two
? smpl 1 194
Full data range: 1950:1 - 2000:4 (n = 204)
Current sample: 1950:1 - 1998:2 (n = 194)
? series sty=diff_series/sd(diff_series)
Generated series sty (ID 4)
? list zli = y_one y_two
Generated list zli
? y = sty+6.48
Generated series y (ID 5)
? arima 3 0 0; 1 0 0; y 0 zli --verbose
ARMA initialization: using nonlinear AR model
Iteration 1: loglikelihood = -136.938951717
Parameters: 6.4637 0.58059 -0.11046 -0.082556 -0.093679 0.10601
-1.9068
Gradients: 9.2712 10.471 1.3218 -2.0031 -2.1470 3.9715
-2.8401 (norm 3.22e+000)
Iteration 2: loglikelihood = -136.689823002 (steplength = 0.0016)
Parameters: 6.4785 0.59734 -0.10835 -0.085761 -0.097114 0.11237
-1.9113
Gradients: 3.7130 3.7662 -2.9409 -3.1750 -1.7423 -1.1818
-2.5919 (norm 2.14e+000)
Iteration 3: loglikelihood = -136.635783208 (steplength = 0.0016)
Parameters: 6.4839 0.60234 -0.11626 -0.092855 -0.10056 0.10774
-1.9166
Gradients: 1.9617 4.8733 1.0840 1.5885 1.2310 3.6716
-0.80868 (norm 1.60e+000)
Iteration 4: loglikelihood = -136.600859241 (steplength = 0.008)
Parameters: 6.4784 0.61437 -0.13175 -0.090207 -0.091373 0.11082
-1.9254
Gradients: 3.9294 3.0305 2.2406 1.5760 0.43245 2.7506
-0.97313 (norm 2.07e+000)
Iteration 5: loglikelihood = -136.590682758 (steplength = 0.008)
Parameters: 6.4844 0.61344 -0.13942 -0.082111 -0.082786 0.11045
-1.9365
Gradients: 1.7762 4.1148 3.1480 0.21233 -0.75300 3.1392
-1.2209 (norm 1.57e+000)
Iteration 6: loglikelihood = -136.579257175 (steplength = 0.008)
Parameters: 6.4832 0.61301 -0.13869 -0.079072 -0.080908 0.11078
-1.9511
Gradients: 2.1798 3.8598 2.7328 -0.11127 -0.24403 3.3903
-0.55423 (norm 1.62e+000)
Iteration 7: loglikelihood = -136.558904481 (steplength = 0.04)
Parameters: 6.4846 0.59869 -0.13673 -0.098319 -0.057675 0.13257
-1.9918
Gradients: 1.7338 3.3469 1.7212 0.78241 -1.7498 3.2189
-0.55661 (norm 1.47e+000)
Iteration 8: loglikelihood = -136.521260314 (steplength = 0.04)
Parameters: 6.4876 0.56033 -0.15574 -0.11922 -0.052945 0.20112
-2.0607
Gradients: 0.32695 2.4407 0.16140 -1.6302 -2.7718 -1.1861
-1.7788 (norm 1.05e+000)
Iteration 9: loglikelihood = -136.495894655 (steplength = 1)
Parameters: 6.4896 0.58297 -0.15220 -0.11017 -0.066710 0.18134
-2.0469
Gradients: -0.43210 -0.65107 -0.17364 0.15633 0.59707 0.32961
0.36414 (norm 7.63e-001)
Iteration 10: loglikelihood = -136.490592511 (steplength = 1)
Parameters: 6.4886 0.57213 -0.15608 -0.11690 -0.061996 0.19671
-2.0648
Gradients: -0.097512 -0.22807 -0.23333 -0.23077 -0.058053 -0.28839
-0.074467 (norm 3.86e-001)
Iteration 11: loglikelihood = -136.490080201 (steplength = 1)
Parameters: 6.4884 0.56792 -0.15787 -0.11988 -0.060088 0.20265
-2.0736
Gradients: -0.026271 -0.042462 -0.11705 -0.17211 -0.13359 -0.25929
-0.11138 (norm 2.74e-001)
Iteration 12: loglikelihood = -136.489973117 (steplength = 1)
Parameters: 6.4883 0.56748 -0.15809 -0.12036 -0.059698 0.20303
-2.0754
Gradients: 0.014754 0.015626 -0.015933 -0.033397 -0.049127 -0.037086
-0.030232 (norm 1.62e-001)
Iteration 13: loglikelihood = -136.489964277 (steplength = 1)
Parameters: 6.4883 0.56727 -0.15829 -0.12056 -0.059753 0.20348
-2.0763
Gradients: -0.0068714 -0.0023991 6.6724e-006 0.00028924 0.0033536 -0.0024461
0.00092363 (norm 8.33e-002)
Iteration 14: loglikelihood = -136.489964105 (steplength = 1)
Parameters: 6.4883 0.56731 -0.15825 -0.12052 -0.059747 0.20339
-2.0762
Gradients: 0.0010810 -8.4809e-005 1.7192e-005 0.00019304 0.00019005 0.00043713
0.00024676 (norm 3.32e-002)
Iteration 15: loglikelihood = -136.489964103 (steplength = 1)
Parameters: 6.4883 0.56731 -0.15825 -0.12053 -0.059747 0.20339
-2.0762
Gradients: -0.00012818 5.0844e-005 2.7890e-005 2.0559e-005 -3.7774e-005 2.5363e-005
-3.0714e-005 (norm 1.16e-002)
Iteration 15: loglikelihood = -136.489964103 (steplength = 1)
Parameters: 6.4883 0.56731 -0.15826 -0.12053 -0.059747 0.20339
-2.0762
Gradients: -0.00012818 5.0844e-005 2.7890e-005 2.0559e-005 -3.7774e-005 2.5363e-005
-3.0714e-005 (norm 1.16e-002)
--- FINAL VALUES:
loglikelihood = -136.489964103 (steplength = 5.12e-007)
Parameters: 6.4883 0.56731 -0.15826 -0.12053 -0.059747 0.20339
-2.0762
Gradients: -0.00012818 5.0844e-005 2.7890e-005 2.0559e-005 -3.7774e-005 2.5363e-005
-3.0714e-005 (norm 1.16e-002)
Function evaluations: 47
Evaluations of gradient: 15
Model 1: ARMAX, using observations 1950:1-1998:2 (T = 194)
Estimated using AS 197 (exact ML)
Dependent variable: y
Standard errors based on Hessian
coefficient std. error z p-value
---------------------------------------------------------
const 6.48832 0.0467055 138.9 0.0000 ***
phi_1 0.567313 0.159434 3.558 0.0004 ***
phi_2 −0.158255 0.107503 −1.472 0.1410
phi_3 −0.120526 0.130267 −0.9252 0.3549
Phi_1 −0.0597470 0.121899 −0.4901 0.6240
y_one 0.203395 0.233933 0.8695 0.3846
y_two −2.07615 0.389196 −5.334 9.58e-08 ***
Mean dependent var 6.480000 S.D. dependent var 1.000000
Mean of innovations −0.001117 S.D. of innovations 0.488410
Log-likelihood −136.4900 Akaike criterion 288.9799
Schwarz criterion 315.1228 Hannan-Quinn 299.5659
Real Imaginary Modulus Frequency
-----------------------------------------------------------
AR
Root 1 1.0476 -1.1562 1.5602 -0.1328
Root 2 1.0476 1.1562 1.5602 0.1328
Root 3 -3.4083 0.0000 3.4083 0.5000
AR (seasonal)
Root 1 -16.7372 0.0000 16.7372 0.5000
-----------------------------------------------------------
Oleh
6 years
missing values being forward-filled in lagged variables
by Pozdeev, Igor
Hi all,
This looks like a feature but is a bug by my standards: missing values are forward-filled when lags of a variable are taken. In the attached screenshot, the leftmost panel is the original variable, with two missing values; the central panel is the same variable shifted by one period; and the third is the same variable shifted by two periods. Observe how the value 7.1500 fills in the gaps.
Is there a reason for this behavior?
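For what it's worth, a minimal script along these lines should reproduce the issue with artificial data (a sketch, in case the screenshot does not come through on the list):

<hansl>
# minimal reproduction sketch with artificial data
nulldata 10
setobs 1 1 --time-series
series x = normal()
x[4] = NA
x[5] = NA
series x_1 = x(-1)  # lag 1
series x_2 = x(-2)  # lag 2
print x x_1 x_2 --byobs
</hansl>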
Best,
Igor
Igor Pozdeev
Visiting Scholar
NYU Stern School of Business
44 West 4th Street, 9-66
New York, NY 10012
+1-917-657-1120
www.igorpozdeev.me<http://www.igorpozdeev.me/>
6 years
bread() issue with empty list
by Artur T.
Dear all,
I am currently using gretl 2018d-git (2018-10-11) on Win10 (the same
happens on Linux). Trying to read a bundle in which some list is empty
results in an error:
<hansl>
clear
open denmark.gdt -q
bundle b = null
list L = LRM
b.L = L
list X = null # putting "LRY" into the list would work
b.X = X
print b
bwrite(b, "foob")
bundle b2 = bread("foob")
print b2 # results in: 'X': got NULL data value
</hansl>
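Until this is fixed, a possible user-level workaround is to store a list in the bundle only when it is non-empty, e.g.:

<hansl>
# workaround sketch: skip empty lists when building the bundle
list X = null
if nelem(X) > 0
    b.X = X
endif
</hansl>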
Best,
Artur
6 years
Adding the EIA database via the API
by Johannes Lips
Hi all,
I just stumbled across the fact that the EIA provides some or
most of their data through an API. [1] I noticed that you need to
register, but perhaps there is a way around that if we get in
touch with them and explain the possible use case and the options to
them.
I don't know whether there's a proper process for adding new
databases to the gretl database offerings, but I wanted to explore
whether we could access their API with gretl.
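As a quick experiment (not a proper database backend), gretl's curl() and jsonget() functions might already be enough to pull a series from their REST API; note that the URL pattern, series id and key below are placeholders, not tested values:

<hansl>
# hypothetical sketch: fetch one EIA series over HTTP
bundle req = null
req.URL = "https://api.eia.gov/series/?api_key=YOUR_KEY&series_id=SOME.SERIES.ID"
curl(&req)
# the JSON reply lands in req.output; pull a field out with jsonget()
string sname = jsonget(req.output, "$.series[0].name")
</hansl>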
All the best
Johannes
[1] https://www.eia.gov/opendata/register.php
6 years
reserved words as names in foreign datasets
by oleg_komashko@ukr.net
Dear all,
Below is the content of the attached data file:
index,ltm,x1,x2,x3,ols
1,0.098527655736295,0.14198429261587,-0.191116784992981,0.22224443607502,-1.13946634223632
Of course, trying to open it generates:
using delimiter ','
longest line: 92 characters
first field: 'index'
number of columns = 6
number of variables: 6
number of non-blank lines: 2
scanning for variable names...
line: index,ltm,x1,x2,x3,ols
'ols' is a reserved word
For .csv I can easily change 'ols' anywhere.
For .xls(x) I have LibreOffice, if
I use Ubuntu or if I do not want to be a pirate.
How about Stata files, etc.?
I think gretl could change such names itself.
I mean only reserved words: this would open
files that currently fail to open.
So a new possibility, and no backward-incompatible changes.
If I substitute ols_ for ols, everything works.
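In the meantime, for delimited text files a crude workaround is to rewrite the header before opening; this blindly replaces every occurrence of 'ols', so it is only safe when the string appears nowhere else (a sketch, with an invented filename):

<hansl>
# workaround sketch for a CSV with a reserved word in its header
string s = readfile("mydata.csv")
s = strsub(s, "ols", "ols_")  # crude: assumes 'ols' occurs only as a column name
outfile "fixed.csv" --write
    printf "%s", s
end outfile
open fixed.csv
</hansl>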
Oleh
6 years
"shadowing" over-diagnostics
by oleg_komashko@ukr.net
Dear all,
If we create a function whose name coincides
with the name of a local variable in a package,
then calling a package function prints:
In regard to function function_name (package package_name):
Warning: 'some_name' shadows a function of the same name
Example
<hansl>
include lp-mfx.gfn
function void den(scalar x)
eval floor(x)
end function
open keane.gdt -q
smpl (year==87) --restrict
logit status 0 educ exper expersq black --multinomial -q
bundle b = mlogit_mfx(status, $xlist, $coeff, $vcv, $sample)
</hansl>
The cause:
The package function mlogit_pj() has
a local variable named "den"
Oleh
6 years
constant mis-specification and arima convergence problem
by oleg_komashko@ukr.net
Dear all,
########## redundant constant
open denmark.gdt
# note a decent model:
arima 2 1 0; LRM --nc
modtest --autocorr 4
#
# Ljung-Box Q' = 2.40961,
# with p-value = P(Chi-square(2) > 2.40961) = 0.2997
# Real Imaginary Modulus Frequency
# -----------------------------------------------------------
# AR
# Root 1 -1.5828 0.0000 1.5828 0.5000
# Root 2 1.4501 0.0000 1.4501 0.0000
# -----------------------------------------------------------
# so models below
# are seasonally over-differenced
set bfgs_toler default
catch arima 1 1 0; 0 1 1; LRM
err = $error
eval errmsg(err)
arima 1 1 0; 0 1 1; LRM --nc
ma = 10^-5|$coeff
tol = 10^-9
set bfgs_toler tol
set initvals ma
arima 1 1 0; 0 1 1; LRM --verbose
arima 1 1 0; 0 1 1; LRM --x-12-arima
# Note
# Scaling y by 93455.8
set bfgs_toler default
catch arima 1 1 0; 1 1 0; LRM
err = $error
eval errmsg(err)
# arima 1 1 0; 1 1 0; LRM --verbose
# Scaling y by 93455.8
ols diff(LRM) 0 -s
arima 1 1 0; 1 1 0; LRM --nc
ma = 10^-5|$coeff
set bfgs_toler default
set initvals ma
arima 1 1 0; 1 1 0; LRM
arima 1 1 0; 1 1 0; LRM --x-12-arima
# here we have almost the same estimates
# and log-lik
# pol. roots are very decent
#### the main indicator is a high p-value on const
# So another situation with bad
# convergence is when we have a const
# with a very high p-value,
# i.e. a redundant const.
# Some indicators: slowly growing lnl
# and small gradients
############# missing constant
open greene5_1.gdt
logs *
# a decent model
arima 0 1 2; l_realcons
# looking at p-value for theta_1 (0.9356)
# select
matrix qvec = {2}
arima 0 1 qvec; l_realcons
modtest --autocorr
# Test for autocorrelation up to order 4
#
# Ljung-Box Q' = 1.34035,
# with p-value = P(Chi-square(3) > 1.34035) = 0.7196
# pol. roots are well behaved
arima 0 1 qvec; l_realcons --nc
modtest --autocorr
# no problem with convergence but autocorrelation
# We consider
arima 1 1 1; l_realcons
modtest --autocorr
# well-behaved roots but autocorrelation
# Note p-value on const is 1.05e-029
set bfgs_toler default
scalar reallyhuge = 2*$huge
set bfgs_maxgrad reallyhuge
arima 1 1 1; l_realcons --nc
arima 1 1 1; l_realcons --nc --x-12-arima
# some indicators: a large gradient norm
# and roots nearly on the unit circle
Oleh
6 years