msortby() stumbles over NA / nan
by Sven Schreiber
Hi,
it seems that msortby() only sorts within "blocks" surrounded by
occurrences of NA (showing up as nan in matrices). The rest of the
sorting-like functions seem to work ok. Example:
<hansl>
matrix in = {2; 1; NA; 0; -5}
print in
matrix check = msortby(in, 1) # not ok
print check
matrix check = sort(in) # ok
print check
matrix check = dsort(in) # ok
print check
matrix check = values(in) # ok
print check
matrix check = uniq(in) # ok
print check
</hansl>
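In case a stop-gap is useful, here is a workaround sketch (just my reading of ok() and selifr() from the function reference): sort the rows whose key is not missing, then reattach the NA rows at the end.
<hansl>
matrix m = {2; 1; NA; 0; -5}
matrix mask = ok(m[,1])                    # 1 where the sort key is not NA
matrix srt = msortby(selifr(m, mask), 1)   # no NAs left, so msortby behaves
matrix out = srt | selifr(m, 1 - mask)     # append the rows with NA keys
print out
</hansl>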
Thanks,
sven
pca bug and issues reloaded
by Sven Schreiber
Hi,
there was an open-ended thread initiated by Paulo Grahl
(http://lists.wfu.edu/pipermail/gretl-users/2013-December/009475.html)
about gretl's 'pca' command. I checked again and I think there is still
a bug -- with a very recent snapshot -- although it shows up slightly
differently from Paulo's experience. Here's an example script:
<hansl>
open denmark
list vars = IBO IDE
# compare 'pca' and 'princomp()' in the full sample
matrix P1 = princomp({vars}, 1)
pca vars --save=1 # turns out they coincide; good
# now compare them in the reduced sample
smpl 1980:1 1985:1
matrix P2 = princomp({vars}, 1)
pca vars --save=1 # matrix and series differ; bad
# check if the PCs are different in the overlapping range
if sum(abs(PC1 - PC11)) > 0.01 # PC naming is fragile...
print "ok"
else
print "PCs are the same although they should differ" # I get this
endif
smpl --full
</hansl>
Summary: The 'princomp()' function seems to work fine, but 'pca'
apparently uses the full sample for computing the principal components,
even if a reduced sample is specified. What's different from Paulo's
report is that the PCs are saved only over the reduced sample range (but
the values are still wrong).
I would also like to (re-)raise some other issues with pca:
- An accessor for the loadings, as suggested by Henrique
(http://lists.wfu.edu/pipermail/gretl-users/2012-March/007346.html) and,
in terms of the princomp() function, by myself. Allin answered that it's
easy to get them as the eigenvectors of the correlation matrix. That is
of course correct, but first it's a convenience issue, and second, if you
want to do simulations, say, it seems like an avoidable inefficiency to
compute the eigenvectors twice (first implicitly in princomp, and then
explicitly by hand; see the sketch below this list).
- Automatic printing of the workfile variables: when using 'pca' to save
some PCs to the workfile, gretl automatically prints a listing of all the
variables in the workfile. IMHO this clutters the script output for no
good reason (I currently have thousands of variables in there, and it
really makes for a long list in the output...). Could this be switched off?
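Coming back to the loadings point, here is the by-hand route I mean, so it's clear what would get computed twice (a minimal sketch; mcorr() and eigensym() used as I understand them from the function reference):
<hansl>
open denmark
list vars = IBO IDE
matrix C = mcorr({vars})       # correlation matrix of the data
matrix V                       # will receive the eigenvectors
matrix lam = eigensym(C, &V)   # eigenvalues (ascending, if I read the docs right)
print lam V                    # last column of V: loadings of the first PC
</hansl>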
Thanks,
sven
"join" news
by Allin Cottrell
Some news regarding gretl's "join" command (importation of data with
lots of options). These points are in the current documentation for
"join" in the User's Guide, but I thought it would be worth
explicitly drawing them to people's attention.
1) I've mentioned this before but only in passing: besides "CSV"
(delimited text) files you can now join from gretl-native gdt or
gdtb (binary) files.
2) More recently: you can now pull multiple series from the source
file in one command.
I'll expand on the second point. When we first wrote "join" we were
wrestling with a lot of complexity (key-matching, filtering,
aggregation) and we simplified matters by stipulating that only a
single series could be operated on at a time. Now that the join code
has stabilized, we've found it feasible to support "batch"
importation of series. This is subject to two limitations:
1) When importing multiple series, the --data option (which permits
renaming of a single series on import) is not available. You have to
accept the names of series as they appear in the source data file
(or as "fixed up" by gretl, if need be).
2) You only get one set of key-matching, filtering and aggregation
options; these options are applied uniformly to all series
specified in a single command. So if you want to import several
series but with different keys, filters or aggregation methods,
you still need separate instances of the "join" command.
How do you ask for multiple series? You just replace the second
(series-name) argument to "join" with either (a) several series
names, separated by spaces, or (b) the name of an array-of-strings
variable that holds the names of the series you want.
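For instance (a sketch only; the file name, series names and key below are made-up placeholders):
<hansl>
# (a) several series names, separated by spaces
join src.csv x1 x2 x3 --ikey=id --okey=id
# (b) an array of strings holding the names
strings wanted = defarray("x1", "x2", "x3")
join src.csv wanted --ikey=id --okey=id
</hansl>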
My motivation for setting this up is that this semester I've been
helping some students construct datasets from the PUMS (Public Use
Microdata Sample) made available by the US Census Bureau. These are
BIG files (e.g. the person datafile for California alone is >
300MB). So if you want data from all 50 US states plus DC, and
especially if you want household-level data too, we're talking quite
a major data processing exercise. I've found that with multiple
imports in "join" it doesn't take much longer to import 6 or 7
series at a time than it does to import a single series, meaning
that we get a very noticeable speed-up of the process.
Allin
silent failure of sprintf
by Sven Schreiber
Hi,
I stumbled again over something for which my own mistake was the
ultimate cause, but still I think gretl should have complained:
<hansl>
scalar r = 5
sprintf r "%d", 10   # fails silently: r already exists as a scalar
print r              # prints the number 5
print "@r"           # prints the literal @r
</hansl>
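For comparison, the same thing with a string target behaves as I'd expect (a minimal sketch, assuming the command form of sprintf simply creates or overwrites a named string):
<hansl>
string s = ""
sprintf s "%d", 10
print s      # prints 10
print "@s"   # substitution works: prints 10
</hansl>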
So I guess it's expected that gretl doesn't want to change the type of r
from scalar to string (BTW, I don't mind this, but is this static typing
actually an intended property of hansl?). But then shouldn't this at
least produce an error, or a warning?
thanks,
sven
syntax inconsistency for coeff vector
by Sven Schreiber
Hi,
this is nothing new, but I stumbled over it again, and now I can use it
as an excuse for why I never manage to remember the correct syntax:
When we access the coeff vector after estimation, we have '$coeff'; when
we give the variable index we use square brackets ($coeff[2]), but when
we give the name then it's round brackets ($coeff(myvar)).
So far so good. But when we formulate restrictions, the coeff vector is
now 'b', and apparently we always have to use square brackets (b[2] as
well as b[myvar]). So, from the user's point of view, two pretty obvious
questions arise (a short illustration follows the questions below):
1) Why have a separate symbol for the coeff vector in restrict blocks at
all? (Backward compatibility issues aside for now.)
2) The bracket situation seems arbitrary and confuses me every time. Could
it be changed? (In the medium term, I mean -- no need to rush.)
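For concreteness, a minimal illustration of the contrast (the regression itself is arbitrary, just something runnable on the shipped denmark data):
<hansl>
open denmark
ols IDE const IBO
scalar c2a = $coeff[2]     # by index: square brackets
scalar c2b = $coeff(IBO)   # by name: round brackets
restrict
  b[IBO] = 0               # in a restrict block: 'b' and square brackets only
end restrict
</hansl>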
Thanks,
sven
slight hiccup with Umlauts
by Sven Schreiber
Hi,
I don't even know if it's supposed to work, but if I use German special
characters in matrix row names, the printed output is slightly misaligned.
<hansl>
string ex = "ÄÜß hi"     # two row names: "ÄÜß" and "hi"
matrix in = {1, 3; 5, 6}
err = rownames(in, ex)   # attach the two names to the rows of 'in'
print in
</hansl>
Apart from that, gretl has suddenly started acting very sluggish again;
I'll investigate whether it has to do with the names of matrix columns or
rows.
thanks,
sven
is there an inarray function?
by Logan Kelly
Hello,
I need a function that tests whether a variable is in an array (sorry for the poor wording). What I mean is a function that compares each element of an array to a given variable and returns 0 if no element of the array is equal to it, and otherwise the position of the first match. So here are my questions:
1. Does such a function exist? (I haven't found one, but I thought I should ask.)
2. If not, coding it up is no problem, but I need a way to check the data type of a variable. Is there such a command?
3. Is there a way, other than using a bundle, to pass a variable of unknown data type to a function?
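In case it clarifies what I'm after, here is the kind of helper I have in mind for the array-of-strings case (a sketch; the name and test values are just illustrative):
<hansl>
# return the 1-based position of the first match, or 0 if there is none
function scalar inarray (const strings A, string s)
    loop i = 1..nelem(A)
        if A[i] == s
            return i
        endif
    endloop
    return 0
end function

strings S = defarray("foo", "bar", "baz")
scalar pos = inarray(S, "bar")   # gives 2
print pos
</hansl>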
Thanks,
Logan
SIGSEGV
by Marcin Błażejowski
Hi,
I get the following error under gdb (with 1.9.92 and current CVS):
------------
Program received signal SIGSEGV, Segmentation fault.
__strcmp_sse2_unaligned () at
../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:29
29 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such
file or directory.
------------
The problem occurs in one of my old packages, but I don't know where,
since the only string function I use is strlen().
Best Regards,
Marcin
--
Marcin Błażejowski
GG: 203127