Gretl project on the launchpad
by Ivan Sopov
Hello, gretl developers.
I'm trying to start a translation of the help files into Russian on
launchpad.net, as it seems to be the most suitable tool for
participation by everyone who is familiar with econometrics but not
with gettext, Linux, CVS, etc.
The problem is that there is already a project for gretl on launchpad
and it is strongly prohibited to start more than one project for a
single program. I have been unable to contact Constantine Tsardounis
for about a month, so I think it is time to re-assign that project to
someone else. On the launchpad IRC channel I was told:
Our admins can re-assign the project to new owners but we'd prefer to
hear from the upstream owners. can you get one of them to submit a
question here:
https://answers.edge.launchpad.net/launchpad
But if none of the main developers wants to register and do anything
on launchpad, it is possible to assign this role to me, and in that
case a letter on this list will probably be enough.
I have prepared a .po file for genr_funcs.xml and gretl_commands.xml
with the help of the po4a utility and got 1511 strings for translation
(the strings are rather big).
Good luck, Ivan Sopov.
P.S. My previous letter about using launchpad for translation is
http://lists.wfu.edu/pipermail/gretl-devel/2009-November/002171.html
Typo in GARCH GUI help
by Hélio Guilherme
Hi Allin,
I am slowly working on the translation of gretl_commands.xml, and I have found two errors.
This is the block starting at line 3788:
<para context="gui">
The estimated conditional variance, along with the residuals and
various other model statistics, can be accessed and added to the
dataset using the <quote>Model data</quote> menu in the window where
the model is displayed. If the box marked <quote>Standardize the
residuals</quote> is checked, the residuals are divided by the
square root of te conditional variance.
</para>
First, the menu entry should be <quote>Save</quote>, and second,
"square root of te conditional variance" should read "square root of
the conditional variance".
Best Regards,
Hélio
really big data
by Allin Cottrell
A few changes in recent gretl CVS address the issue of
handling very large datasets -- datasets that will not fit
into RAM in their entirety. These changes are not finalized --
though hopefully they will be by the end of the summer, once
Jack Lucchetti and I have worked on the matter together -- so
I haven't yet started to document them. But I'll explain the
current state of affairs here and if people would like to test
and give their comments that would be great.
First, let me note in passing that Jack and I have considered
the idea of introducing a new internal data-type in gretl -- a
type smaller than the double-precision floating-point type
that we currently use to represent all data series. Doubles
take up 8 bytes apiece, but some data (e.g. dummy variables)
could be represented correctly using a single byte. That way
we could cram more data into RAM. However, we've abandoned
that idea for the present; making the change would be a huge
amount of work, given the heavy dependence of the existing
gretl code on the assumption that all series live in a big
array of uniform type.
So what I'll describe here is not actually a way of fitting
more data into RAM; it's a way of pulling data for analysis
from a data source that is too big to handle in full.
To fix ideas, consider a census dataset with a million
observations on a thousand variables, so a giga-values
dataset. In double precision this would occupy 8GB of memory.
To load such data in full, let alone run regressions on them,
you'd need substantially more than 8GB of RAM. But it's
unlikely you'd want to run regressions using the entire
dataset; more plausibly, you might want to use a subset of the
variables and/or a subset of the observations. The problem
we're addressing is this: how do you extract such a subset
using gretl, if you can't read the full data into memory to
start with?
The answer depends on the format of the full dataset. Gretl
can read specified individual series from various sorts of
databases (native gretl binary databases, RATS 4.0, PcGive),
and via ODBC it can extract both particular series and
particular observations as specified via SQL. But what if the
original data are not in any of these formats?
Very large public datasets are quite often available in plain
text format, either delimited (comma-separated or similar) or
in fixed format (where each variable has a known starting
column and a known width in bytes). To date, gretl has had a
mechanism for reading specified variables from a fixed-format
text datafile (the --cols option to the "open" command), but
there has been no mechanism for reading specified variables
from a delimited text file, nor has there been a way of asking
gretl to read only certain observations (rows) from such a
file.
So here are the changes in CVS:
* The --cols option has been generalized so that it can be
used on delimited text files as well as fixed-format ones.
* A --rowmask option has been added which enables you to read
specified rows from text datafiles (and also from native
binary databases).
I'll get to the (provisional) syntax in a moment, but first
let me give an overview of how one might proceed, starting
from a huge text datafile. I'll assume we want to subset both
the series and the observations.
(1) Open the data source using the --cols option to extract
the series that will be used to pick out the observations we
want. (For example, maybe we need a male/female dummy variable
to pick out observations on women.) I'm assuming here that the
number of series needed for this task is small enough that we
can afford to load all the observations.
(2) Create a (matrix) "mask" with 1s for observations we want
and 0s for those to be skipped. Clear the current dataset but
keep the matrix.
(3) Open the source again: this time use both the --cols and
--rowmask options to extract the particular data we want.
To give a sense of the current syntax, here's a hansl example
to carry out steps (1) to (3) above. I'll assume at first that
we're reading from a delimited text file.
<hansl>
# read a specified column
open huge.txt --cols=15 --delimited
# create the observations mask
matrix mask = (gender == 1)
# re-open and read
open huge.txt --cols=1,2,20,156 --rowmask=mask --delimited --preserve
</hansl>
In the first "open" we need the --delimited option to tell
gretl to interpret the --cols specification as giving a list
of one or more delimited columns. The "15" says to read the
15th data column, which I assume contains a series named
"gender". In the second "open" we specify the columns (series)
that we want for analysis along with the row mask we just
created. The --preserve option is needed so that the matrix we
want to use as a mask doesn't get destroyed.
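A side note: since the mask is just a Boolean expression evaluated on
the first-pass data, step (2) can combine several conditions. A
hypothetical variant, assuming an "age" series was also read in the
first pass (it is not part of the example above):
<hansl>
# select women aged 25 to 54
matrix mask = (gender == 1) && (age >= 25) && (age <= 54)
</hansl>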
Here's the equivalent, but this time supposing we're reading
from a fixed-format file:
<hansl>
# read a specified column
open fixed.txt --cols=15,1
# create the observations mask
matrix mask = (v1 == 1)
# re-open and read
matrix C = {1,6,7,8,32,6}
open fixed.txt --cols=C --rowmask=mask --preserve
</hansl>
In this example the --cols specification has its original
meaning (as documented in relation to "open"). That is, it is
made up of pairs (c,w) where c gives the starting byte and w
the width in bytes. So the series we want for the row mask is
a single byte starting at byte 15 on each line of the data.
With fixed-format data variable names are not supported, but
the first variable to be read will be automatically given the
name "v1". We then proceed to read specified observations on
three series: one starting at byte 1 and occupying 6 bytes,
one 8 bytes wide starting at byte 7, and one of 6 bytes
starting at byte 32. This example also illustrates the way you
can (now) use a named matrix with the --cols option.
One more point. Note that a native gretl binary database can
be both read and written piece-wise, without ever holding the
whole thing in memory, so one could modify the above scenarios
to write out the huge data in native db format. The advantage
of that approach is that reading from a native gretl db is
much faster than reading from a text file, so if we want to go
back and read various different subsets of the data we'd get a
big gain in efficiency. In this context it would be nice to be
able to use gretl's "open" command in a loop. This is not
currently enabled (it may happen at some point, but it
wouldn't be trivial to implement). But you can use the shell
to implement the loop, as in the following pair of scripts.
Running the bash script will cause gretl to read a thousand
series from huge.txt, one at a time, and write them into the
database file huge.bin.
<hansl>
# writedb.inp
open huge.txt --cols=COL --delimited --quiet
store huge.bin --database
</hansl>
<bash>
for i in {1..1000}
do
sed -e "s+COL+$i+" writedb.inp > tmp.inp
gretlcli -b tmp.inp
done
</bash>
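Once huge.bin exists, subsequent reads can go straight to the
database via the usual "open" plus "data" sequence. A sketch (the
series names here are hypothetical; in practice they come from the
header row of huge.txt):
<hansl>
# open the native binary database and import two series
open huge.bin
data income gender
</hansl>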
Allin
using libcurl
by Allin Cottrell
In CVS, we've now switched to using libcurl for HTTP support
(as opposed to using code that was ripped from wget once upon
a time).
This will make our network support more flexible and
extensible but it means we now have another dependency. To
build current CVS gretl you'll need the appropriate "dev"
package for libcurl -- or go for the original source at
http://curl.haxx.se/libcurl/
Allin Cottrell
Re: [Gretl-devel] gretl 1.9.9 just about ready
by artur.tarassow@googlemail.com
Hey,
I tried out the new feature of saving graph pages this morning. An error occurred, on both Windows and Ubuntu, when I tried to save a graph as a PDF file. All worked fine with EPS. I think there was a problem with pdflatex or something similar. I'll look into this later today and write you a more detailed report.
And thank you for implementing this feature!
Best, Artur
-original message-
Subject: [Gretl-devel] gretl 1.9.9 just about ready
From: Allin Cottrell <cottrell(a)wfu.edu>
Date: 30/05/2012 17:41
I'm about ready to release 1.9.9, but I'd be grateful if any
of you could take today's snapshots for Windows and OS X for a
quick spin, just to check there's nothing badly broken.
If there are no bad reports I'll bump the version and do the
release, probably tomorrow.
Allin
_______________________________________________
Gretl-devel mailing list
Gretl-devel(a)lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-devel
Calling GMM from libgretl
by Evan Miller
I am using libgretl in a C program and wish to perform an IV
regression via GMM. Here is the relevant code:
DATASET *data_set = create_new_dataset(nvar, nobs, 0);

for (i = 0; i < nvar; i++) {
    for (j = 0; j < nobs; j++) {
        dset_set_data(data_set, i, j, x[i*nobs + j]);
    }
}

/* ... create list using -100 as a separator ... */

MODEL model_results = ivreg(list, data_set, OPT_G);
When I first tried it, I got this error:
realgen: exiting on expr() error 19
genr_compile: genrs[0] = 0x0, err = 19
formula: 'gmm___e = const-(b0+b1*+b2*+b3*+b4*+b5*+b6*+b7*+b8*+b9*)'
nls_genr_setup failed
Apparently variables without names were a big no-no, so I named the
variables in my data set by writing to data_set->varnames. Now I get
this error:
realgen: exiting on lex() error 15
genr_compile: genrs[0] = 0x0, err = 15
formula: 'gmm___e =
var0-(b0+b1*var5+b2*var6+b3*var7+b4*var8+b5*var9+b6*var10+b7*var11+b8*var12+b9*var13)'
The symbol 'var0' is undefined
nls_genr_setup failed
What am I doing wrong? I am using libgretl 1.9.8 on a Mac.
Thanks
Evan