We've just noticed that a bug was introduced into our code for reading
native gretl .gdt data files in August of this year. The bug should be
triggered only rarely, but we thought it wise to issue a warning.
Description of bug: If a gdt file contains "subnormal" values (that
is, nonzero floating-point values that are too close to zero to be
represented with full precision), then when such a file is read on
Linux, the first subnormal value found on a given row (observation)
will be incorrectly copied into the remaining columns (series) on
that row.
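For anyone who wants to test individual values outside gretl, here is a minimal Python sketch (not part of gretl; the function name is_subnormal is hypothetical). It relies on sys.float_info.min, the smallest positive *normal* double: any nonzero finite value smaller than that in magnitude is subnormal.

```python
import math
import sys

def is_subnormal(x: float) -> bool:
    """Return True if x is a nonzero, finite double below the normal range."""
    # sys.float_info.min is the smallest positive normal double (about 2.2e-308)
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min

print(sys.float_info.min)    # 2.2250738585072014e-308
print(is_subnormal(1e-310))  # True
print(is_subnormal(1e-300))  # False
print(is_subnormal(0.0))     # False
```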
Example: Suppose a gdt file containing 10 series has a subnormal value
for series number 5 on row 25. When the file is read on Linux, that
subnormal will replace the correct values of series 6 to 10 for
observation 25.
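To make the symptom concrete, the toy Python function below reproduces its *effect* on a single row. It is purely illustrative and is not gretl's reader code; the name corrupt_row is hypothetical.

```python
import math
import sys

def corrupt_row(row):
    """Mimic the reported symptom on one row (observation): the first
    subnormal found overwrites every later column (series) in that row."""
    out = list(row)  # work on a copy; the input row is left unchanged
    for j, x in enumerate(out):
        if x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min:
            out[j + 1:] = [x] * (len(out) - j - 1)
            break
    return out

# Ten series with a subnormal in series 5 (index 4), as in the example
row = [float(k) for k in range(1, 11)]
row[4] = 1e-310
print(corrupt_row(row))
# [1.0, 2.0, 3.0, 4.0, 1e-310, 1e-310, 1e-310, 1e-310, 1e-310, 1e-310]
```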
Comment: This won't affect the reading of "primary" data (actual
micro- or macroeconomic measurements), which will never contain
subnormal values (we're talking about nonzero absolute values smaller
than about 2.2 times 10 to the minus 308). And the bug is not
triggered on MS Windows. However, subnormal values may be produced by
some data transformations (such as squaring very small numbers, or
computing the normal CDF of very large negative values).
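The squaring case is easy to see in a short Python sketch (an illustration, not gretl code): squaring a number near 1e-160 gives a result near 1e-320, which lies below the smallest normal double, sys.float_info.min, yet is still nonzero.

```python
import sys

tiny = 1e-160
sq = tiny * tiny  # about 1e-320: too small to be normal, but not rounded to zero

# Below the smallest normal double (about 2.2e-308) yet still nonzero,
# so the result is subnormal
print(0.0 < sq < sys.float_info.min)  # True
```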
Fix: This is now fixed in the git source for gretl and also the
current snapshots. And we will put out a new release soon, gretl
2015d.
Diagnostic: If you think a dataset may suffer from this problem,
you can run the script checkdata.inp, available from
http://ricardo.ecn.wfu.edu/pub/gretl/checkdata.inp
First load the dataset in question, then open checkdata.inp and run
it. An affected dataset may produce something like this:
<script-output>
Total number of values examined: 164122
Check for subnormal floating-point values
-----------------------------------------
Total number found: 138
Longest (row) sequence: 138
(occurs at obs 210, starting series ID 461)
Number of sequences (of length >= 2): 1
</script-output>
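For anyone who wants to run a similar check outside gretl, the Python sketch below computes the same kinds of summary figures from a dataset held as a list of rows (one row per observation, one column per series). It illustrates the idea and is not a translation of checkdata.inp. The function name check_subnormals, the treatment of NaN as a missing value that is not examined and that breaks a run, and the 1-based row/column positions (rather than gretl series IDs) are all assumptions.

```python
import math
import sys

def is_subnormal(x: float) -> bool:
    # Nonzero, finite, and smaller in magnitude than the smallest normal double
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min

def check_subnormals(rows):
    """Count subnormals and find runs of consecutive subnormals within rows."""
    stats = {"examined": 0, "found": 0, "longest": 0,
             "longest_at": None, "runs_ge_2": 0}

    def close_run(obs, start, length):
        # Record a finished run; ties keep the first longest run found
        if length >= 2:
            stats["runs_ge_2"] += 1
        if length > stats["longest"]:
            stats["longest"] = length
            stats["longest_at"] = (obs, start)

    for i, row in enumerate(rows, start=1):
        run_len, run_start = 0, 0
        for j, x in enumerate(row, start=1):
            if math.isnan(x):
                # Missing value: not examined, and it ends any current run
                close_run(i, run_start, run_len)
                run_len = 0
                continue
            stats["examined"] += 1
            if is_subnormal(x):
                stats["found"] += 1
                if run_len == 0:
                    run_start = j
                run_len += 1
            else:
                close_run(i, run_start, run_len)
                run_len = 0
        close_run(i, run_start, run_len)  # a run reaching the end of the row
    return stats

rows = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 5e-320, 5e-320, 5e-320],     # a run of three on one row
    [1e-310, 2.0, float("nan"), 3.0],  # an isolated subnormal
]
print(check_subnormals(rows))
# {'examined': 11, 'found': 4, 'longest': 3, 'longest_at': (2, 2), 'runs_ge_2': 1}
```

In this toy dataset the run of three on row 2 is the pattern to worry about, while the isolated subnormal on row 3 is not.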
The symptom of a problem is a run of consecutive subnormal values
on one or more rows of the dataset. Such a run could occur for
"natural" reasons, but it may indicate corruption. Isolated
subnormals don't indicate the bug. And again, most datasets should
contain no subnormal values at all.
Allin Cottrell