On Sun, 18 Feb 2018, Sven Schreiber wrote:
On 17.02.2018 at 19:26, Sven Schreiber wrote:
> Instead I'd suggest to go back to numpy's savetxt, because it offers gzip
> compression "for free", whenever a 'gz' extension is given.
>
> Does that sound like a good idea? I could send a concrete proposal for
> gretl_export -- but not for the imminent release.
Here's a new function version for gretl_io.py:
<python>
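def gretl_export(X, fname, autodot=1):
    from numpy import asmatrix, savetxt
    M = asmatrix(X)
    r, c = M.shape
    if autodot:
        # gretl_dotdir: path prefix assumed to be set elsewhere in gretl_io.py
        fname = gretl_dotdir + fname
    # first line of the file: row and column counts, tab-separated
    ghead = repr(r) + '\t' + repr(c)
    # comments='' keeps savetxt from prefixing the header with '# '
    savetxt(fname, M, header=ghead, comments='')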
</python>
Because of the standard savetxt behavior, if the filename ends with ".gz" the
file will be automatically gzip-compressed. I tested that this works OK
together with gretl's mread() function.
numpy.savetxt has supported the necessary header and comments options
since numpy v1.7 (released in 2013).
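For anyone who wants to try the savetxt behavior without a gretl installation, here is a minimal standalone sketch. The names export_matrix and read_header are made up for this illustration and are not part of gretl_io.py; the sketch also writes to a temporary directory rather than to gretl's dotdir.

```python
import gzip
import os
import tempfile

import numpy as np


def export_matrix(X, path):
    """Write X as text with a 'rows<TAB>cols' first line (hypothetical helper)."""
    M = np.atleast_2d(np.asarray(X, dtype=float))
    r, c = M.shape
    # comments='' stops savetxt from prefixing the header line with '# '
    np.savetxt(path, M, header="%d\t%d" % (r, c), comments="")


def read_header(path):
    """Return (rows, cols) from the first line; opens .gz files via gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        rows, cols = f.readline().split()
    return int(rows), int(cols)


tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, "demo.mat")
zipped = os.path.join(tmpdir, "demo.mat.gz")  # the .gz suffix triggers compression

X = np.arange(6.0).reshape(2, 3)
export_matrix(X, plain)
export_matrix(X, zipped)
print(read_header(plain), read_header(zipped))
```

The same call produces both files; only the filename suffix decides whether the output is compressed.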
Thanks, Sven. That's now in my working copy and I'll push it to SF as
soon as their git access is working again.
One point to note is that although using the gzip facility will save
on disk space, it will slow down the data transfer. In an effort to
speed it up I've also implemented support for binary-format matrices.
This kicks in if you give a matrix file the ".bin" suffix. I show
below a sample script and output -- though I'm afraid you'll have to
wait to test this till SF is back up.
<hansl>
set verbose off
matrix m = mnormal(1000, 500)
set stopwatch
mwrite(m, "m.mat", 1)
foreign language=python
import numpy as np
m = gretl_loadmat('m.mat', 1)
gretl_export(np.asmatrix(m), 'py.mat')
end foreign
py_m = mread("py.mat", 1)
printf "plain text round-trip: %gs\n", $stopwatch
printf "max diff = %g\n", maxr(maxc(abs(m - py_m)))
set stopwatch
mwrite(m, "m.mat.gz", 1)
foreign language=python
import numpy as np
m = gretl_loadmat('m.mat.gz', 1)
gretl_export(np.asmatrix(m), 'py.mat.gz')
end foreign
py_m = mread("py.mat.gz", 1)
printf "gzipped text round-trip: %gs\n", $stopwatch
printf "max diff = %g\n", maxr(maxc(abs(m - py_m)))
set stopwatch
mwrite(m, "m.bin", 1)
foreign language=python
import numpy as np
m = gretl_loadmat('m.bin', 1)
gretl_export(np.asmatrix(m), 'pymat.bin')
end foreign
py_m = mread("pymat.bin", 1)
printf "binary round-trip: %gs\n", $stopwatch
printf "max diff = %g\n", maxr(maxc(abs(m - py_m)))
</hansl>
This gives me:
plain text round-trip: 1.15049s
max diff = 0
gzipped text round-trip: 3.52079s
max diff = 0
binary round-trip: 0.16587s
max diff = 0
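The same kind of comparison can be roughed out on the Python side alone. The sketch below is only an analogue: it uses numpy's own .npy format as a stand-in for a binary file, which is an assumption made for illustration and is not gretl's ".bin" layout. The helper name round_trip and the file names are likewise invented here, and the matrix is smaller than the one in the script above so that it runs quickly.

```python
import os
import tempfile
import time

import numpy as np


def round_trip(M, path):
    """Write M to path and read it back; return (seconds, max abs difference)."""
    start = time.perf_counter()
    if path.endswith(".npy"):
        np.save(path, M)       # numpy's own binary format, not gretl's .bin
        back = np.load(path)
    else:
        np.savetxt(path, M)    # gzip-compressed when path ends in .gz
        back = np.loadtxt(path)
    elapsed = time.perf_counter() - start
    return elapsed, float(np.max(np.abs(M - back)))


rng = np.random.default_rng(0)
M = rng.standard_normal((200, 100))  # smaller than the 1000 x 500 case above
tmpdir = tempfile.mkdtemp()

results = {}
for name in ("m.txt", "m.txt.gz", "m.npy"):
    results[name] = round_trip(M, os.path.join(tmpdir, name))
    print("%-9s %.4fs  max diff = %g" % ((name,) + results[name]))
```

Timings will vary by machine, so no figures are quoted here.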
Allin