On 29.08.20 at 19:50, Sven Schreiber wrote:
> On 29.08.2020 at 19:40, Riccardo (Jack) Lucchetti wrote:
>> On Sat, 29 Aug 2020, Riccardo (Jack) Lucchetti wrote:
>>
>>>> It's a pity that people did not consider the gretl project. I am
>>>> wondering whether "we" may re-run the same experiments for gretl to
>>>> get an idea of how gretl would position itself.
>>>>
>>>> https://www.modelsandrisk.org/appendix/speed_2020/
>>>
>>> I think we should prepare hansl versions of the little code snippets
>>> that are on the webpage you linked.
>>
>> Sorry for replying to myself: I now see that one of the tests is based
>> on reading a gzipped csv file. At present, I don't think we can do such
>> a thing, although it shouldn't be difficult to add IMO. Would it be
>> worthwhile?
> I guess for the "open" command the standard reaction in the past would
> have been: why not use any of the many free third-party tools to unzip
> the file first?
> Not denying that it would be a "nice to have", as always.
> But perhaps for "join" it could be a sensible option?
A while ago (half a year?) I asked for this feature. At the time I had a
use case with many (>100) csv.gz files which I wanted to join. Unzipping
"outside" the gretl script required a separate shell script before
loading anything into gretl (see the sketch below). The "csv.gz" format
is quite standard when working with large data sets (for instance, Spark
data frames can be stored as csv.gz).
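
To illustrate, a minimal sketch of the kind of wrapper script I mean;
the data directory and the gretl script name are made up:

#!/bin/sh
# Decompress every csv.gz file, keeping the compressed originals (-k),
# then run a (hypothetical) gretl script that joins the plain csv files.
for f in data/*.csv.gz; do
    gzip -d -k "$f"
done
gretlcli -b join_all.inp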
As gretl does not support widespread binary formats such as Python
"pickle" files or the "Parquet" format, being able to read/write csv.gz
is definitely useful.
By the way, when I wrote that shell script half a year ago I found a
very useful and reliable tool called "pigz"
(https://linux.die.net/man/1/pigz) for parallel (de-)compression.
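
For example, to decompress a batch of files on 8 cores while keeping the
compressed originals (flags as documented in the man page above; the
data/ path is again made up):

pigz -d -k -p 8 data/*.csv.gz
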
Artur