On 24.08.2013 20:15, Sven Schreiber wrote:
On 23.08.2013 11:21, Sven Schreiber wrote:
>
> Here's my take on doing it in hansl (untested), but let's not forget
> that the goal is (IMHO) to make the preprocessing unnecessary
> altogether, by enabling 'join' to do less-than/greater-than comparisons
> on ISO date strings!
>
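Why plain string comparison is enough for this: ISO 8601 calendar dates in the fixed-width YYYY-MM-DD form put the most significant field first. Lexicographic order therefore coincides with chronological order, so a join rule based on string "<" and ">" could in principle work without converting the dates at all. The Python sketch below illustrates this. The helper `in_window` is hypothetical and not part of gretl. The check assumes zero-padded, fixed-width dates.

```python
from datetime import date

# ISO 8601 calendar dates (YYYY-MM-DD, zero-padded) are fixed width with the
# most significant field first, so string order equals chronological order.
dates = ["2013-08-24", "1919-01-01", "2013-08-23", "1999-12-31"]

as_strings = sorted(dates)                       # plain string sort
as_dates = sorted(dates, key=date.fromisoformat)  # sort by parsed date


def in_window(d, start, end):
    """Hypothetical helper: True if ISO date string d lies in [start, end].

    Only string comparisons are used; no date parsing is needed.
    """
    return start <= d <= end


if __name__ == "__main__":
    print(as_strings == as_dates)
    print(in_window("2013-08-23", "2013-01-01", "2013-08-24"))
```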
Here is now a version that actually works, tested with the real-world
1MB INDPRO file. However, it is very slow, apparently much slower than my
Python solution. I don't know whether some of gretl's string internals
could be sped up.
Specifically, the preprocessing, including all calling overhead, takes
under 2 seconds with the Python solution and roughly 120 seconds with native gretl.
I think the crucial lines are the following:

    loop repetitions  # loop over the lines in the file
        sscanf(rest,"%s\t%s\t%s\t%s\n",col1,col2,col3,col4)
        string rest = strstr(rest,"\n") + 1  # offset to drop the leading \n
    endloop

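That is, there are thousands of operations working on strings that hold
(almost) the entire file content (about 1MB here, as I said). Every pass
through the loop copies nearly the whole remainder of the file, so the
total work presumably grows roughly quadratically with the file size. I
have tried to consolidate this into a cleverer sscanf line, but that
didn't really help. I'd be glad to hear more ideas.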
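To illustrate the suspected cost difference, here is a small Python sketch. It is not gretl's actual string implementation. `parse_by_consuming` mimics the hansl loop: it parses the first line and then keeps a copy of the remainder. `parse_by_lines` splits the text once and parses each short line. Both function names and the synthetic four-column data are hypothetical. Timings will vary by machine, so no numbers are claimed here.

```python
import time


def parse_by_consuming(text):
    """Mimic the hansl loop: take the first line, then keep only the rest.

    Each slice rest[nl + 1:] copies the whole remainder, so total copying
    grows roughly quadratically with the number of lines.
    """
    rows = []
    rest = text
    while rest:
        nl = rest.find("\n")
        if nl == -1:
            line, rest = rest, ""
        else:
            line, rest = rest[:nl], rest[nl + 1:]  # copies the remainder
        if line:
            rows.append(line.split("\t"))
    return rows


def parse_by_lines(text):
    """Split once into lines, then parse each short line: linear work."""
    return [line.split("\t") for line in text.splitlines() if line]


if __name__ == "__main__":
    # Synthetic tab-separated data with four columns (not the INDPRO file).
    sample = "".join(
        f"2000-01-{i % 28 + 1:02d}\t2013-01-01\t2013-08-23\t{i}.0\n"
        for i in range(5000)
    )
    for parse in (parse_by_consuming, parse_by_lines):
        t0 = time.perf_counter()
        rows = parse(sample)
        print(parse.__name__, len(rows), f"{time.perf_counter() - t0:.4f}s")
```

Both functions return the same rows. Only the amount of copying differs, which is the point the paragraph above makes about the hansl version.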
In contrast, in Python the file is read line by line. So I don't know
whether it's worth it (bearing in mind that we actually don't want any
preprocessing...), but perhaps it would help if the readfile() function
could be extended to separate the lines of the file automatically
(= at the C level), for example:
<future hansl>
bundle btemp = readfile(fname,1) # new optional 2nd arg to split lines
loop i=1..nelem(btemp) # extended nelem() for number of bundle items
string line = btemp.line$i
... do stuff with line ...
endloop
</future hansl>
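For comparison, the behaviour proposed for `readfile(fname,1)` can be sketched in Python. The sketch reads the file line by line and returns a mapping with keys `line1`, `line2`, and so on, mirroring `btemp.line$i` above. `readfile_split` is a hypothetical name, and the dict only stands in for a gretl bundle. This assumes the proposal means one bundle item per line, with 1-based numbering as in the hansl loop.

```python
import os
import tempfile


def readfile_split(fname):
    """Hypothetical analogue of the proposed readfile(fname, 1).

    Iterating over the file object reads it line by line; the result maps
    "line1", "line2", ... (1-based) to each line without its trailing newline.
    """
    with open(fname, encoding="utf-8") as fh:
        return {f"line{i}": line.rstrip("\n")
                for i, line in enumerate(fh, start=1)}


if __name__ == "__main__":
    # Write a tiny tab-separated sample file (made-up values).
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("1919-01-01\t2013-01-01\t2013-08-23\t5.0\n")
        tmp.write("1919-02-01\t2013-01-01\t2013-08-23\t4.8\n")
        path = tmp.name
    try:
        btemp = readfile_split(path)
        for i in range(1, len(btemp) + 1):  # like: loop i=1..nelem(btemp)
            line = btemp[f"line{i}"]
            col1, col2, col3, col4 = line.split("\t")
            print(col1, col4)
    finally:
        os.remove(path)
```

Each line is short, so per-line parsing never touches the rest of the file. That is the property the proposed C-level split would provide.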
Again, I'm not sure it's worth it.
thanks,
sven