Am 23.08.2013 09:06, schrieb Riccardo (Jack) Lucchetti:
On Wed, 21 Aug 2013, Sven Schreiber wrote:
> Am 21.08.2013 09:40, schrieb Sven Schreiber:
>> Status update: For now it still seems necessary to pre-process the
>> input. I did that with Python in a 'foreign' block within the gretl
>> script, and after some trial-and-error passing around the correctly
>> formatted path strings it works for me as a temporary solution. I will
>> send the code when I have wrapped the stuff in functions as far as
>> possible.
>>
>
> I'm attaching a hansl file with a couple of relevant functions which
> should cover many real-world cases. These functions assume an Alfred
> csv file preprocessed like I described before, which resides in the
> location specified in the string argument 'fname'. (This path is taken
> as-is, so the caller is responsible for getting it right.)
Could you share the details of the pre-processing of the Alfred data
you need? It's very possible that this could be done in hansl directly,
thereby removing the python dependency.
The transformation is basically two things:
1) For the 'realtime_start_date' and 'realtime_end_date' columns
transform the entries from ISO format (%Y-%m-%d) to a number string
(%Y%m%d), removing the hyphens, for example "1983-02-03" ->
"19830203".
2) Also for these columns, replace dots (".", missing values) by
"99999999".
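To make the two steps concrete, here is a minimal Python sketch. It is not the attached preprocessor, only an illustration. It assumes a tab-separated file with a header row that contains the two column names above; the function names `fix_realtime_date` and `preprocess_alfred` are made up for this example.

```python
import csv
import io

# The only columns that get transformed (names as used in this thread).
REALTIME_COLS = ("realtime_start_date", "realtime_end_date")

def fix_realtime_date(entry):
    """Turn '1983-02-03' into '19830203'; a lone '.' becomes '99999999'."""
    entry = entry.strip()
    if entry == ".":
        return "99999999"
    return entry.replace("-", "")

def preprocess_alfred(text, delimiter="\t"):
    """Rewrite only the two realtime columns; every other column is untouched.

    Assumption: 'text' is the whole file content, with a header row.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in REALTIME_COLS:
            row[col] = fix_realtime_date(row[col])
        writer.writerow(row)
    return out.getvalue()
```

Because the replacement is done per column, a dot in the value column and the hyphens in 'observation_date' are left alone.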
Note that your alfred_test.inp code is "not clever" enough, because
AFAICS it also replaces dots in the value column (in your case, with
"_latest_"), which is unwanted I think. (In the reduced example file,
this didn't occur, but it can in general.)
Also, it's not as simple as removing all the hyphens in the file, which
would be easy in hansl: the 'observation_date' column must stay in ISO
date format, as I wrote in yet another message, since specifying
'--time="%Y%m%d"' didn't work.
Here's my take at doing it in hansl (untested), but let's not forget
that the goal is (IMHO) to make the preprocessing unnecessary
altogether, by enabling 'join' to do smaller/greater comparisons on ISO
date strings!
<hansl>
string temp = readfile(fname)
string out = ""
# to use strsplit() we must insert artificial spaces
temp = strsub(temp,"\n","\n ")
i=1
loop while 1 # loop over the lines in file
    string line = strsplit(temp,i)
    if line=="" # end of file reached
        break
    endif
    # again insert artificial spaces for splitting
    line = strsub(line,"\t","\t ")
    ## do the replacements
    # realtime_start_date is the 3rd col
    string trans3 = strsub(strsplit(line,3),"-","")
    trans3 = strsub(trans3,".","_latest_") # or "99999999"
    # and realtime_end_date the 4th
    string trans4 = strsub(strsplit(line,4),"-","")
    trans4 = strsub(trans4,".","_latest_")
    ## put the stuff together and add to output
    string out += strsplit(line,1) ~ strsplit(line,2) \
      ~ trans3 ~ trans4
    i++
endloop
## write the transformed file
set echo off
outfile @dotdir/temp.csv --write
print out
outfile --close
</hansl>
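On the point about comparing ISO date strings directly: zero-padded %Y-%m-%d strings sort lexicographically in the same order as the dates they represent, so a plain string comparison would be enough. A small Python check of that property (illustration only, it says nothing about how 'join' works internally; the helper name `iso_order_matches` is made up):

```python
from datetime import date

def iso_order_matches(d1, d2):
    """True if comparing the ISO strings agrees with comparing the dates."""
    s1, s2 = d1.isoformat(), d2.isoformat()
    return (s1 < s2) == (d1 < d2) and (s1 == s2) == (d1 == d2)

# A few pairs that differ in year, month, or day only.
pairs = [
    (date(1983, 2, 3), date(1983, 10, 1)),
    (date(1999, 12, 31), date(2000, 1, 1)),
    (date(2013, 8, 21), date(2013, 8, 23)),
    (date(2013, 8, 23), date(2013, 8, 23)),
]
all_agree = all(iso_order_matches(a, b) and iso_order_matches(b, a)
                for a, b in pairs)
```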
Attached is my existing Python-based preprocessor. The driving code is:
<hansl>
preprocess = 0
##### preprocess the Alfred file with Python: #########
if preprocess
    # first write out the meta information
    set echo off
    outfile @dotdir/temp.py --write # the temp.py name is hardcoded
    print "pyfname = r'@fname'" # r to make Python work with backslashes
    print "pyoutfname = r'@outfname'"
    outfile --close
    set echo on
    include AlfredPreprocessor.inp # which is Python code in a foreign block
endif
############ end embedded python / preprocessing ######
</hansl>
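For anyone wondering about the r prefix in the generated temp.py: without it, Python interprets backslash sequences such as \t and \n inside a Windows path. A tiny sketch with a made-up path (not one from my setup):

```python
# Hypothetical Windows path, used only to show the effect of the r prefix.
plain = 'C:\temp\new.csv'   # \t becomes a tab, \n becomes a newline
raw = r'C:\temp\new.csv'    # backslashes are kept literally

plain_is_mangled = ("\t" in plain) and ("\n" in plain)
raw_is_intact = (raw == "C:\\temp\\new.csv")
```

One caveat: a raw string literal cannot end in a single backslash, and a path containing a single quote would still break the generated line.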
As I mentioned in another message, wrapping the Python code in a hansl
function currently doesn't work because the indentation is not preserved.
cheers,
sven