On Wed, 3 Mar 2021, Sven Schreiber wrote:
On 02.03.2021 at 17:04, Sven Schreiber wrote:
> # now the real thing
> web = readfile("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/")
>
> print web # very long string! (HTML source)
> eval xmlget(web, "//a") # error: xmlParseMemory returned NULL
>
> </hansl>
>
> Does the error mean that the page's source is simply too long?
>
OK, it turns out the main problem has nothing to do with gretl's
xmlget function. The web page above uses the (valid) HTML 4 element <hr>,
which isn't valid XML, so libxml chokes on it (and by implication so
does xmlget).
However, there's still something strange with the printout. If I run the
following short script:
<hansl>
clear
string web = readfile("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/")
web = strsub(web, "<hr>", "") # strip the non-XML <hr> tags
# print web
eval xmlget(web, "//a")
</hansl>
then there's no error, but the "eval" line only produces 40 lines or so,
with the last printed line truncated like this ("_hist.zip" is
missing):
monatswerte_KL_00183_19360101_20191231
If instead I uncomment the "print web" line above, then I get everything,
which is several hundred lines. Not sure why I need to print the
string first! This is with a recent snapshot.
But note: if I don't use "eval" but instead assign the xmlget result to a
string variable, everything seems fine, so no deep problem here.
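For reference, the assignment variant I mean is just this (a minimal
sketch; the variable name "links" is only illustrative):
<hansl>
# assign the xmlget result instead of eval-ing it directly
string links = xmlget(web, "//a")
print links   # here this shows all the matches, not a truncated list
</hansl>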
Hmm, it works OK for me using "eval xmlget..." without any
intervening "print": I get 1095 lines of output. (Tested in both
gretlcli and gretl_x11.)
Allin