On 02.03.2021 at 17:04, Sven Schreiber wrote:
<hansl>
# now the real thing
web = readfile("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/")
print web # very long string! (HTML source)
eval xmlget(web, "//a") # error: xmlParseMemory returned NULL
</hansl>
Does the error mean that the page's source is simply too long?
OK, it turns out the main problem has nothing to do with gretl's
xmlget function. The web page above uses the (valid) HTML 4 element <hr>,
which is not valid XML, so libxml chokes on it (and by
implication so does xmlget).
However, there's still something strange with the printout. If I run the
following short script:
<hansl>
clear
string web = readfile("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/")
web = strsub(web, "<hr>", "")
# print web
eval xmlget(web, "//a")
</hansl>
then there's no error, but the "eval" line only produces about 40 lines,
with the last printed line truncated like this ("_hist.zip" is
missing):
monatswerte_KL_00183_19360101_20191231
If instead I uncomment the "print web" line above, then I get everything,
which is several hundred lines. Not sure why I need to print the
string first! This is with a recent snapshot.
But note: if I don't use "eval" but instead assign the xmlget result to a
string variable, everything seems fine, so no deep problem here.
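For completeness, a minimal sketch of the assignment variant just mentioned; the final printf/strlen line is only an illustrative check that the result is not truncated, not part of the original scripts:

<hansl>
# fetch the directory listing as a raw string
string web = readfile("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/monthly/kl/historical/")
# drop the HTML-only <hr> element first, otherwise libxml refuses to parse
web = strsub(web, "<hr>", "")
# assign the xmlget result instead of using "eval" -- this avoids the
# truncated printout described above
string links = xmlget(web, "//a")
printf "extracted %d characters\n", strlen(links)
</hansl>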
thanks
sven