On Tue, 28 Feb 2017, Sven Schreiber wrote:
Hi,
concerning the heuristics gretl uses when parsing formatted text data: I just
had a case with tab-delimited data where gretl gave up, and all I had to do
was replace all tabs with semicolons.
The other format information: daily dates were given as dd.mm.yyyy, the
decimal separator was a comma. The date column had the label "Date".
I'm not saying this is a bug, and I know heuristics can never be perfect. I just
wanted to let you know that gretl almost got there; there was only
one step left.
Hmm, I think I can see what's happening. Gretl takes comma as the
default field separator. From that point of view the first line of the
file will appear to be a single field (with embedded tabs), but then
the second line will seem to have a number of fields equal to the
number of decimal commas + 1. And we abort if we don't get a
consistent number of fields. If a file of this sort uses the decimal
dot, however, gretl is smart enough to figure things out: tab
separation plus decimal comma is the puzzler.
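To make the mismatch concrete, here is a small Python sketch of the field counting described above. It is an illustration only, not gretl's actual code: the helper count_fields and the sample lines are made up, based on the format Sven describes (a "Date" column, dd.mm.yyyy dates, tab separation).

```python
# Illustration only: this is not gretl's parsing code.
# count_fields is a hypothetical helper; the sample lines are invented.

def count_fields(line, sep=","):
    """Count the fields a line appears to have when split on sep."""
    return len(line.rstrip("\n").split(sep))

header = "Date\tx\ty"                   # no commas at all
row_comma = "28.02.2017\t1,5\t2,25"     # tab-separated, decimal comma
row_dot = "28.02.2017\t1.5\t2.25"       # tab-separated, decimal dot

# With comma as the assumed separator:
header_fields = count_fields(header)     # a single field with embedded tabs
comma_fields = count_fields(row_comma)   # number of decimal commas + 1
dot_fields = count_fields(row_dot)       # a single field again

# The decimal-comma row disagrees with the header, so a consistency
# check on the field count fails; the decimal-dot row does not disagree.
consistent_with_comma = header_fields == comma_fields
consistent_with_dot = header_fields == dot_fields
```

With the decimal dot, every line looks like one field under the comma assumption, so the counts at least agree and there is room to detect the tabs afterwards.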
This is easy to fix if the following heuristic is safe: when parsing
the first line, if it contains any TAB characters then give up the
usual default and assume the file is in fact tab-separated.
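
A rough sketch of that heuristic in Python, for illustration only. The names guess_separator and split_rows are hypothetical and this is not how gretl's importer is organized; the point is just that the separator is chosen once, from the first line, before any field counting takes place.

```python
# Sketch of the proposed heuristic; hypothetical names, not gretl code.

def guess_separator(first_line, default=","):
    """Return TAB if the first line contains any TAB, else the default."""
    return "\t" if "\t" in first_line else default

def split_rows(lines):
    """Split every line on the separator guessed from the first line."""
    sep = guess_separator(lines[0])
    return [line.rstrip("\n").split(sep) for line in lines]

# Invented sample in the format described: tab-separated, decimal comma.
sample = [
    "Date\tx\ty",
    "28.02.2017\t1,5\t2,25",
]
rows = split_rows(sample)
field_counts = [len(r) for r in rows]   # the same count on every line
```

Whether this is safe depends on the caveat above: a comma-separated file whose header happens to contain a TAB inside a field would be misread as tab-separated.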
Allin