Hi everyone,
Thanks for the replies. Sorry for the late response.
Jack: Yes, I'm tracking the number of observations. I'm actually not
using the gretl GUI at all; I'm calling the libraries directly. What I
essentially tested is how long it takes to execute:
    MahalDist *distance = get_mahal_distances(gretlParameters, gretlData,
                                              OPT_NONE, NULL, &error);

where gretlData is a DATASET with n observations (n is the quantity I
increased from 1 to 250). FYI: My system monitor
indicates that the statement above only executes on a single thread.
I ran your script in a Linux Mint virtual machine (4 cores, 4GB RAM) and
got different results compared to Helio. I ran the script a couple of
times (see attachments) and although different, they show similar
characteristics. I'm not sure how Helio got the first result.
Looking at the script output, I don't think that's the best way to
benchmark the execution time in this case.
I've used 8 different datasets with 30-40 million samples each. Every
single window over every single dataset gave the exact same time jump
between 199 and 200 observations.
What I've done is start a timer just before calling the
get_mahal_distances function and stop it right after the call returns.
I've done this about 300 million times in total, and the graphs I sent
in my original post are the averages over all these runs - so the
estimate should be quite accurate.
I'm using these results for my thesis, and I somehow have to explain why
this happens (even if it's just a performance improvement like Allin
said). So if anyone else knows why, please let me know.
In any case, thanks for the help
Chris
On 2014/04/15 02:01 PM, Allin Cottrell wrote:
On Tue, 15 Apr 2014, Riccardo (Jack) Lucchetti wrote:
> On Tue, 15 Apr 2014, Allin Cottrell wrote:
>
>> On Tue, 15 Apr 2014, GOO Creations wrote:
>>
>>> I'm benchmarking the Mahalanobis distance to see how the accuracy and
>>> execution time changes with an increasing sample size. As far as I
>>> understand the algorithm the execution time should grow linearly as the
>>> sample size increases. The weird thing is that the time grows linearly up
>>> to (and including) 199 samples, but then suddenly has a drop at 200
>>> samples. I've attached a graph to illustrate this.
>> What implementation of lapack/blas are you using?
>>
>> The most demanding task in computing Mahalanobis distance is the inversion
>> of the covariance matrix of the selected series, which is performed via
>> the lapack Cholesky functions dpotrf and dpotri. Depending on the
>> implementation, these functions may switch algorithm based on the size of
>> the input data (e.g. invoking parallelization when a certain threshold
>> size is exceeded).
> That's what I had thought too, initially. However, the size of the covariance
> matrix doesn't depend on the number of observations, which is the variable
> our friend is tracking (unless I misunderstood his message).
Duh! You're right. Then I can't explain this either.
Allin
_______________________________________________
Gretl-users mailing list
Gretl-users(a)lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-users