On Fri, 23 Dec 2016, Riccardo (Jack) Lucchetti wrote:
On Sat, 17 Dec 2016, Allin Cottrell wrote:
> I've added in git a function named ecdf which takes a series or vector
> argument and returns the empirical CDF in the form of a two column matrix
> with unique sorted values of the input in column 1 and cumulative relative
> frequency in column 2.
>
> This is not very difficult to do in hansl, but it's substantially faster in
> C (the differential depending on the dimensions of the problem) and I think
> it may be worth having. Test script below.
>
> <hansl>
> function matrix hansl_ecdf (const matrix M)
> matrix ret = values(M) ~ 0
> scalar n = rows(M)
> loop i=1..rows(ret) -q
> ret[i,2] = sumc(M .<= ret[i,1])/n
> endloop
> return ret
> end function
>
> scalar N = 3000
> matrix M = zeros(N, 1)
> loop i=1..N -q
> M[i] = randint(1, 200)
> endloop
>
> scalar t1=0
> scalar t2=0
>
> loop 500 -q
> set stopwatch
> matrix ec1 = ecdf(M)
> t1 += $stopwatch
> matrix ec2 = hansl_ecdf(M)
> t2 += $stopwatch
> endloop
>
> printf "built-in: %.3fs\n", t1
> printf "hansl: %.3fs\n", t2
> </hansl>
>
> hansl_ecdf() could probably be improved upon, but it seems like a "natural"
> solution for hansl users. Output on the machine I'm at:
>
> built-in: 0.204s
> hansl: 1.610s
>
> (Besides, the built-in version handles both series and vectors, and
> automatically drops NAs or NaNs from the calculation.)
>
> If I've missed a reason why this is redundant (which sometimes happens!)
> please let me know. Otherwise I'll go ahead and document it.
The following is, in fact, a bit faster than yours, but still slower than C
(unsurprisingly); but, as you say, it doesn't support matrices, although it'd
be noce to extend aggregate to do so
<hansl>
function matrix hansl_ecdf(series x)
ret = aggregate(const,x)
a = cum(ret[,2])
scalar n = a[rows(a)]
ret[,2] = a ./ n
return ret
end function
</hansl>
You're too modest, this version is over twice as fast as my naive
hansl version ;-)
Besides it exposes a couple of things about aggregate() which are
now fixed in git. First, the help didn't indicate that you could
call it without a third (aggregator function) argument. Second, in
your usage above, "const" is just a dummy that does nothing; one
could replace it with "null". However, one couldn't just omit it
altogether, as in
ret = aggregate(,x) # error
but it seems that variant ought to be acceptable. It now is in git.
Allin