On Sat, 17 Dec 2016, Allin Cottrell wrote:
I've added in git a function named ecdf which takes a series or
vector
argument and returns the empirical CDF in the form of a two column matrix
with unique sorted values of the input in column 1 and cumulative relative
frequency in column 2.
This is not very difficult to do in hansl, but it's substantially faster in C
(the differential depending on the dimensions of the problem) and I think it
may be worth having. Test script below.
<hansl>
function matrix hansl_ecdf (const matrix M)
matrix ret = values(M) ~ 0
scalar n = rows(M)
loop i=1..rows(ret) -q
ret[i,2] = sumc(M .<= ret[i,1])/n
endloop
return ret
end function
scalar N = 3000
matrix M = zeros(N, 1)
loop i=1..N -q
M[i] = randint(1, 200)
endloop
scalar t1=0
scalar t2=0
loop 500 -q
set stopwatch
matrix ec1 = ecdf(M)
t1 += $stopwatch
matrix ec2 = hansl_ecdf(M)
t2 += $stopwatch
endloop
printf "built-in: %.3fs\n", t1
printf "hansl: %.3fs\n", t2
</hansl>
hansl_ecdf() could probably be improved upon, but it seems like a "natural"
solution for hansl users. Output on the machine I'm at:
built-in: 0.204s
hansl: 1.610s
(Besides, the built-in version handles both series and vectors, and
automatically drops NAs or NaNs from the calculation.)
If I've missed a reason why this is redundant (which sometimes happens!)
please let me know. Otherwise I'll go ahead and document it.
The following is, in fact, a bit faster than yours, but still slower than
C (unsurprisingly); but, as you say, it doesn't support matrices, although
it'd be noce to extend aggregate to do so
<hansl>
function matrix hansl_ecdf(series x)
ret = aggregate(const,x)
a = cum(ret[,2])
scalar n = a[rows(a)]
ret[,2] = a ./ n
return ret
end function
</hansl>
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------