On Sun, 7 Aug 2016, Sven Schreiber wrote:
Am 07.08.2016 um 11:44 schrieb Riccardo (Jack) Lucchetti:
> On Sat, 6 Aug 2016, Sven Schreiber wrote:
>
>> hi,
>> in relation to the feature request about cross-section dependence
>> testing in panels I've encountered some performance bottleneck.
>
> The "smpl" command is rather CPU-expensive. Here's a different
solution
> (which can be further optimised a little, but I'm sure you get the idea):
OK, thanks for the great suggestions. Indeed my own previous and faster code
had only N=140 sample restrictions, whereas the later version had (N-1)/2
more, so almost 70-fold, which then could explain going from "instantaneous"
to several seconds.
It's a pity though, because my "elegant" version is much more readable
IMHO...
Readable, yes, I agree. But it involves about 39 thousand resizings
of a dataset containing 1260 observations on 11 series, so I'm not
surprised that it takes some time.
Jack's version avoids resizing the entire dataset and thereby is an
order of magnitude faster. But this is a case where a native C
version of the calculation is a good deal faster still.
It's not yet documented, and it's not yet represented in the GUI,
but there's now an --xdepend option to the "modtest" command in git
(for panel-data models only). It computes the Pesaran CD statistic,
and is about 2.5 orders of magnitude faster than Jack's (optimal, I
would say) hansl code.
<hansl>
set echo off
set messages off
function scalar SvensCD (series u)
series unit = $obsmajor
N = max(unit)
rhos = zeros(N,N)
Tcomms = zeros(N,N)
loop i=1..N -q
loop j=(i+1)..N -q
smpl (unit == i || unit == j) --restrict --replace
smpl pxnobs(u) == 2 --restrict
matrix uij = {u}
Tcomms[i,j] = rows(uij) / 2
rhos[i,j] = corr(uij[1: Tcomms[i,j]], uij[Tcomms[i,j] + 1 : ])
endloop
endloop
# Pesaran's CD
scalar CD = sqrt(2 / (N*(N-1))) * sum(sqrt(Tcomms) .* rhos)
return CD
end function
function scalar corr_na(matrix x, matrix y)
e = ok(x) && ok(y)
H = selifr(x ~ y, e)
return corr(H[,1], H[,2])
end function
function scalar JacksCD (series u)
series unit = $obsmajor
N = max($obsmajor)
T = max(time)
set skip_missing off
set warnings off
mu = mshape({u}, T, N)
OK = ok(mu)
matrix Tcomms = OK'OK
matrix rhos = zeros(N,N)
loop i = 1..N --quiet
x = mu[,i]
loop j = i+1..N --quiet
rhos[i,j] = corr_na(x, mu[,j])
endloop
endloop
scalar CD = sqrt(2 / (N*(N-1))) * sum(sqrt(Tcomms) .* rhos)
return CD
end function
open abdata.gdt --quiet
ols n 0 --quiet
series u = $uhat
set stopwatch
eval SvensCD(u)
t0 = $stopwatch
eval JacksCD(u)
t1 = $stopwatch
modtest --xdepend
t2 = $stopwatch
print t0 t1 t2
</hansl>
Output here:
Read datafile /opt/esl/share/gretl/data/misc/abdata.gdt
96.788106
96.788106
Pesaran test for cross-sectional dependence
Test statistic: z = 96.788106,
with p-value = P(z) > 96.7881) = 0
t0 = 1.6515259
t1 = 0.20180730
t2 = 0.00053288300
Allin