On Mon, 22 Jan 2024, Sven Schreiber wrote:
> I've stumbled over a certain behavior with panel data handling,
> which could perhaps be a bug.
Yes, there was a bug. After going over all that Sven reported I was
able to replicate the problem with these steps:
1) launch GUI and run pansmpl.inp (see below), giving N=8, T=24
2) Do '/File/Save data' with name foo.gdt
3) Use '/Sample/Set range' and reduce T to 12 (2023 only)
4) Do '/File/Save data' again, saying No to restoring full data
5) Then in the console:
a smpl -> Full data range: 1:01 - 8:12 (n = 96) # OK
b smpl full -> same output as above # OK
c smpl ichk --dummy -> Current sample: 1:01 - 4:12 (n = 48) # Bad!
d smpl full -> Full data range: 1:01 - 8:12 (n = 96) # OK
e smpl ichk==1 --restrict -> Current sample: 1:01 - 7:12 (n = 84) # OK
Switching the order of console commands c and e didn't alter the
discrepancy: I still got the bad result from the --dummy option but
the correct result from --restrict.
And this result was invariant wrt using '/Sample/Set range' to cut
out the second year's data versus using 'smpl time < 13 --restrict'.
But if I did the following after step 4):
smpl full
store pantest.gdt
open pantest.gdt
smpl ichk --dummy
then I got the correct result with the --dummy option. So it seemed
the error must lie with the shrinkage of the original T=24 dataset
as a result of step 4) above. Indeed, some "debris" was not being
cleared out. That's now fixed in git.
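For anyone who wants to check their own build, here is a minimal sketch
of a comparison to run in the console after step 4). It assumes the
dataset created by pansmpl.inp is in place; the script name and the
scalar names (n_expected, n_dummy, n_restrict) are only illustrative.
With the fix, the three counts should agree (84 in the T=12 case
described above).
<hansl name="smplcheck.inp">
# assumes the series ichk from pansmpl.inp exists
smpl full
# number of observations with ichk = 1 in the full range
scalar n_expected = sum(ichk)
smpl ichk --dummy
scalar n_dummy = $nobs
smpl full
smpl ichk==1 --restrict
scalar n_restrict = $nobs
smpl full
printf "expected %d, --dummy %d, --restrict %d\n", n_expected, n_dummy, n_restrict
</hansl>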
<hansl name="pansmpl.inp">
N = 8
T = 24
NT = N*T
nulldata NT --preserve
setobs 24 1:01 --stacked-time-series
setobs 12 2023:01 --panel-time
series u = $unit
genr time
series ichk = u != 6
</hansl>
Allin