On Mon, 10 Sep 2018, Riccardo (Jack) Lucchetti wrote:
On Mon, 10 Sep 2018, Allin Cottrell wrote:
> I agree it would be nice from the user's PoV if "join" could handle this
> case. However, my first reaction is that this sort of transposition and
> reshuffling is not in join's present repertoire and would induce a fair
> amount more complexity, plus it means adding at least three more options to
> a command that already has more than any other.
I agree on this.
> Just thinking out loud, but maybe we could have a pstack() function (a
> proper function) that does the business of transposing and reshaping the
> data, with "skip_missing" temporarily turned off. It could take two
> arguments: the integer number of individuals and an optional list argument
> (defaulting to the entire dataset), and would return a matrix with
> panelized variables in the columns.
>
> In itself that would leave the user with the task of extracting the series
> from the matrix. But maybe this pstack, or whatever we call it, could write
> out the transformed dataset as CSV (or gdt) so it could then be loaded
> easily -- either in addition to, or instead of, returning a matrix.
This doesn't have to be in libgretl. In fact, it could be part of the
"extra" package, unless I'm missing something.
You're right. Here's a proof-of-concept hansl prototype:
<hansl>
function scalar csv_print_matrix (const matrix X,
                                  const string fname,
                                  const strings vnames[null])
    N = rows(X)
    k = cols(X)
    have_names = nelem(vnames) > 0
    string nl = sprintf("\n")
    outfile @fname --quiet
        loop j=1..k -q
            if have_names
                printf "%s", vnames[j]
            else
                printf "v%d", j
            endif
            printf "%s", j == k ? nl : ", "
        endloop
        loop i=1..N -q
            loop j=1..k -q
                printf "%.15g", X[i,j]
                printf "%s", j == k ? nl : ", "
            endloop
        endloop
    end outfile
    return 0
end function

function matrix pstack (int N, list L,
                        const string fname[null],
                        const strings vnames[null])
    if !exists("vnames")
        strings vnames = array(0)
    endif
    set skip_missing off
    matrix X = {L}
    set skip_missing on
    T = cols(X)      # number of periods
    NT = N * T       # panelized series length
    k = nelem(X)/NT  # number of panelized series
    X = mshape(X', NT, k)
    if exists("fname")
        csv_print_matrix(X, fname, vnames)
    endif
    return X
end function
</hansl>
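To see why the transpose-then-mshape step in pstack gives the stacked layout, here is a minimal sketch on a made-up matrix. The numbers are purely illustrative (2 individuals, 3 periods, 2 variables) and are not taken from byvar.csv; it relies on mshape reading and writing elements in column-major order.

<hansl>
# Hypothetical toy input, laid out "by variable":
# rows 1-2 are individuals A, B for variable 1,
# rows 3-4 are individuals A, B for variable 2;
# columns are the T = 3 periods
matrix X = {1, 2, 3; 4, 5, 6; 10, 20, 30; 40, 50, 60}
scalar N = 2
scalar T = cols(X)
scalar NT = N * T
scalar k = rows(X) * cols(X) / NT
# After transposing, each column of X' holds one individual's T periods
# for one variable, so reading column by column keeps them contiguous
matrix P = mshape(X', NT, k)
print P
</hansl>

With this input P should come out as 6 x 2, with the first column running 1 through 6 (A's three periods, then B's) and the second column 10 through 60.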
Here's a sample call:
open byvar.csv
N = 6 # number of individuals, A-F
strings vnames = defarray("foo", "bar")
matrix X = pstack(N, dataset, "foo.csv", vnames)
open foo.csv
print -o
where byvar.csv looks like this:
A,1.0,2.0,3.0,4.0
B,1.1,2.1,3.1,4.1
C,1.2,2.2,3.2,4.2
D,1.3,2.3,3.3,4.3
E,1.4,2.4,3.4,4.4
F,1.5,2.5,3.5,4.5
A,10.0,20.0,30.0,40.0
B,10.1,20.1,30.1,40.1
C,10.2,20.2,30.2,40.2
D,10.3,20.3,30.3,40.3
E,10.4,20.4,30.4,40.4
F,10.5,20.5,30.5,40.5
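
Once foo.csv is loaded the observations are stacked, but gretl will not yet treat the dataset as a panel. A minimal sketch of that last step, assuming the layout produced by the sample call above (each individual occupying T = 4 consecutive rows):

<hansl>
open foo.csv
# declare the panel structure: 4 consecutive rows (periods) per individual
setobs 4 1:1 --stacked-time-series
print -o
</hansl>

The 4 here is just the number of periods in the sample file; for other data it would be cols(X) before stacking.
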
Allin