Now looking at the algorithm in more detail...

Thanks for doing this!
Allin Cottrell wrote:
>
> First, we scan the matrices R and q (as in R*vec(beta) = q),
> looking for rows that satisfy this criterion:
>
> * There's exactly one non-zero entry, r_{ij}, in the given row of
> R, and the corresponding q_i is non-zero.
seems ok; do you also check whether there are restrictions on
alpha (and which kind)?

Yes. I have added some checks since I first described the
algorithm, including: we don't try scale removal if we find any
alpha restrictions that cross between columns of alpha. (We don't
support non-homogeneous restrictions on alpha, so that's not an
issue at present.)
In addition, we consider scale removal infeasible for a given beta
column if (a) we find a non-homogeneous restriction involving two
or more beta elements (e.g. b[1,1] - b[1,3] = 2), or (b) we find
any cross-column beta restrictions.
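For concreteness, here's a rough sketch of the scan and the feasibility checks described above. This is illustrative Python/numpy, not the actual C code, and all names are made up:

```python
import numpy as np

def scan_restrictions(R, q, p1):
    """Scan R, q (as in R * vec(beta) = q) for scaling-row
    candidates.  All indexing is 0-based.  Returns (scaling,
    bad_cols): scaling maps row i -> (j, v_j, k); bad_cols is
    the set of beta columns for which scale removal is
    infeasible."""
    scaling = {}
    bad_cols = set()
    for i in range(R.shape[0]):
        nz = np.flatnonzero(R[i])
        cols = {j // p1 for j in nz}   # beta columns touched by row i
        if len(nz) == 1 and q[i] != 0:
            # exactly one non-zero r_{ij} and q_i non-zero:
            # a scaling-row candidate
            j = int(nz[0])
            scaling[i] = (j, q[i] / R[i, j], j // p1)
        elif len(cols) > 1:
            # cross-column beta restriction
            bad_cols |= cols
        elif len(nz) > 1 and q[i] != 0:
            # non-homogeneous restriction involving two or more
            # beta elements, e.g. b[1,1] - b[1,3] = 2
            bad_cols |= cols
    return scaling, bad_cols
```

A candidate scaling row is then usable only if its column k is not in bad_cols.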
> For each such row we record the coordinates i and j and the
> implied value of the associated beta element, v_j = q_i/r_{ij}.
> Based on j we determine the associated column of the beta matrix:
> k = j / p_1, where p_1 is the number of rows in beta (and '/'
> indicates truncated integer division).
Is this really what you mean?
If we're talking about the first element in the first cointegration
vector, then with "human" (=one-based) matrix indexing we have i=j=1
and, for example, p_1 = 3. Then k = 1 '/' 3 = 0, but should be 1?

Internally, we use 0-based indexing throughout. So in the example
you give, k = 0 / 3 = 0, which is correct.
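A two-line illustration of the 0-based index arithmetic, with p_1 = 3 as in your example:

```python
p1 = 3                  # rows in beta
# first element of the first cointegration vector: j = 0
assert 0 // p1 == 0     # k = 0, the first column of beta
# an element in the second column, e.g. j = 4:
assert 4 // p1 == 1     # k = 1, the second column
```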
> We then ask: Is row i the first such row pertaining to column k
> of
> beta? If so, we flag this row (which I'll call a scaling row) for
> removal. If not, we flag it for replacement with a homogeneous
> restriction. For example, suppose we find two rows of R that
> correspond to
>
> b[2,1] = 1
> b[2,3] = -1
>
> We'll remove the first row, and replace the second with
>
> b[2,1] + b[2,3] = 0
ok; what do you do if there are more than two relevant rows?

Hmm. A third such row would be treated in the same way as the
second -- that is, converted into a homogeneous restriction that
references the coefficient in the first scaling row.
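In sketch form (again illustrative Python, not the actual implementation), the conversion of the second and subsequent scaling rows for a given beta column might look like this. The key point is that the homogeneous replacement preserves the ratio of the two coefficients implied by the original rows:

```python
def homogenize(rows, ncoef):
    """rows: (j, v) pairs for the scaling rows found for one
    column of beta, with 0-based index j into vec(beta) and
    implied value v = q_i / r_{ij}.  The first pair is the
    scaling row proper (removed after its info is saved); each
    later pair is replaced by the homogeneous restriction
        v1 * b_j - v * b_{j1} = 0,
    which preserves the ratio b_j / b_{j1} = v / v1.
    Returns the replacement (row, rhs) pairs."""
    (j1, v1) = rows[0]
    out = []
    for (j, v) in rows[1:]:
        r = [0.0] * ncoef
        r[j1] = -v
        r[j] = v1
        out.append((r, 0.0))   # homogeneous: RHS is zero
    return out

# the example from the thread: b[2,1] = 1 and b[2,3] = -1, with
# p1 = 3 rows in beta, so (0-based) j = 3 and j = 5
rows = [(3, 1.0), (5, -1.0)]
new = homogenize(rows, ncoef=6)
# -> one replacement row encoding b[2,1] + b[2,3] = 0
```

A third row for the same column is handled identically, referencing the coefficient from the first scaling row.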
> We then do the maximization using the switching algorithm. Once
> that's done, but before computing the variance of the estimates,
> we re-scale beta and alpha using the saved information from the
> scaling rows.
Could you store the maximized likelihood value before the rescaling?
Just as a debug check if the rescaling leaves it unchanged.

Good idea.
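The reason the check should pass exactly is that rescaling a column of beta while compensating in the corresponding column of alpha leaves Pi = alpha * beta' (and hence the likelihood) intact. A quick numerical confirmation of that identity:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.normal(size=(4, 2))
beta = rng.normal(size=(5, 2))
Pi = alpha @ beta.T

# rescale column k of beta by c, column k of alpha by 1/c
c, k = 2.5, 1
beta2, alpha2 = beta.copy(), alpha.copy()
beta2[:, k] *= c
alpha2[:, k] /= c

# Pi is unchanged, so the likelihood is too
assert np.allclose(alpha2 @ beta2.T, Pi)
```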
> We then recompute the relevant blocks of the observed
> information
> matrix using the rescaled beta and alpha, and I think the variance
> comes out right.
Yes, the problem seems to lie with the point estimates themselves.
Sorry, I couldn't find any obvious mistakes :-(

I'll try increasing the verbosity of the switching and see if I
can spot any strange nasties.
Allin.