First, we scan the matrices R and q (as in R*vec(beta) = q),
looking for rows that satisfy this criterion:
* There's exactly one non-zero entry, r_{ij}, in the given row of
R, and the corresponding q_i is non-zero.

seems ok; do you also check whether there are restrictions on alpha (and
which kind)?
For each such row we record the coordinates i and j and the
implied value of the associated beta element, v_j = q_i/r_{ij}.
Based on j we determine the associated column of the beta matrix:
k = j / p_1, where p_1 is the number of rows in beta (and '/'
indicates truncated integer division).

Is this really what you mean?
If we're talking about the first element in the first cointegration
vector, then with "human" (=one-based) matrix indexing we have i=j=1
and, for example, p_1 = 3. Then k = 1 '/' 3 = 0, but should be 1?
We then ask: Is row i the first such row pertaining to column k of
beta? If so, we flag this row (which I'll call a scaling row) for
removal. If not, we flag it for replacement with a homogeneous
restriction. For example, suppose we find two rows of R that
correspond to
b[2,1] = 1
b[2,3] = -1
We'll remove the first row, and replace the second with
b[2,1] + b[2,3] = 0

ok; what do you do if there are more than two relevant rows?
In general, the replacement R row is all zeros apart from a 1 in
column j0 and the value x in column j1, where
* j0 is the non-zero column in the scaling row pertaining to
column k of beta;
* j1 is the non-zero column in the row of R to be replaced; and
* x = -v_{j0} / v_{j1}

looks ok, too
We then do the maximization using the switching algorithm. Once
that's done, but before computing the variance of the estimates,
we re-scale beta and alpha using the saved information from the
scaling rows.

Could you store the maximized likelihood value before the rescaling?
Just as a debug check if the rescaling leaves it unchanged.
For each column, k, of beta, if there's an associated scale factor
v_j (as defined above; strictly speaking, v_{j0}):
* Construct the value s_k = b_j / v_j, where b_j is the jth
element of the estimated vec(beta).
* Divide each non-zero element of the beta column by s_k; multiply
each non-zero element of column k of alpha by s_k.

ok
We then recompute the relevant blocks of the observed information
matrix using the rescaled beta and alpha, and I think the variance
comes out right.

yes the problem seems to lie with the point estimates themselves
well sorry, couldn't find any obvious mistakes :-(
-sven