Here is a follow-up that illustrates Jack's point about when a small number is really zero. Being too strict about the size of the value may lead to other unintended results. This example is based on the same dataset (br.gdt) but uses restrictions that are nearly true.

? square sqft bedrooms
? logs price
? series price = price/100
? list xlist = const sqft sq_sqft sq_bedrooms bedrooms baths age
? matrix Rmat = zeros(3,4)~I(3)
? matrix r = {  700 ; 400; -10 }
? ols price xlist

Model 1: OLS, using observations 1-1080
Dependent variable: price

coefficient     std. error    t-ratio    p-value
---------------------------------------------------------------
const         168.782        216.484          0.7797    0.4358
sqft           -0.758827       0.0741780     -10.23   1.68e-023 ***
sq_sqft         0.000248214    1.03688e-05    23.94   8.12e-102 ***
sq_bedrooms  -117.075         19.4308        -6.025   2.32e-09  ***
bedrooms      694.058        138.416          5.014   6.22e-07  ***
baths         379.550         46.2502         8.206   6.48e-016 ***
age           -8.34062        1.14878        -7.260   7.40e-013 ***

Mean dependent var   1548.632   S.D. dependent var   1229.128
Sum squared resid    4.01e+08   S.E. of regression   611.6813
F(6, 1073)           547.2962   P-value(F)           0.000000
Log-likelihood      -8458.451   Akaike criterion     16930.90
Schwarz criterion    16965.79   Hannan-Quinn         16944.11

? restrict --full
? R=Rmat
? q=r
? end restrict

Test statistic: F(3, 1073) = 0.955727, with p-value = 0.412961

Model 2: Restricted OLS, using observations 1-1080
Dependent variable: price

coefficient     std. error     t-ratio     p-value
------------------------------------------------------------------
const         201.080        88.6191        2.269      0.0235    **
sqft           -0.783296      0.0645078   -12.14       6.88e-032 ***
sq_sqft         0.000250439   9.22345e-06  27.15       4.42e-124 ***
sq_bedrooms  -118.625         5.21332     -22.75       6.95e-094 ***
bedrooms      700.000         0.000000      NA         NA
baths         400.000         4.64027e-07   8.620e+08  0.0000    ***
age           -10.0000        0.000000      NA         NA

Notice that the std error on sq_sqft is very small (but not zero), while the one on baths (which should technically be zero) is only about one order of magnitude smaller.  If you didn't know that the s.e. is supposed to be zero on a restricted coefficient (like many of my students), you'd report something that was obviously wrong.  In the original example, the problem was not so much in the restrictions as in the conditioning of the data themselves, which remains very bad even in this case of "good" restrictions.  It's not clear to me how sorting this out based on size alone is possible.  Is there some eigenvalue associated with R*inv(X'X)*R' that might identify which entries should be NA?
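For anyone who wants to poke at the arithmetic outside gretl, here is a minimal numpy sketch of the textbook restricted-LS formulas (synthetic data, not br.gdt). It shows that the variance of a coefficient pinned by a restriction comes out merely tiny in floating point, not exactly zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=n)

# One restriction R b = q pinning the last coefficient: b[2] = -3
R = np.array([[0.0, 0.0, 1.0]])
q = np.array([-3.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                       # unrestricted OLS
A = R @ XtX_inv @ R.T
b_r = b - XtX_inv @ R.T @ np.linalg.solve(A, R @ b - q)  # restricted estimator

# Restricted covariance: s^2 * [inv(X'X) - inv(X'X) R' inv(A) R inv(X'X)]
e = y - X @ b_r
s2 = e @ e / (n - k + R.shape[0])
V = s2 * (XtX_inv - XtX_inv @ R.T @ np.linalg.solve(A, R @ XtX_inv))

# In exact arithmetic V[2, 2] is zero; in floating point it is merely
# tiny, and can even land slightly negative -- the ambiguity at issue here.
print(V[2, 2])
```

The restricted coefficient itself is reproduced essentially exactly; it is only its variance that sits in the noisy neighbourhood of zero.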

Lee

On Sun, Nov 18, 2012 at 11:11 AM, Lee Adkins wrote:

On Sun, Nov 18, 2012 at 8:51 AM, Allin Cottrell wrote:
On Sun, 18 Nov 2012, Riccardo (Jack) Lucchetti wrote:

> On Sun, 18 Nov 2012, Allin Cottrell wrote:
>
>> If we were to do this, I'd favour restricting the "clean-up" to the
>> standard errors (printing 0 rather than NA) and let the \$vcv
>> accessor show what was actually computed, warts and all.
>
> I think that the flaws from machine precision are of great didactical value.
> IMHO, teaching students that 1.2345e-30 is in fact zero and they should
> _distrust_ software that writes "0" instead of 1.2345e-30 is part of teaching
> good econometrics. That said, in a case like the one Lee brought up, better
> to have 0 than NA.

Agreed.

There are two aspects of our policy to date that are
questionable. First, when computing standard errors from a
variance matrix we've always set the s.e. to NA when we
encounter a negative diagonal entry. This is reasonable in
general, but is arguably too strict when we're producing
restricted estimates.

When we estimate subject to restriction, the "ideal" result is
that (a) the restrictions are met exactly and (b) whenever a
restriction stipulates a definite numerical value for a
parameter, its variance is exactly zero. But -- in line with
your point above -- in general that ain't gonna happen in
digital arithmetic. In a case like Lee's example, with a bunch
of zero restrictions, we expect to find the computed variances
distributed "randomly" in the close neighbourhood of zero,
with some of them likely negative. In that case I think it's
reasonable to print the standard errors as zero, if they're
close enough, but provide the "true" (numerical, ugly)
variance matrix for those who want to see it. That's now in
CVS.
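A rough sketch of that display policy, in Python for concreteness (the tolerance and helper name are purely illustrative, not gretl's actual criterion or code):

```python
import numpy as np

def display_stderrs(vcv, tol=1e-12):
    """Standard errors for display: variances within tol of zero
    (including slightly negative floating-point artifacts) print as 0;
    genuinely negative variances remain NaN. The raw vcv is untouched."""
    d = np.diag(vcv).copy()
    se = np.full_like(d, np.nan)
    se[d >= 0] = np.sqrt(d[d >= 0])
    se[np.abs(d) < tol] = 0.0         # "close enough" to the ideal zero
    return se

V = np.array([[4.0, 0.0,    0.0],
              [0.0, -3e-17, 0.0],    # restricted coeff: tiny negative
              [0.0, 0.0,   -0.5]])   # genuinely pathological
print(display_stderrs(V))
```

The point is that only the printed standard errors get cleaned; anyone who asks for the variance matrix still sees the warts.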

Second, when it comes to retrieving the coefficient or
standard error vector from a model, we've checked for NAs and
if any are found we refuse to supply the object (as Lee
observed). OK, that seems too delicate, or paternalistic, or
something. So now in CVS you can access \$stderr even if it
contains NAs.

Allin

I like this solution since, at least for my purposes, it solves an immediate problem.  I'm trying to stuff the std errors from a _restrict --full_ statement into a bundle.  The function that initializes the bundle fails because \$stderr is not returned.  I can _catch_ the error and put something into the matrix, but that defeats the purpose of using _restrict --full_ in this case.  I realize the example is extreme, but I'm stress-testing the set of functions for an RLS Stein-rule package I'm working on.

I probably need to reassess how I'm computing the restricted estimates in order to make the thing backward compatible with previous versions of gretl.  My first version used something similar to the Greene example Allin gave, but the restrict function is so elegant I couldn't resist using it instead.  Still, I think being able to put matrices that contain NA into a bundle will pay dividends down the road, especially because there are so many accessors available for subsequent use.  gretl also contains several ways to dress these up once they are available (e.g., misszero()).
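For what it's worth, the misszero() clean-up is easy to mimic outside gretl as well; a hypothetical numpy equivalent:

```python
import numpy as np

def misszero(m):
    # Analogue of gretl's misszero(): replace missing values (NaN) with 0
    out = np.asarray(m, dtype=float).copy()
    out[np.isnan(out)] = 0.0
    return out

# Standard errors with NAs on the exactly-restricted coefficients
se = np.array([88.6, 0.0645, np.nan, 4.64e-07, np.nan])
print(misszero(se))
```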

Thanks,
Lee

--