On Sun, Nov 18, 2012 at 11:42 AM, Lee Adkins wrote:
Here is a follow-up that illustrates what Jack is saying about when a small number is really zero. Being too strict about the size of the value may lead to other unintended results. This example is based on the same dataset (br.gdt) but uses restrictions that are nearly true.

? square sqft bedrooms
? logs price
? series price = price/100
? list xlist = const sqft sq_sqft sq_bedrooms bedrooms baths age
? matrix Rmat = zeros(3,4)~I(3)
? matrix r = {  700 ; 400; -10 }
? ols price xlist

Model 1: OLS, using observations 1-1080
Dependent variable: price

              coefficient     std. error    t-ratio    p-value
---------------------------------------------------------------
const         168.782        216.484           0.7797 0.4358
sqft           -0.758827       0.0741780     -10.23   1.68e-023 ***
sq_sqft         0.000248214    1.03688e-05    23.94   8.12e-102 ***
sq_bedrooms  -117.075         19.4308        -6.025   2.32e-09  ***
bedrooms      694.058        138.416          5.014   6.22e-07  ***
baths         379.550         46.2502         8.206   6.48e-016 ***
age           -8.34062        1.14878        -7.260   7.40e-013 ***

Mean dependent var   1548.632   S.D. dependent var   1229.128
Sum squared resid    4.01e+08   S.E. of regression   611.6813
F(6, 1073)           547.2962   P-value(F)           0.000000
Log-likelihood      -8458.451   Akaike criterion     16930.90
Schwarz criterion    16965.79   Hannan-Quinn         16944.11

? restrict --full
? R=Rmat
? q=r
? end restrict

Test statistic: F(3, 1073) = 0.955727, with p-value = 0.412961

Model 2: Restricted OLS, using observations 1-1080
Dependent variable: price

              coefficient     std. error     t-ratio     p-value
------------------------------------------------------------------
const         201.080        88.6191         2.269       0.0235    **
sqft           -0.783296      0.0645078    -12.14        6.88e-032 ***
sq_sqft         0.000250439   9.22345e-06   27.15        4.42e-124 ***
sq_bedrooms  -118.625         5.21332      -22.75        6.95e-094 ***
bedrooms      700.000         0.000000      NA          NA
baths         400.000         4.64027e-07    8.620e+08   0.0000    ***
age           -10.0000        0.000000      NA          NA

Notice that the std. error on sq_sqft (sqft squared) is very small but not zero, while the one on baths, which should be exactly zero, is only about one order of magnitude smaller (9.2e-06 versus 4.6e-07). If you didn't know that the standard error on a restricted coefficient is supposed to be zero (and many of my students don't), you'd report something that was obviously wrong. In the original example, the problem was not so much the restrictions as the conditioning of the data themselves, which remains very bad even in this case of "good" restrictions. It's not clear to me how one could sort this out based on size alone. Is there a complex eigenvalue associated with R*inv(X'X)*R' that might identify which standard errors should be NA?
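To see why the standard errors on the restricted coefficients are zero in theory, here is a rough hansl sketch that rebuilds the restricted estimator and its covariance by hand from the textbook formulas. It assumes br.gdt is on gretl's data path, and the names X, y, XTXi, A, bR, V and RVR are mine, for illustration only. It is not necessarily how gretl computes these quantities internally.

```hansl
# Sketch only: restricted OLS covariance from the textbook formula.
# Assumption: br.gdt can be opened from gretl's data path.
open br.gdt --quiet
square sqft bedrooms
series price = price/100
list xlist = const sqft sq_sqft sq_bedrooms bedrooms baths age
matrix Rmat = zeros(3,4) ~ I(3)
matrix r = {700; 400; -10}

matrix X = {xlist}
matrix y = {price}
matrix XTXi = inv(X'X)
matrix b = XTXi * X'y                  # unrestricted OLS
matrix A = Rmat * XTXi * Rmat'         # R*inv(X'X)*R', 3 x 3

# restricted estimator: b - inv(X'X) R' inv(A) (R b - r)
matrix bR = b - XTXi * Rmat' * inv(A) * (Rmat*b - r)

# residual variance with n - k + J degrees of freedom
matrix u = y - X*bR
scalar s2 = (u'u) / (rows(X) - cols(X) + rows(Rmat))

# covariance of bR; R*V*R' is exactly zero algebraically
matrix V = s2 * (XTXi - XTXi * Rmat' * inv(A) * Rmat * XTXi)
matrix RVR = Rmat * V * Rmat'
print bR RVR
```

Algebraically RVR is a 3 x 3 block of zeros, so anything nonzero on its diagonal is rounding error. With data this badly conditioned I would expect that error to come out tiny and positive for some coefficients and tiny and negative for others, which would be consistent with a spurious small standard error on baths next to NA on bedrooms and age.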

Lee

After thinking about this for a second: there will only be 3 eigenvalues for that matrix, and all of them are real and positive to be sure, since R*inv(X'X)*R' is symmetric and positive definite when R has full row rank. Hmmm. The ones for X'X are positive too (but some are tiny, indicating severe collinearity).
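For anyone who wants to look at the eigenvalues directly, a minimal hansl sketch follows. It assumes br.gdt is on gretl's data path, and the matrix names (X, XTX, A, evX, evA) are hypothetical. The explicit symmetrization of A is a precaution on my part, since eigensym expects a symmetric argument and rounding can leave the product very slightly asymmetric.

```hansl
# Sketch only: eigenvalues of X'X and of R*inv(X'X)*R'.
# Assumption: br.gdt can be opened from gretl's data path.
open br.gdt --quiet
square sqft bedrooms
list xlist = const sqft sq_sqft sq_bedrooms bedrooms baths age
matrix Rmat = zeros(3,4) ~ I(3)

matrix X = {xlist}
matrix XTX = X'X
matrix A = Rmat * inv(XTX) * Rmat'
A = 0.5 * (A + A')                 # force exact symmetry before eigensym

matrix evX = eigensym(XTX)         # 7 eigenvalues
matrix evA = eigensym(A)           # 3 eigenvalues
print evX evA
printf "largest/smallest eigenvalue of X'X: %g\n", maxc(evX) / minc(evX)
```

The ratio printed at the end is a crude conditioning measure: the larger it is, the less the trailing digits of anything built from inv(X'X) can be trusted. Note that the regressors are unscaled here, so part of that ratio simply reflects the very different magnitudes of sqft, sq_sqft and the other columns.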