On Sun, 18 Nov 2012, Lee Adkins wrote:
On Sun, Nov 18, 2012 at 11:42 AM, Lee Adkins
<lee.adkins(a)okstate.edu> wrote:
> Here is a followup that illustrates what Jack is saying about when is a
> small number really zero. Being too strict about the size of the value may
> lead to other unintended results. This example is based on the same
> dataset (br.gdt) but uses restrictions that are nearly true.
>
> ? square sqft bedrooms
> ? logs price
> ? series price = price/100
> ? list xlist = const sqft sq_sqft sq_bedrooms bedrooms baths age
> ? matrix Rmat = zeros(3,4)~I(3)
> ? matrix r = { 700 ; 400; -10 }
> ? ols price xlist
>
> Model 1: OLS, using observations 1-1080
> Dependent variable: price
>
> coefficient std. error t-ratio p-value
> ---------------------------------------------------------------
> const 168.782 216.484 0.7797 0.4358
> sqft -0.758827 0.0741780 -10.23 1.68e-023 ***
> sq_sqft 0.000248214 1.03688e-05 23.94 8.12e-102 ***
> sq_bedrooms -117.075 19.4308 -6.025 2.32e-09 ***
> bedrooms 694.058 138.416 5.014 6.22e-07 ***
> baths 379.550 46.2502 8.206 6.48e-016 ***
> age -8.34062 1.14878 -7.260 7.40e-013 ***
>
> Mean dependent var 1548.632 S.D. dependent var 1229.128
> Sum squared resid 4.01e+08 S.E. of regression 611.6813
> R-squared 0.753717 Adjusted R-squared 0.752340
> F(6, 1073) 547.2962 P-value(F) 0.000000
> Log-likelihood -8458.451 Akaike criterion 16930.90
> Schwarz criterion 16965.79 Hannan-Quinn 16944.11
>
> ? restrict --full
> ? R=Rmat
> ? q=r
> ? end restrict
>
> Test statistic: F(3, 1073) = 0.955727, with p-value = 0.412961
>
>
> Model 2: Restricted OLS, using observations 1-1080
> Dependent variable: price
>
> coefficient std. error t-ratio p-value
> ------------------------------------------------------------------
> const 201.080 88.6191 2.269 0.0235 **
> sqft -0.783296 0.0645078 -12.14 6.88e-032 ***
> sq_sqft 0.000250439 9.22345e-06 27.15 4.42e-124 ***
> sq_bedrooms -118.625 5.21332 -22.75 6.95e-094 ***
> bedrooms 700.000 0.000000 NA NA
> baths 400.000 4.64027e-07 8.620e+08 0.0000 ***
> age -10.0000 0.000000 NA NA
>
>
> Notice that the std error on sq_sqft is very small (but not zero)
> and the one on baths (which is technically zero) is only about one order
> of magnitude smaller.
> If you didn't know that the se is supposed to be zero on a restricted
> coefficient (like many of my students) you'd report something that was
> obviously wrong. In the original example, the problem was not so much in
> the restrictions, but the conditioning of the data themselves, which
> remains very bad even in this case of "good" restrictions. It's not
clear
> to me how sorting this out based on size is possible. Is there a complex
> eigenvalue associated with the R*inv(X'X)*R' that might identify which
> should be NA?
>
After thinking about this for a second, there will only be 3 eigenvalues
for that matrix, and all of them positive to be sure. Hmmm. The ones for
X'X are positive (but some are tiny, indicating severe collinearity).
In this example the computed variance for the baths
coefficient is 2.15e-13, which seems on the big side to be
forced to zero. I think the only way to get this "right" would
be to somehow keep track of which coefficients, if any, are
assigned a definite numerical value by the restriction -- i.e.
look for rows of the R matrix that have only one non-zero
entry?
Allin
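
A minimal sketch of the row check suggested above, written in plain Python
rather than hansl and not taken from gretl's code. The function name
pinned_coefficients and the tol parameter are hypothetical; R and q are
hard-coded to match Rmat = zeros(3,4)~I(3) and r = {700; 400; -10} from
the example.

```python
def pinned_coefficients(R, q, tol=0.0):
    """Map column index -> implied value for each row of R that has
    exactly one non-zero entry (that row fixes a single coefficient)."""
    pinned = {}
    for row, rhs in zip(R, q):
        # Collect the (column, value) pairs that are non-zero in this row
        nonzero = [(j, v) for j, v in enumerate(row) if abs(v) > tol]
        if len(nonzero) == 1:
            j, v = nonzero[0]
            # The row reads v * b_j = rhs, so b_j is fixed at rhs / v
            pinned[j] = rhs / v
    return pinned


# Regressor order follows xlist in the example
names = ["const", "sqft", "sq_sqft", "sq_bedrooms", "bedrooms", "baths", "age"]

# Three rows: four leading zeros followed by a 3x3 identity block
R = [[0.0] * 4 + [1.0 if i == k else 0.0 for k in range(3)] for i in range(3)]
q = [700.0, 400.0, -10.0]

for j, value in sorted(pinned_coefficients(R, q).items()):
    print(f"{names[j]} is fixed at {value}: report its std. error as 0 or NA")
```

This flags bedrooms, baths and age without looking at the size of any
computed variance. It only catches coefficients pinned by a single row;
a coefficient fixed jointly by several rows (say b1 + b2 = 1 together
with b1 - b2 = 0) would be missed unless R were row-reduced first.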