Hi All,

I have been trying to solve a few problems with rolling regressions. To keep them separate, I am using a simple example here with two financial assets: Amazon (AMZN) and the market (SP500). I'm trying to estimate "beta", which can be roughly obtained by regressing the returns of AMZN on the returns of SP500.

I found a very simple piece of code in a previous thread on this list, which is quite straightforward:

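scalar T = $nobs

scalar window_size = 20
scalar k = $nobs - window_size + 1    # number of windows (not used below)
series b = NA                         # rolling beta, dated at the end of each window

smpl 1 window_size
loop i = window_size .. T
    ols AMZN const SP500
    b[i] = $coeff(SP500)              # store the slope for the window ending at i
    if i < T
        smpl +1 +1
    endif
endloop

smpl full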

So far so good, this works quite well. But let's say the data covers 100 periods for SP500 but only 60 for AMZN (no data for the last 40). Because I'm using a rolling window of 20 data points, a point will come when the routine uses only 19 data points, then 18, 17, and so on until reaching 2 (which is the technical minimum). This happens because the routine still finds data points in the full sample range, even though there are fewer valid observations for AMZN.

My question is as follows: is there a simple way of forcing the routine to estimate OLS only when the window has the full 20 data points for both AMZN and SP500? When I have a very large dataset with 400 or even 500 assets, there are many cases where some of them simply left the market, and then I should not be estimating betas. I believe the program checks for $t1 and, if it exists, computes OLS. I would like it to check everything between $t1 and $t2 and, if any observation is missing (in particular at the end), just not compute the regression.
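
To make the idea concrete, here is a rough sketch of the kind of check I have in mind, built on the loop above. It counts the observations in the current window where both series are present and only runs OLS when that count equals the window size. The simulated data at the top (and the seed, the 1.2 slope and the name n_ok) are my own assumptions, there only so the script runs on its own; with real data loaded, that part would be skipped. I am not sure this is the idiomatic way to do it, so please treat it as a sketch.

```hansl
# Simulated data, for illustration only (replaces any open dataset):
# 100 periods for SP500, AMZN missing for the last 40.
nulldata 100
set seed 12345
series SP500 = normal()
series AMZN = 1.2 * SP500 + 0.5 * normal()
smpl 61 100
AMZN = NA
smpl full

scalar T = $nobs
scalar window_size = 20
series b = NA

smpl 1 window_size
loop i = window_size .. T
    # number of observations in the current window with both series present
    scalar n_ok = sum(ok(AMZN) && ok(SP500))
    if n_ok == window_size
        ols AMZN const SP500 --quiet
        b[i] = $coeff(SP500)    # beta for the window ending at i
    endif
    if i < T
        smpl +1 +1
    endif
endloop

smpl full
```

With this layout, b should stay NA for every window that contains a missing value in either series, so in my example betas would only exist for the windows ending at observations 20 to 60.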

I hope I have made myself clear! Thanks all!

--
Filipe Rodrigues da Costa
Send me an email to: filipe@pobox.io
Reach me through Telegram at: https://t.me/rodriguesdacosta