I have been trying to solve a few problems with rolling regressions. To isolate them, I am using a simple example with two financial assets: Amazon (AMZN) and the market (SP500). I'm trying to estimate "beta", which can be roughly obtained by regressing the returns of AMZN on the returns of SP500.
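
For reference, the beta I have in mind is the slope of that regression. This is a sketch in my own notation, assuming a plain univariate OLS with an intercept and no risk-free adjustment, where r denotes per-period returns:

```latex
\hat{\beta}_{\text{AMZN}}
  = \frac{\operatorname{Cov}\left(r_{\text{AMZN}},\, r_{\text{SP500}}\right)}
         {\operatorname{Var}\left(r_{\text{SP500}}\right)}
```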

So far so good; this works quite well. But let's say the data covers 100 periods for SP500 but only 60 for AMZN (no data for the last 40). Because I'm using a rolling window of 20 data points, eventually the routine will use only 19 data points, then 18, 17, and so on until reaching 2 (which is the technical minimum). This happens because the routine still finds data points in the full sample, even though there are fewer for AMZN.

My question is: is there a simple way of forcing the routine to estimate OLS only when we have the full 20 data points for both AMZN and SP500? In a very large dataset with 400 or even 500 assets, there are many cases where an asset has simply left the market, and then I should not be estimating betas. I believe the program checks whether $t1 exists and, if it does, computes OLS. I would like it to check everything between $t1 and $t2 and, if anything is missing, in particular at the end, simply not compute.
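
To make the behaviour I am after concrete, here is a minimal Python sketch. It is not the routine I am actually using, only an illustration: the function names `ols_beta` and `rolling_beta` are made up, missing returns are assumed to be stored as `None`, both series are assumed to have the same length, and the return numbers are synthetic. A window is estimated only if every observation in it is present for both series.

```python
from typing import List, Optional, Sequence


def ols_beta(y: Sequence[float], x: Sequence[float]) -> float:
    """Slope of an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def rolling_beta(
    asset: Sequence[Optional[float]],
    market: Sequence[Optional[float]],
    window: int = 20,
) -> List[Optional[float]]:
    """Rolling beta, estimated only for complete windows.

    The result is aligned with the inputs: position i holds the beta for
    the window ending at i, or None if that window is shorter than
    `window` or contains a missing value in either series.
    """
    betas: List[Optional[float]] = [None] * len(market)
    for end in range(window - 1, len(market)):
        a = asset[end - window + 1 : end + 1]
        m = market[end - window + 1 : end + 1]
        # Skip the window if anything between its start and end is missing.
        if any(v is None for v in a) or any(v is None for v in m):
            continue
        betas[end] = ols_beta(a, m)
    return betas


# Synthetic data: 100 market returns, 60 asset returns, then 40 missing.
market = [((t * 7) % 11 - 5) / 100 for t in range(100)]
asset = [2 * m + 0.001 for m in market[:60]] + [None] * 40

betas = rolling_beta(asset, market, window=20)
```

With this rule, the first 19 positions and the last 40 positions of `betas` stay empty, and no beta is ever estimated from 19, 18, ... data points.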

Filipe Rodrigues da Costa

Send me an email to: filipe@pobox.io

Reach me through Telegram at: https://t.me/rodriguesdacosta