On Fri, 18 Jun 2021, George Matysiak wrote:
Thanks for that. To be clear, what I would like to do is save the
randomly selected observations and estimate a regression equation. This
is my training sample. The excluded observations would be my test sample
for forecasts - how could I save them as the test set? Can it be done
with the GUI or perhaps it needs a script? Thanks.
Jack's earlier suggestion is apt for this. Here's a slightly
extended version with a little explanation of what's going on:
<hansl>
open data7-24
# select 100 training observations at random
smpl 100 --random
# create an all-ones series
series training = 1
# run a regression or whatever on the training data
# (you could do cross validation in here)
ols salepric const sqft age city
# go back to the full dataset
smpl --full
# Note: values for 'training' will be missing (NA) for
# all observations NOT included in the training sample.
# We key off that to create a complementary sub-sample.
series testing = missing(training)
# take a look, to check, if you like
print training testing --byobs
# switch to the testing subset
smpl testing --dummy
# run a regression or whatever on the testing data
ols salepric const sqft age city
</hansl>
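If the point of the test set is out-of-sample forecasting, one possible
variant is to keep the coefficients from the training regression and
apply them to the held-out observations. What follows is only a sketch:
the seed value and the names X, b, yhat, err and rmse are my own choices
for illustration, and RMSE is just one way of scoring the forecasts.
<hansl>
open data7-24
# fix the seed so the random split is reproducible (arbitrary value)
set seed 20210618
# regressors, kept in a list so they can be reused below
list X = const sqft age city
# select the training sample and flag it
smpl 100 --random
series training = 1
# estimate on the training data and save the coefficients
ols salepric X
matrix b = $coeff
# back to the full dataset; flag the complementary test set
smpl --full
series testing = missing(training)
# fitted values from the training-sample coefficients
series yhat = lincomb(X, b)
# restrict to the test set and compute the forecast RMSE
smpl testing --dummy
series err = salepric - yhat
scalar rmse = sqrt(mean(err^2))
printf "Test-set RMSE = %g\n", rmse
</hansl>
The 'testing' series stays in the dataset, so you can store the data and
recover the same split later with "smpl testing --dummy".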
Each of these steps could be done in the GUI but it's a lot easier,
and less error-prone, to script this sort of thing.
Allin