Hi Giulia,
I'm not sure the following script is the shortest route, but I think it works. It assumes that your dataset contains a variable identifying the companies, called 'company' here, and that it is coded with the integers 1 to 1577.

# create a list with all variables, x1 to x20, to check for any missing values
list xlist = x1
loop foreach i x2..x20
    list xlist = xlist $i
endloop

# create a series, as a dummy to mark 'valid' observations; it takes 1 by default
series validity=1

# check for NA: test_empty=0 if there's at least one missing value throughout 'xlist'
series test_empty = ok(xlist)

# loop through each company (assumes 'company' takes the values 1 to 1577)
loop i=1..1577
    smpl company==$i --restrict --replace
    # count the complete observations (years) for this company
    scalar test_sc = sum(test_empty)
    # fewer than 5 means at least one year has a missing value:
    # set validity=0 for all the years of that company
    if test_sc < 5
        series validity = 0
    endif
endloop

# finally restrict sample to 'valid' observations only
smpl validity==1 --restrict --replace
# and compute summary statistics - mean & median
summary xlist

# compute the sum for each variable
matrix m_x = {xlist}
matrix m_sum = sumc(m_x)
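# optionally keep the means and medians as row vectors as well (a sketch;
# m_mean and m_median are illustrative names, and this assumes m_x holds
# the restricted data built just above)
matrix m_mean = meanc(m_x)
matrix m_median = quantile(m_x, 0.5)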
# restore the full sample
smpl --full
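
# A possible loop-free variant (only a sketch, assuming the dataset is
# declared as a panel and that your gretl version provides the panel
# function pmin()): the per-company minimum of ok(xlist) equals 1 only
# when no year has a missing value. 'allok' is just an illustrative name.
series allok = pmin(ok(xlist))
smpl allok == 1 --restrict --replace
summary xlist
smpl --full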

2015-12-16 9:24 GMT+01:00 Giulia Taveggia <taveggia@csilmilano.com>:

Dear all,


I am working with a panel dataset of 1577 companies, 5 years and 20 variables. I would like to know how to restrict the dataset so as to exclude missing values (e.g. if a company has a missing value in one year, it should be excluded for all the other years as well). After doing this, I would like to compute the mean, the median and the sum of each variable. Could you please write down the right script for these steps?


Thank you very much,

Best regards,

Giulia Taveggia
Researcher, Country Analysis and Forecasts Unit
CSIL Centre for Industrial Studies
Corso Monforte 15 - 20122 Milano - Italy Tel +39 02780497- Fax +39 02 780703


Gretl-users mailing list