Re: 100-based indices with panel data (fwd)

Sunday, 2 July 2023

---------- Forwarded message ----------
Date: Sat, 1 Jul 2023 19:39:27 +0000
From: Allin Cottrell <cottrell(a)wfu.edu>
To: "Riccardo (Jack) Lucchetti" <r.lucchetti(a)univpm.it>
Cc: Josué Martínez-Castillo <jota3mc(a)gmail.com>,
Subject: Re: 100-based indices with panel data

On Sat, 1 Jul 2023, Riccardo (Jack) Lucchetti wrote:

...
 On Sat, 1 Jul 2023, Allin Cottrell wrote:

> On Fri, 30 Jun 2023, Josué Martínez-Castillo wrote:
> 
>> I'm a newbie in gretl, very excited to learn how to use the program for 
>> learning econometrics on my own. However, right now I'm curious on how to 
>> estimate 100-based indices when dealing with panel data. For example, what 
>> if I want to estimate a 100-based index for each unit using as base year 
>> the first year available of, say, real GDP.
>> 
>> I was looking for the answer in the manual of the 2023 version of gretl. No 
>> success. I was hoping maybe someone can help me with guidance.
> 
> Good question. As things stand there isn't a built-in way to construct such 
> indices for panel data using the graphical interface. But assuming you want 
> the indices to work in the time dimension for each panel unit, it's actually 
> not hard to do via scripting. Here's an example:
 [...]

 Here's another approach, which avoids the loop. The syntax is a bit too 
 terse, perhaps, but IMO instructive.

 <hansl>
 open abdata.gdt

 base = cum(ok(EMP)) == 1 ? EMP : NA
 EMP_b100 = EMP/pexpand({base}) * 100
 print EMP EMP_b100 --byobs
 </hansl>

Yes, quite instructive! In case anyone's interested let's unpack Jack's 
formulation.

First consider:

base = cum(ok(EMP)) == 1 ? EMP : NA

We're working throughout with what gretl calls series, that is, variables 
holding one value per observation.

The inner expression, "ok(EMP)", creates a series with value 1 for valid values 
of its series argument and 0 for NAs (missing values).

This addresses a problem with the first variant I posted, where I just took the 
base of the indices to be the first observation for each unit. That's OK with 
the grunfeld data that I referenced because it has no missing values. But if 
the first observation for a unit were NA, the whole index series for that unit 
would be NA via my method (since NAs propagate in arithmetical calculation).

Not accidentally, Jack chose the supplied abdata dataset (Arellano and Bond), 
which contains missing values, to illustrate his calculation, and I'll work 
with it here.

Now cum() is gretl's cumulation function, and it works "properly" for panel 
data: it cumulates in the time dimension, starting over for each unit. So 
"cum(ok(EMP))" gives a series holding the count of valid values "to date" for 
each unit. OK, so far?

Then "cum(ok(EMP)) == 1 ? EMP : NA" is an instance of the very handy ternary 
operator. It has the form:

result = condition ? one_thing : other_thing

which can be spelled out a bit as

if (condition is true) result is one_thing, otherwise result is other_thing

So, "cum(ok(EMP)) == 1 ? EMP : NA" gives a series holding the value of EMP for 
each first-valid-observation per panel unit, and NA for all other observations.
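
As a minimal illustration of the ternary operator on its own, here is a small 
sketch. The names x, y and big are made up for the purpose, and the threshold 
of 10 is arbitrary. The same syntax works for scalars and, observation by 
observation, for series:

<hansl>
open abdata.gdt --quiet

# scalar case: the condition x > 3 is true, so y gets 10
scalar x = 5
scalar y = x > 3 ? 10 : 20
print y

# series case: the condition is evaluated at each observation
series big = EMP > 10 ? 1 : 0
print EMP big --byobs
</hansl>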

To see what's happening up to this point one could open the abdata dataset in 
gretl and execute these commands (in a script or via the console):

series eok = ok(EMP)
series cumeok = cum(eok)
series base = cum(ok(EMP)) == 1 ? EMP : NA
print EMP eok cumeok base --byobs

Next comes the line:

EMP_b100 = EMP/pexpand({base}) * 100

On the left-hand side is the final indices series. On the right-hand side we're 
using the original EMP (employment), multiplying by 100 (as per convention), 
and dividing by "pexpand({base})". What the heck is this last thing?

Well, notice the curly brackets around "base". These turn a series into a 
vector (special case of a matrix) and the thing you need to know here is that 
in gretl by default this conversion skips any missing values. [Note: you can 
prevent this via the command "set skip_missing off".] So in a panel with N 
units {base} will be an N-vector holding just the first valid observation of 
EMP for each unit.
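
One way to check this on abdata is sketched below. The names b, b_all and N are 
mine, and N is computed on the assumption that the full panel sample is in 
force, so that $nobs divided by $pd (the time-series length T in a panel) gives 
the number of units:

<hansl>
open abdata.gdt --quiet
series base = cum(ok(EMP)) == 1 ? EMP : NA

# default behaviour: missing values are dropped in the conversion
matrix b = {base}
scalar N = $nobs / $pd
printf "rows of {base}: %d, panel units: %d\n", rows(b), N

# with skip_missing off the vector has one row per observation
set skip_missing off
matrix b_all = {base}
printf "rows with skip_missing off: %d\n", rows(b_all)
set skip_missing on
</hansl>

If every unit has at least one valid EMP value the first two numbers should 
agree.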

Then the pexpand ("panel-expand") function turns this N-vector into a series by 
repeating each of the N values T times, for each unit. Which is (probably) just 
what we want to divide EMP by, to create the per-unit indices. In a panel 
dataset with no missing values it's exactly equivalent to the more pedestrian 
formulations I posted earlier.
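
As a cross-check, and on the assumption that I'm reading the function reference 
correctly, pshrink() returns a column vector holding the first valid 
observation of a series for each unit (skipping units with none). If so, it 
should yield the same vector as {base}. The sketch below compares the two 
routes; idx1, idx2 and gap are names invented here, and the pshrink behaviour 
is worth verifying against your own gretl version:

<hansl>
open abdata.gdt --quiet
series base = cum(ok(EMP)) == 1 ? EMP : NA

# Jack's formulation
series idx1 = 100 * EMP / pexpand({base})
# assumed equivalent: pshrink() picks the first valid value per unit
series idx2 = 100 * EMP / pexpand(pshrink(EMP))

# largest discrepancy between the two (zero if they agree)
series gap = abs(idx1 - idx2)
printf "max abs difference = %g\n", max(gap)
</hansl>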

Now for a couple of missing-data complications we'd want to deal with in a 
built-in version of this functionality.

1) What if some units have NO valid values for the variable we're working with? 
Then {base} will not be an N-vector and Jack's method will not work unmodified.

2) What if the date of the first valid observation differs across units, but we 
want a set of indices that start in the same period? Again, some fancier 
footwork would be needed. We'd need to look for the first period with a common 
non-missing observation across all units that had more than one non-missing 
observation.
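
Pending a built-in, here is a rough sketch of how one might approach both 
points in a script. It is not a tested general solution. For (1) it avoids the 
vector route by taking the per-unit mean of "base": since "base" has at most 
one valid value per unit, pmean() should just repeat that value, and I'd expect 
NA for a unit with no valid values at all. For (2) it makes a simplifying 
assumption, namely that the common base is the first period in which every 
unit, without exception, has a valid value. The names N, allok, cand, t0, cbase 
and EMP_c100 are invented for the example.

<hansl>
open abdata.gdt --quiet
scalar N = $nobs / $pd      # number of units, assuming the full sample

# (1) per-unit base without going through a vector: base has at most
# one valid value per unit, so its per-unit mean is that value
series base = cum(ok(EMP)) == 1 ? EMP : NA
series EMP_b100 = 100 * EMP / pmean(base)

# (2) common base: the first period in which every unit is observed
genr time
series allok = pxnobs(EMP) == N
series cand = allok ? time : NA
scalar t0 = min(cand)
if ok(t0)
    series cbase = time == t0 ? EMP : NA
    series EMP_c100 = 100 * EMP / pmean(cbase)
    print EMP EMP_b100 EMP_c100 --byobs
else
    print "no period has valid observations for all units"
endif
</hansl>

Restricting the search in (2) to units with more than one valid observation, 
as suggested above, would need a further step (pnobs() might serve to identify 
such units).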

Allin Cottrell
