Hi all,
I've begun to explore the issue of the numerical performance of OLS
regression, where you want to condition on a qualitative variable with
many different values; that is, you want to run something like
ols y X dummify(fac)
where "fac" is a discrete variable with a high number of possible valid
values (call it h).
Normally, you don't really care about all the parameters; you just want
the OLS subvector for X (call it beta). Of course, a special case of the
above is fixed-effect estimation in panel data, but the problem is in
fact a little bit more general than that.
If nelem(X) = k, that would lead to regressing y on a list with k+h-1
elements. If the sample size n and h are both large, that takes a lot of
RAM, and it's very inefficient, since (as is well known) you can compute
beta much more cleverly via the Frisch-Waugh theorem: partialling the
dummies out of y and X amounts to demeaning them within the groups
defined by fac.
The attached script does just that[*], and compares execution time for
both approaches, so you can play with it.
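For readers who don't want to open the attached script, here is a rough sketch of the idea in Python/numpy (the attached script itself is in hansl; the function name `fols_beta` and the toy data are mine, purely for illustration). Frisch-Waugh says the coefficients on X from the full dummy regression equal the coefficients from regressing within-group-demeaned y on within-group-demeaned X, so the h-column dummy block never needs to be built:

```python
import numpy as np

def fols_beta(y, X, fac):
    """Sketch of "factorised OLS": beta on X from regressing y on
    [X, dummies(fac)], computed via Frisch-Waugh, i.e. by subtracting
    group means instead of materialising the n-by-h dummy matrix.
    ('fols_beta' is a hypothetical name, not an existing function.)"""
    idx = np.unique(np.asarray(fac), return_inverse=True)[1]

    def demean(v):
        # group sums via scatter-add, then subtract each group's mean
        sums = np.zeros((idx.max() + 1,) + v.shape[1:])
        np.add.at(sums, idx, v)
        counts = np.bincount(idx).reshape(-1, *([1] * (v.ndim - 1)))
        return v - (sums / counts)[idx]

    beta, *_ = np.linalg.lstsq(demean(X), demean(y), rcond=None)
    return beta

# brute-force check against the naive regression on [X, dummies]
rng = np.random.default_rng(1)
n, k, h = 200, 3, 10
fac = rng.integers(0, h, n)
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + fac.astype(float) + rng.standard_normal(n)
D = np.eye(h)[fac]                       # the full dummy matrix
full, *_ = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)
print(np.allclose(fols_beta(y, X, fac), full[:k]))   # → True
```

The point of the sketch is the memory footprint: the naive approach stores an n-by-(k+h) regressor matrix, while the demeaning route only ever touches n-by-k arrays plus a length-h vector of group sums.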
My question to the community is: would it be worthwhile to implement the
"specialised" algorithm natively? Something like
fols y X fac
where "fols" stands for "factorised OLS"? Or maybe as an option to the
ols command? Or maybe as a function? Having such a command (or function)
would of course only pay off when both n and h are large.
Is this worth the effort?
[*] The attached function just computes beta, not all the auxiliary
quantities. But those are easy to add.
-------------------------------------------------------
Riccardo (Jack) Lucchetti
Dipartimento di Scienze Economiche e Sociali (DiSES)
Università Politecnica delle Marche
(formerly known as Università di Ancona)
r.lucchetti(a)univpm.it
http://www2.econ.univpm.it/servizi/hpp/lucchetti
-------------------------------------------------------