\documentclass[a4paper,11pt]{article}
\usepackage{bm}
\usepackage{verbatim}

\newcommand{\obsvec}{\ensuremath{\mathbf{y}}}
\newcommand{\obsmat}{\ensuremath{\mathbf{H}}}
\newcommand{\obsdist}{\ensuremath{\mathbf{w}}}
\newcommand{\obsvar}{\ensuremath{\mathbf{R}}}
\newcommand{\statevec}{\ensuremath{\bm\xi}}
\newcommand{\strmat}{\ensuremath{\mathbf{F}}}
\newcommand{\strdist}{\ensuremath{\mathbf{v}}}
\newcommand{\strvar}{\ensuremath{\mathbf{Q}}}
\newcommand{\hatstatevar}{\ensuremath{\mathbf{P}}}
\newcommand{\prederr}{\ensuremath{\mathbf{e}}}
\newcommand{\predvar}{\ensuremath{\bm\Sigma}}
\newcommand{\exog}{\ensuremath{\mathbf{z}}}
\newcommand{\param}{\ensuremath{\bm\theta}}

\title{RFC: Implementation in \texttt{gretl} of the Kalman filter at the user level}
\author{Jack}

\begin{document}
\maketitle

The Kalman filter is one of those facilities that may come in handy in a number of cases: of course, we already use it internally for ARMA estimation, but it would also be a nice tool to have for people who work with DSGE models, or for people who would like to implement ARMA estimation for series with missing values, and so on.

\section*{Notation}

Notation \emph{is} a problem. It seems that in econometrics everyone is happy with $y = X \beta + u$, but we can't, as a community, make up our minds on a standard for state-space models. Harvey (1991), Hamilton (1994), Harvey \& Proietti (2005) and Pollock (1999) all use different conventions. The notation here will be based on Hamilton's, with slight variations.

A state-space model can be written down as
\begin{eqnarray}
  \label{eq:obseq}
  \obsvec_t & = & \obsmat_t \statevec_t + \mathbf{d}_t + \obsdist_t \\
  \label{eq:straneq}
  \statevec_{t} & = & \strmat_{t} \statevec_{t-1} + \mathbf{c}_t + \strdist_t
\end{eqnarray}
The symbol $T$ (not in bold) indicates the sample size. The observables $\obsvec_t$ form an $n$-vector (in many cases $n=1$, but not necessarily), while the unobservable vector of states $\statevec_t$ has $m$ elements.%; usually, if not always, $n < m$.
The covariance matrices of $\obsdist_t$ and $\strdist_t$ are $\obsvar_t$ and $\strvar_t$, respectively. In the special case when $\strmat_t = \strmat$, $\obsmat_t = \obsmat$ and so on, the model will be called \emph{time-invariant}.

We have two options: we may go forward in time and do the \emph{filtering} proper, or, after that, go back and do the \emph{smoothing}. To quote from Pollock (1999), p.~241 (with different notation),
\begin{quote}
  [\ldots] it is assumed that prior information on the previous state vector $\statevec_0$ is available in the form of an unbiased estimate $\hat{\statevec}_0$ which has been drawn from a distribution with a mean of $\statevec_0$ and a dispersion matrix of $\hatstatevar_0$.
\end{quote}
It may happen that $\hatstatevar_{0} \to \infty$, as with diffuse priors; in this case, we need to check Koopman (1997).
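Just to fix ideas on the notation and on the role of the prior (this is merely an illustration, not part of the proposal), take a zero-mean AR(2) process observed without error, $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$, with $\varepsilon_t$ white noise with variance $\sigma^2_{\varepsilon}$: it fits into (\ref{eq:obseq})--(\ref{eq:straneq}) with $n=1$, $m=2$ and
\[
\statevec_t = \left[\begin{array}{c} y_t \\ y_{t-1} \end{array}\right], \qquad
\strmat = \left[\begin{array}{cc} \phi_1 & \phi_2 \\ 1 & 0 \end{array}\right], \qquad
\strdist_t = \left[\begin{array}{c} \varepsilon_t \\ 0 \end{array}\right], \qquad
\strvar = \left[\begin{array}{cc} \sigma^2_{\varepsilon} & 0 \\ 0 & 0 \end{array}\right],
\]
\[
\obsmat = \left[\begin{array}{cc} 1 & 0 \end{array}\right], \qquad
\obsvar = 0, \qquad
\mathbf{c}_t = \mathbf{d}_t = 0 .
\]
If the state transition equation is stationary, a natural (non-diffuse) choice for the prior is given by the unconditional moments of the state, that is $\hat{\statevec}_0 = 0$ and
\[
\mathrm{vec}(\hatstatevar_0) = \left(\mathbf{I} - \strmat \otimes \strmat\right)^{-1} \mathrm{vec}(\strvar) ;
\]
if it is not, we are back to the diffuse case.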
In the standard case, however, the prediction equations are:
\begin{eqnarray}
  \label{eq:filtstates}
  E_{t-1}(\statevec_t) = \hat{\statevec}_{t|t-1} & = & \strmat_t \hat{\statevec}_{t-1} + \mathbf{c}_t \label{eq:statepred}\\
  V_{t-1}(\statevec_t) = \hatstatevar_{t|t-1} & = & \strmat_t \hatstatevar_{t-1} \strmat'_t + \strvar_t \label{eq:Ppred}\\
  E_{t-1}(\obsvec_t) = \hat{\obsvec}_{t|t-1} & = & \obsmat_t \hat{\statevec}_{t|t-1} + \mathbf{d}_t \\
  \prederr_t & = & \obsvec_t - \hat{\obsvec}_{t|t-1} \\
  V_{t-1}(\prederr_t) = \predvar_t & = & \obsmat_t \hatstatevar_{t|t-1} \obsmat_t' + \obsvar_t
\end{eqnarray}
and the updating equations are:
\begin{eqnarray}
  \hat{\statevec}_t & = & \hat{\statevec}_{t|t-1} + \mathbf{K}_{t} \prederr_t \label{eq:stateupd}\\
  \hatstatevar_{t} & = & \hatstatevar_{t|t-1} - \hatstatevar_{t|t-1} \obsmat_t' \predvar_t^{-1} \obsmat_t \hatstatevar_{t|t-1} \label{eq:Pupd}
\end{eqnarray}
where $\mathbf{K}_{t} = \hatstatevar_{t|t-1} \obsmat_t' \predvar_t^{-1}$ is the Kalman gain. Equations (\ref{eq:statepred})--(\ref{eq:stateupd}) and (\ref{eq:Ppred})--(\ref{eq:Pupd}) can be written as a single set of recursions going directly from $\hat{\statevec}_{t-1}$ to $\hat{\statevec}_t$ and from $\hatstatevar_{t-1}$ to $\hatstatevar_{t}$. Note that in practice we may want to implement something computationally smarter, such as the information filter (some bibliographic search is needed here; I can't remember exactly what it looks like). I can't be bothered right now to write down the smoothing recursions, but that's not terribly important at this stage.

\section*{User syntax}

The idea is to define a command block, like we do with \texttt{mle} or \texttt{gmm}:
\begin{verbatim}
list y = x1 x2        # or possibly: matrix Y = { x1 x2 }

kalman [ --smooth ]
    obsy y
    obsmat H
    obsex d
    obsvar Omega
    strmat Phi
    strex c
    strvar Psi
    inistate x0
    iniprec P0
end kalman
\end{verbatim}
where \texttt{obsy} stands for ``observable $\obsvec$'', \texttt{obsmat} for ``matrix in the observation equation'' (\ref{eq:obseq}), \texttt{obsex} for ``exogenous term in the observation equation'' and \texttt{obsvar} for ``variance of the observation equation''. A similar convention holds for the state transition equation (\ref{eq:straneq}), with \texttt{str} replacing \texttt{obs}. The keywords \texttt{inistate} and \texttt{iniprec} should be self-explanatory.

The elements \texttt{obsex}, \texttt{strex} and \texttt{inistate} need not be compulsory: if absent, they are understood to be 0. More or less the same goes for \texttt{iniprec}, if the initialisation is done via a diffuse prior (the simplest thing would be to use something like $P_0 = 10^{30}\cdot I$, but smarter methods are available -- again, Koopman (1997) should come to the rescue).

For time-invariant models, the arguments to \texttt{obsmat}, \texttt{strmat} and so on must simply be matrices of the appropriate size.
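Just to make the above concrete, here is how a complete time-invariant block could look for the AR(2) example of the previous section. This is only a mock-up: the \texttt{kalman} block is the proposal itself, so nothing of this runs yet, and I'm assuming that hansl's Kronecker operator \texttt{**} and the \texttt{mshape} function can be used to build $\hatstatevar_0$ from the $\mathrm{vec}$ formula given earlier.
\begin{verbatim}
# y is a series holding the observed data
scalar phi1 = 1.2
scalar phi2 = -0.5
scalar s2   = 1.0

matrix H  = { 1, 0 }
matrix F  = { phi1, phi2; 1, 0 }
matrix Q  = { s2, 0; 0, 0 }
matrix R  = { 0 }
matrix x0 = { 0; 0 }
# unconditional state variance: vec(P0) = inv(I - F**F) * vec(Q)
matrix P0 = mshape(inv(I(4) - F**F) * vec(Q), 2, 2)

kalman
    obsy y
    obsmat H
    obsvar R
    strmat F
    strvar Q
    inistate x0
    iniprec P0
end kalman
\end{verbatim}
In practice, of course, $\phi_1$, $\phi_2$ and $\sigma^2_{\varepsilon}$ would not be fixed numbers but parameters handed over by some estimation machinery; more on this below.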
For time-varying models, the best idea I've been able to come up with so far\footnote{Another solution would be to pass matrices with $T$ rows and an appropriate number of columns, where each row is the vectorisation of the corresponding matrix at time $t$; for example, if $\strmat_t$ is time-varying, the argument to \texttt{strmat} would have to be a $T \times m^2$ matrix. It would be gretl's job to reshape the pertinent row of such a matrix, but the more I think about it, the uglier I find it.} is to pass matrices which result from user-defined functions. This approach is very general and elegant, and it opens several interesting possibilities, such as allowing the elements of $\obsmat_t$ to depend on $\prederr_{t-1}$, $\obsvec_{t-1}$, structural parameters (as can be the case with DSGE models) and so on. Hence, you would have something like
\[
\obsmat_t = \obsmat(\exog_t, \param),
\]
where $\obsmat$ corresponds to a user-written function of the kind
\begin{verbatim}
function MyObsMat(list X, matrix theta)
\end{verbatim}
whose content depends on the context (recursive OLS, ARMA models, you name it), to be called as
\begin{verbatim}
list Z = x1 x2 x3
matrix parm = { 0.5, -0.8 }

kalman
    obsmat MyObsMat(Z, parm)
    ...
end kalman
\end{verbatim}

Of course, one also needs to retrieve the interesting quantities. This could be done via the existing accessors: \texttt{\$uhat} for $\prederr_t$ and \texttt{\$sigma} for $\predvar_t$ (other candidates may be \texttt{\$vcv} and \texttt{\$h}, but I personally like \texttt{\$sigma} better). When $n = 1$, these would be series; otherwise \texttt{\$uhat} would be a $T \times n$ matrix (a bit like we do for single-equation versus multiple-equation models), while \texttt{\$sigma} would be a $T \times \frac{n (n+1)}{2}$ matrix, with the $t$-th row holding $\mathrm{vech}(\predvar_t)'$. New accessors should be devoted to returning the states and their variances: suitable candidates could be \texttt{\$states} for $\statevec_t$ and \texttt{\$stvars} for $\mathrm{vech}(\hatstatevar_t)'$.

Moreover, it is probably useful to also return the series
\[
\ell_t = -\frac{1}{2} \left[ n \ln(2 \pi) + \ln\left|\predvar_t\right| + \prederr_t'\predvar_t^{-1} \prederr_t \right] ,
\]
possibly via the accessor \texttt{\$lnl}: the log-likelihood is then simply $\sum_t \ell_t$, so it becomes trivial to insert the whole thing into an \texttt{mle} block and estimate all sorts of models. Note, however, that the \texttt{kalman} block itself would perform no optimisation whatsoever: its job would only be the filtering (or the smoothing, with the \verb|--smooth| option).

\section*{References}

Hamilton, James D. (1994), \emph{Time Series Analysis}, Princeton University Press.
\smallskip

\noindent Harvey, A.C. (1991), \emph{Forecasting, Structural Time Series Models and the Kalman Filter}, Cambridge University Press.
\smallskip

\noindent Harvey, A.C. and T. Proietti (2005), \emph{Readings in Unobserved Components Models}, Oxford University Press.
\smallskip

\noindent Koopman, S.J. (1997), ``Exact Initial Kalman Filtering and Smoothing for Nonstationary Time Series Models'', \emph{Journal of the American Statistical Association}, Vol.~92, No.~440, pp.~1630--1638.
\smallskip

\noindent Pollock, D.S.G. (1999), \emph{A Handbook of Time-Series Analysis, Signal Processing and Dynamics}, Academic Press.

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: