Hi Luca
Many thanks for your clear explanation. My interest in the Bayesian
estimator arises from a personal interest (as a salmon angler) and a
concern that one of the UK government environment agencies has been
pooling data from the 12 rivers that are, or have been, monitored using
fish counters to count the numbers of adult salmon entering a river to
reproduce. The intention is to use the pooled data model as a means of
estimating the run size on 56 other non-monitored rivers, using a
Bayesian model (with a non-informative Beta-binomial prior). A number of
independent variables are in the model (e.g. river flow data, angler
effort in days fished, etc.). The rationale is that, as all salmon
anglers must report their annual catch, the percentage of adult fish
caught on the monitored rivers can be derived. This can then be used to
estimate the numbers of fish entering non-counter rivers given the catch
data for each of those rivers. Essentially it is a panel data estimation
problem of the proportion of fish caught by anglers in frequentist
terms, and one I have already modelled in gretl to compare with the
official Bayesian model output now being proposed.
The bottom line is that for a given reported catch level on a river
without a fish counter, if the percentage of fish caught on that river
can be estimated then the total number of fish can be derived, i.e.
total estimated fish in river = rod catch / rod catch %. So the model's
dependent variable is the percentage of fish caught on each of the 12
monitored rivers, logit transformed.
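To make the arithmetic concrete (the figures and the series name prop
below are invented purely for illustration), in gretl something like:

  scalar rod_catch = 150                     # reported rod catch (hypothetical)
  scalar catch_prop = 0.15                   # estimated rod catch proportion (hypothetical)
  scalar run_size = rod_catch / catch_prop   # implied run of 1000 fish
  series y = log(prop / (1 - prop))          # dependent variable: logit of the catch proportion prop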
There are a number of underlying issues with the recorded river data
that give me some cause for concern about their model, which pools the
full set of the 12 monitored rivers' data. First, fish counter recording
on 2 rainfed rivers in the pool ceased permanently in 2015, and on
another temporarily, though it is now being reinstalled. Second, the
hydrological profiles of 4 of the other rivers are unique, being
chalkstream rivers where much of the flow comes from subterranean spring
water sources. Half of the pooled panel units therefore have rainfed
river flow regimes, where salmon only enter those rivers at times of
higher river flow. It is clear that rod catch proportions are higher on
the chalkstream sample than on the rainfed rivers; in fact the
chalkstreams account for only 4% of the national salmon rod catch, so
they are disproportionately represented in the pooled sample.
In frequentist terms, several of the exploitation rates in the panel of
monitored rivers are only trend stationary, or AR(1) (hence my question
about recovering residuals from Bayesian model fitted estimates). One of
their primary independent variables, fishing effort, exhibits structural
change which passed unrecognised in the official model.
The official model fails to account for the different rod catch
proportion levels on the different monitored rivers, which a least
squares dummy variable (LSDV) panel data framework readily handles, with
robust errors to cope with heteroskedasticity. The official model's
simulations, based on the Bayesian posterior mean parameter estimates
for 52 non-monitored rivers, consequently overstate their rod catch
proportions and under-predict the number of salmon entering non-counter
rivers.
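For what it is worth, the frequentist benchmark I have in mind runs
along these lines in gretl (the variable names are placeholders and the
snippet is untested as written):

  # y = logit-transformed rod catch proportion; flow and effort are example regressors
  panel y const flow effort --fixed-effects --robust
  modtest --autocorr   # check for autocorrelation in the panel residuals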
In summary, the issue largely reduces to the constant in the estimated
prediction equation and to the appropriate rivers to include in the
pooled panel dataset. I have estimated the panel model with all 12
rivers, with 10 (excluding the 2 rivers where the counters have
permanently ceased), and with 6 (also excluding the 4 chalkstreams),
i.e. a panel of rainfed rivers only. Across that sequence of panel
sizes, model errors reduce and the mean predicted rod catch percentages
are lower than the official model estimates. My dataset also drops the
initial sample years of 1994-2001 used in the official model, to avoid
including the period before the structural break in the rod effort
series in 2001. My interest now is to replicate my model specification
with Bayesian estimates, but also to be able to establish whether there
is serial correlation in its residuals, which I naturally suspect will
be lurking in the mean-coefficient fitted version of the official model.
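Following Sven's lincomb suggestion below, I imagine a first pass would
be something like this (bundle and member names as in his example;
untested):

  series fit_bayes = lincomb(X, out.post_summ.post_betamean)   # posterior-mean fitted values
  series e_bayes = y - fit_bayes                                # residuals from the posterior-mean fit
  ols e_bayes const e_bayes(-1)                                 # crude first-order autocorrelation check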
As I chair a government special advisory panel for agricultural
economics, you can see that I need to be clear on the issues of Bayesian
estimation before stirring the pot in the Ministry, which has oversight
of, and funds, the fisheries science agencies doing the modelling. A
one-size-fits-all model may be administratively convenient, but in my
view it is hardly appropriate.
The government agency has published a paper about their model in a
fisheries science journal, which I intend to comment on formally. Much
is left unsaid in the article regarding post-estimation diagnostic
checking or panel selection. Tiny graphs are presented, usually with
log-scale vertical axes, which render the model fit and the data points
difficult to see clearly. However, I have photo-enlarged the graphs,
only to discover that, depending on the river panel unit, some 39 to 70
percent of their actual recorded data points fall outside the 95%
(in)credible interval of the model's fitted mean predicted values for
each river.
Cheers
On Tue, 2 Apr 2024, 23:32 Luca Pedini, <luca_pedini(a)live.it> wrote:
Hi Brian,
thanks for the message and thank you Sven for the quick reply.
Unfortunately I’ve been experiencing some issues with the mailing list
recently: I’m receiving the messages late or not at all! Sorry!
As for your original questions, the function by default will generate
non-informative results (frequentist) alongside the informative ones
(truly Bayesian, obtained via the chosen prior): the idea behind this is
to have a direct comparison between the two. In the resulting output
bundle you can access both of them: the sub-bundle non_info_case reports
all the basic frequentist estimators, while for the informative results
you can access two different sub-bundles: i) sampled_coeff, which
contains all the sampled parameters and from which you can recover
quantiles or any other function; and ii) post_summ, which simply reports
posterior means and standard deviations.
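For example, assuming you have stored the function's return bundle as
"out" (the name here is just an example), you can inspect these with:

  eval out.non_info_case   # frequentist benchmark results
  eval out.sampled_coeff   # the full set of posterior draws
  eval out.post_summ       # posterior means and standard deviations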
In the sample script the informative prior is "not so informative"; for
this reason the results are quite similar to the frequentist ones. But
if you have to choose which posterior quantity to use between
non-informative and informative, always use the informative one.
Moreover, if you have to compute some function of the posterior
quantities, I suggest computing it on the sampled parameters and not
simply on the posterior mean. In this way you can derive a distribution
and draw more exact conclusions.
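For instance, assuming the draws in sampled_coeff are stored one per row
(adjust if the layout differs), a derived quantity such as the ratio of
two coefficients can be handled draw by draw:

  matrix S = out.sampled_coeff
  matrix ratio = S[,1] ./ S[,2]   # the derived quantity, computed for every draw
  eval quantile(ratio, 0.025)     # lower bound of a 95% credible interval
  eval quantile(ratio, 0.975)     # upper bound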
As for the residuals question: these are not stored. Storing residuals
for each sampled parameter vector would quickly drain the computer's
memory, especially with big datasets, which is why they are not saved.
However, you can recover them via the sampled coefficients.
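A rough, untested sketch, again assuming one draw per row in
sampled_coeff and a regressor list X with dependent variable y:

  matrix Xm = {X}                               # regressors as a T x k matrix
  matrix S = out.sampled_coeff                  # assumed to be n_draws x k
  matrix E = {y} * ones(1, rows(S)) - Xm * S'   # column j holds the residuals under draw j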
I hope this may help and if you have any other questions or doubts, do not
hesitate to ask!
Best regards,
Luca
*From: *Brian Revell <bjr.newmail(a)gmail.com>
*Date: *Monday, 1 April 2024 23:18
*To: *Gretl list <gretl-users(a)gretlml.univpm.it>
*Subject: *[Gretl-users] Re: BayTool Package
Thanks Sven for the detailed explanation. I had eventually arrived at
the conclusion that the mean value of the density was the point estimate
being output, albeit that the densities for each parameter were also an
output plot option. It would seem then that the residuals from the mean
values are not retrievable (yet) from running the GUI version, which
incidentally has now attached itself permanently as an option within the
univariate options.
It would appear that in many applied studies the parameter means with
credible intervals are employed. In a forecasting or projection context
with Bayesian estimation of time series data, one might suppose that the
residuals would clearly still be important for post-estimation diagnosis
regarding stationarity and serial correlation.
Brian
On Mon, 1 Apr 2024, 20:55 Sven Schreiber, <sven.schreiber(a)fu-berlin.de>
wrote:
On 16.03.2024 at 17:22, Brian Revell wrote:
Hi
I have tried using the BayTool package with some success (in terms of
getting output), and I have read (perhaps not well enough) the detailed
paper by Luca Pedini.
Luca has had some email problems lately, perhaps he didn't get this one.
However, I am puzzled why, when simply using the linear model with the
conjugate prior, I get by default two sets of posterior mean parameter
and s.e. estimates: NI-post-m and I-post-m, and NI-post-se and
I-post-se. Looking at the graphs of the Gaussian posterior density
estimates, they appear to be more closely centred on the I-post mean.
Although the differences between the NI and I posterior estimates are
small, what is the basis for selecting one rather than the other?
I believe that NI stands for non-informative and I for informative, which
typically refers to the prior used. So basically the NI results should be
purely data-driven, which usually means the max-likelihood or frequentist
quantities. This is just for comparison, a Bayesian would of course go for
the informative ones. (Provided she really believes in the prior, which for
some reason doesn't seem to happen all that often in reality, but that's a
different story.)
Second question: how can I retrieve the fitted values and residuals from
the BayTool package? Hopefully they reside somewhere to be retrieved
post-estimation. Or is it necessary to run the input data through the
equation to generate them oneself - and if so, which set of posterior
mean estimates does one use?
Luca may correct me eventually, but it seems those things are not directly
available. I guess this makes sense from a Bayesian perspective since it
gives you densities, not point estimates. For example, do you want to
consider the posterior mean, median or mode as your quasi-point estimate?
All imply different sets of residuals. In principle they could all be
stored of course, but it's not clear whether that's the most practical
thing to do.
Having said that, it looks like for the coefficients (in the sub-bundle
post_summ) the means are stored, so in that context this arbitrary choice
was made, apparently.
So for the CASE 1 in the sample script with covariates list X, I guess you
could try:
lincomb(X, out.post_summ.post_betamean)
to get the fitted values. (Untested.)
PS - the BayTool package does not seem to attach permanently to the URL
"model" list of options.
Sorry, I don't understand. What URL?
cheers
sven
_______________________________________________
Gretl-users mailing list -- gretl-users(a)gretlml.univpm.it
To unsubscribe send an email to gretl-users-leave(a)gretlml.univpm.it
Website:
https://gretlml.univpm.it/postorius/lists/gretl-users.gretlml.univpm.it/