URGENT HELP!! SCF Full Dataset too large! Help me retrieve some variables
by 100453087@alumnos.uc3m.es
Hello all! I am working on my thesis and I need a helping hand (I am
pretty new to econometrics, and I am really stuck right now).
I'm using SCF data.
The SCF provides its data as the Full Public Data Set and the Summary
Extract Public Data. I am mainly working with the Summary extract, but I
need to retrieve some variables from the Full Public Data Set. I am unable
to do so because of Stata's restriction on the number of variables (and
likewise in gretl; these are the two programs I use). I have tried
Stat/Transfer, but the demo mode does not work properly.
I would really appreciate it if somebody could retrieve for me, or tell me
how to retrieve, the following variables and observations for 2004 and
2007 from these datasets (a sketch of what I am trying to do follows the
list):
"Full Public Data Set: Stata format" for 2004 ( https://www.federalreserve.gov/econres/scf_2004.htm )
"Full Public Data Set: Stata format" for 2007 ( https://www.federalreserve.gov/econres/scf_2007.htm )
CASEID
AGE
Y1
X820
X828
X920
X1020
X9154
X9155
X9156
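In gretl, what I am hoping to achieve is essentially the following
(untested, since opening the file is exactly where I fail; I am assuming
the Stata file inside the 2004 archive is named p04i6.dta and that the
variable names survive the import -- they may come through in lower case):
<hansl>
# open the full Stata file, then save only the needed series
open p04i6.dta
# adjust the case of the names below if the import lower-cases them
store scf2004_subset.gdt CASEID AGE Y1 X820 X828 X920 X1020 X9154 X9155 X9156
</hansl>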
Thanks a lot!!
Re: R: Re: BayTool Package
by Brian Revell
Hi Luca
Many thanks for your clear explanation. My interest in the Bayesian
estimator arises out of a personal interest (as a salmon angler) and
concern that one of the UK government environment agencies has been
pooling data from the 12 rivers that are or have been monitored using fish
counters to count the numbers of adult salmon entering a river to
reproduce. It is intended to use the pooled data model as a means of
estimating the run size on 56 other non-monitored rivers, utilising a
Bayesian model (with a beta-binomial non-informative prior). A number of
independent variables are in the model (e.g. river flow data, angler
effort in days fished, etc.). The rationale is that, as all salmon anglers
must report their annual catch, the percentage of adult fish caught on the
monitored rivers can be derived. This can then be used to estimate the
numbers of fish entering non-counter rivers, given the catch data for each
of those rivers. Essentially it is a panel data estimation problem of the
proportion of fish caught by anglers in frequentist terms, and one I have
already modelled in gretl to compare with the official Bayesian model
output now being proposed.
The bottom line is that, for a given reported catch level on a river
without a fish counter, if the percentage of fish caught on that river can
be estimated then the total number of fish can be derived, i.e. total
estimated fish in river = rod catch / rod catch %. For example, a rod
catch of 500 at an estimated 25% catch rate implies 500/0.25 = 2,000 fish
in the river. So the model dependent variable is the percentage of fish
caught on each of the 12 monitored rivers, logit transformed.
There are a number of underlying issues with the recorded river data that
give me some cause for concern in their model, which pools the full set of
the 12 monitored rivers' data. First, fish counter recording on 2 rainfed
rivers in the pool ceased permanently in 2015, and on another temporarily,
though that counter is now being reinstalled. Second, the hydrological
profiles of 4 of the other rivers are unique, being chalkstream rivers
where much of the flow is from subterranean spring water sources. Half of
the pooled panel units therefore have rainfed river flow regimes, where
salmon only enter those rivers at times of higher river flow levels. It is
clear that rod catch proportions are higher on the chalkstream sample than
on the rainfed rivers; in fact the chalkstreams account for only 4% of the
national salmon rod catch, so they are disproportionately represented in
the pooled sample.
In frequentist terms, several of the exploitation rates in the panel of
monitored rivers are only trend stationary, or AR(1) (hence my question
about recovering residuals from the Bayesian model's fitted estimates).
One of their primary independent variables, fishing effort, exhibits
structural change which passed unrecognised in the official model.
The official model also fails to account for the different rod catch
proportion levels on the different monitored rivers, which an LSDV (least
squares dummy variable) panel data framework readily handles, with robust
errors to cope with heteroskedasticity. In consequence, the official model
simulations based on the Bayesian parameter mean estimates for 52
non-monitored rivers overstate their rod catch proportions and
under-predict the number of salmon entering non-counter rivers.
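For concreteness, my frequentist comparison model in gretl is essentially
the following (a sketch with illustrative series names, not my actual
script):
<hansl>
# rod-catch proportion, logit transformed, regressed on flow and effort;
# per-river fixed effects (LSDV) with robust standard errors
series l_odds = log(catch_prop / (1 - catch_prop))
panel l_odds const flow effort --fixed-effects --robust
</hansl>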
In summary, the issue largely reduces to the constant in the estimated
prediction equation, and to which rivers it is appropriate to include in
the pooled panel dataset. I have estimated the panel model with all 12
rivers, with 10 (excluding the 2 rivers where the counters have
permanently ceased), and with 6 (also excluding the 4 chalkstreams), i.e.
a panel of rainfed rivers only. Across that sequence of panel sizes, the
model errors shrink and the mean predicted rod catch percentages are lower
than the official model estimates. My dataset also drops the initial
sample years of 1994-2001 used in the official model, to avoid including
data from before the 2001 structural break in the rod effort series. My
interest now is to replicate my model specification with Bayesian
estimates, but also to be able to establish whether there is serial
correlation in its residuals, which I naturally suspect will be lurking in
the mean-coefficient fitted version of the official model.
As I chair a government special advisory panel for agricultural economics,
you can see that I need to be clear on the issues of Bayesian estimation
before stirring the pot in the Ministry, which has oversight of and funds
the fisheries science agencies doing the modelling. A one-size-fits-all
model may be administratively convenient, but in my view it is hardly
appropriate.
The government agency has published a paper about their model in a
fisheries science journal, which I intend to comment on formally. Much is
left unsaid in the article regarding post-estimation diagnostic checking
and panel selection. Tiny graphs are presented, usually with log-scale
vertical axes, which makes the model fit and the data points difficult to
see clearly. However, I have photo-enlarged the graphs, only to discover
that, depending on the river panel unit, some 39 to 70 percent of their
actual recorded data points fall outside the 95% (in)credible interval of
the model's fitted mean predicted values for each river.
Cheers
On Tue, 2 Apr 2024, 23:32 Luca Pedini, <luca_pedini(a)live.it> wrote:
> Hi Brian,
>
>
>
> thanks for the message and thank you Sven for the quick reply.
> Unfortunately I’ve been experiencing some issues with the mailing list
> recently: I’m receiving the messages late or not at all! Sorry!
>
>
>
> As for your original questions, the function by default will generate
> non-informative results (frequentist) alongside the informative ones
> (truly Bayesian, obtained via the chosen prior): the idea behind this is
> to have a direct comparison between the two. In the resulting output
> bundle you can access both of them: the sub-bundle non_info_case reports
> all the basic frequentist estimators, while for the informative results
> you can access two different sub-bundles: i) sampled_coeff, which
> contains all the sampled parameters and from which you can recover
> quantiles or any sort of function; and ii) post_summ, which simply
> reports posterior means and standard deviations.
>
>
>
> In the sample script the informative prior is "not so informative"; for
> this reason the results are quite similar to the frequentist ones. But
> if you have to choose which posterior quantity to use, between
> non-informative and informative, always use the informative one.
> Moreover, if you have to compute some function of the posterior
> quantities, I suggest computing it on the sampled parameters and not
> simply on the posterior mean. In this way you can derive a distribution
> and draw more exact conclusions.
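> For example, something along these lines (an untested sketch; take the
> member name of the draws matrix as illustrative, with one sampled
> coefficient vector per row):
>
> matrix draws = out.sampled_coeff.beta_draws  # hypothetical member name
> matrix q05 = quantile(draws, 0.05)  # 5% posterior quantile per coefficient
> matrix q95 = quantile(draws, 0.95)  # 95% posterior quantile per coefficient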
>
>
>
> As for the residuals question: these are not stored, since storing the
> residuals for each sampled parameter vector would quickly drain the
> computer's memory, especially with big datasets. However, you can
> recover them via the sampled coefficients.
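> Something along these lines should work (again untested; it assumes the
> dependent series y, the covariate list X, and the draws matrix named as
> above):
>
> matrix draws = out.sampled_coeff.beta_draws  # hypothetical member name
> matrix U = {y} .- {X} * draws'  # one column of residuals per draw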
>
>
>
> I hope this may help and if you have any other questions or doubts, do not
> hesitate to ask!
>
>
>
> Best regards,
>
>
>
> Luca
>
>
>
> *From: *Brian Revell <bjr.newmail(a)gmail.com>
> *Date: *Monday, 1 April 2024 23:18
> *To: *Gretl list <gretl-users(a)gretlml.univpm.it>
> *Subject: *[Gretl-users] Re: BayTool Package
>
> Thanks, Sven, for the detailed explanation. I had eventually arrived at
> the conclusion that the mean value of the density was the point estimate
> being output, albeit the densities for each parameter were also
> available as an output plot option. It would seem then that the
> residuals from the mean values are not retrievable (yet) when running
> the GUI version, which incidentally has now attached itself permanently
> as an option within the univariate options.
>
>
>
> It would appear that in many applied studies the parameter means with
> credible intervals are employed. In a forecasting or projection context
> with Bayesian estimation of time series data, one might suppose that the
> residuals would clearly still be important for post-estimation diagnosis
> regarding stationarity and serial correlation.
>
>
>
> Brian
>
>
>
> On Mon, 1 Apr 2024, 20:55 Sven Schreiber, <sven.schreiber(a)fu-berlin.de>
> wrote:
>
> On 16.03.2024 at 17:22, Brian Revell wrote:
>
> Hi
>
> I have tried using the BayTool package with some success (in terms of
> getting output), and I have read (perhaps not well enough) the detailed
> paper by Luca Pedini.
>
> Luca has had some email problems lately, perhaps he didn't get this one.
>
> However, I am puzzled why, when simply using the linear model with a
> conjugate prior, I get by default two sets of posterior mean parameter
> and s.e. estimates: NI-post-m and I-post-m, and NI-post-se and I-post-se.
> Looking at the graphs of the Gaussian posterior density estimates, they
> appear to be more closely centred on the I-post mean. Although the
> differences between the NI and I posterior estimates are small, what is
> the basis for selecting one rather than the other?
>
> I believe that NI stands for non-informative and I for informative, which
> typically refers to the prior used. So basically the NI results should be
> purely data-driven, which usually means the max-likelihood or frequentist
> quantities. This is just for comparison, a Bayesian would of course go for
> the informative ones. (Provided she really believes in the prior, which for
> some reason doesn't seem to happen all that often in reality, but that's a
> different story.)
>
>
>
> Second question: how can I retrieve the fitted values and residuals from
> the BayTool package? Hopefully they reside somewhere to be retrieved
> post estimation. Or is it necessary to run the input data through the
> equation to generate them oneself - and if so, which set of posterior
> mean estimates does one use?
>
> Luca may correct me eventually, but it seems those things are not directly
> available. I guess this makes sense from a Bayesian perspective since it
> gives you densities, not point estimates. For example, do you want to
> consider the posterior mean, median or mode as your quasi-point estimate?
> All imply different sets of residuals. In principle they could all be
> stored of course, but it's not clear whether that's the most practical
> thing to do.
>
> Having said that, it looks like for the coefficients (in the sub-bundle
> post_summ) the means are stored, so in that context this arbitrary choice
> was made, apparently.
>
> So for the CASE 1 in the sample script with covariates list X, I guess you
> could try:
>
> lincomb(X, out.post_summ.post_betamean)
>
> to get the fitted values. (Untested.)
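> And from there the residuals would presumably just be (equally untested,
> assuming y is your dependent series):
>
> series uhat = y - lincomb(X, out.post_summ.post_betamean)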
>
>
>
> PS -the BayTool package does not seem to attach permanently to the URL
> "model" list of options.
>
> Sorry, I don't understand. What URL?
>
> cheers
>
> sven
>
> _______________________________________________
> Gretl-users mailing list -- gretl-users(a)gretlml.univpm.it
> To unsubscribe send an email to gretl-users-leave(a)gretlml.univpm.it
> Website:
> https://gretlml.univpm.it/postorius/lists/gretl-users.gretlml.univpm.it/
Storing string-valued series in a database
by Artur T.
Hi all,
For the first time, I am playing around with the option to store a dataset
as a gretl database. This works, as mentioned in the command reference,
for cross-sectional and time-series data but not for panel data (OK so
far).
I tried to store a string-valued series. However, the string values get
lost when importing again: only the numerical values mapping each string
to some discrete positive number are shown.
I could not find any details in the manual or the help text on whether
this is actually supported.
I did the following:
<hansl>
open australia.gdt
series y = 1
stringify(y, defarray("foo"))
</hansl>
Then:
- via the menu File -> Export data -> Store series as gretl DB-format
(.bin),
- delete series y in the active dataset,
- open the database by dragging the .bin file from the file explorer into
the gretl main window,
- import series y.
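In script form, the same steps would be something like this (untested; I
am assuming that a .bin suffix to "store" selects the gretl database
format, as the menu entry suggests):
<hansl>
open australia.gdt
series y = 1
stringify(y, defarray("foo"))
store strtest.bin y    # write y to a gretl database
delete y               # drop it from the active dataset
open strtest.bin       # attach the database
data y                 # reimport: at this point only the numeric codes remain
print y --byobs
</hansl>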
Thanks in advance.
Artur
BayTool Package
by Brian Revell
Hi
I have tried using the BayTool package with some success (in terms of
getting output), and I have read (perhaps not well enough) the detailed
paper by Luca Pedini.
However, I am puzzled why, when simply using the linear model with a
conjugate prior, I get by default two sets of posterior mean parameter
and s.e. estimates: NI-post-m and I-post-m, and NI-post-se and I-post-se.
Looking at the graphs of the Gaussian posterior density estimates, they
appear to be more closely centred on the I-post mean. Although the
differences between the NI and I posterior estimates are small, what is
the basis for selecting one rather than the other?
Second question: how can I retrieve the fitted values and residuals from
the BayTool package? Hopefully they reside somewhere to be retrieved post
estimation. Or is it necessary to run the input data through the equation
to generate them oneself - and if so, which set of posterior mean
estimates does one use?
I expect these are straightforward issues for a Bayesian, but not for a
frequentist.
PS -the BayTool package does not seem to attach permanently to the URL
"model" list of options.
regards
Brian J Revell
Professor Emeritus (Agricultural Economics)
Tel: home 01952 728153 Mobile 07976 538712
Address: Orchard Croft, Vineyard Rd, Homer, Much Wenlock TF13 6NG
Alt. Email bjrevell(a)harper-adams.ac.uk
Bai-Perron Structural Breaks Model with GUI
by Brian Revell
I am attempting the simple task of identifying structural breaks in a
single time series trend.
There is a single dependent variable, t = 1994-2021, and two variables in
X, viz. a constant and a time trend.
The following error message comes up:
The variable X is read-only
*** error in function SB_GUI, line 7
> list X = !nelem(X) ? null : X
Some guidance would be welcome.
Brian J Revell
$error = 1
by Marcin Błażejowski
Hi,
two questions:
1. What does $error = 1 mean?
2. And why does errmsg(1) return an empty string?
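To make the question concrete, a minimal sketch of the accessors involved
(not my actual code):
<hansl>
# catch suppresses the error and sets $error instead
catch open no_such_file.gdt
scalar err = $error
printf "$error = %d, errmsg: '%s'\n", err, errmsg(err)
</hansl>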
Marcin
--
Marcin Błażejowski