URGENT HELP!! SCF Full Dataset too large! Help me retrieve some variables
by 100453087@alumnos.uc3m.es
Hello all! I am working on my thesis and I need a helping hand (I am
pretty new to econometrics, and I am really stuck right now).
I'm using SCF data.
The SCF provides its data as the Full Public Data Set and the Summary
Extract Public Data. I am mainly working with the Summary extract, but I
need to retrieve some variables from the Full Public Data Set. I am unable
to do so because of Stata's restriction on the number of variables (and
likewise in gretl; these are the two programs I use). I have tried
Stat/Transfer, but the demo mode does not work properly.
I would really appreciate it if somebody could retrieve for me, or tell me
how to retrieve, the following variables and observations for 2004 and
2007 from these datasets (a sketch of what I am trying to do follows the
list):
"Full Public Data Set: Stata format" for 2004 ( https://www.federalreserve.gov/econres/scf_2004.htm )
"Full Public Data Set: Stata format" for 2007 ( https://www.federalreserve.gov/econres/scf_2007.htm )
CASEID
AGE
Y1
X820
X828
X920
X1020
X9154
X9155
X9156
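In gretl, what I am hoping to achieve is essentially the following
(untested, since opening the file is exactly where I fail; I am assuming
the Stata file inside the 2004 archive is named p04i6.dta and that the
variable names survive the import -- they may come through in lower case):
<hansl>
# open the full Stata file, then save only the needed series
open p04i6.dta
# adjust the case of the names below if the import lower-cases them
store scf2004_subset.gdt CASEID AGE Y1 X820 X828 X920 X1020 X9154 X9155 X9156
</hansl>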
Thanks a lot!!
Re: R: Re: BayTool Package
by Brian Revell
Hi Luca
Many thanks for your clear explanation. My interest in the Bayesian
estimator arises out of a personal interest (as a salmon angler) and
concern that one of the UK government environment agencies has been
pooling data from the 12 rivers that are or have been monitored using fish
counters to count the numbers of adult salmon entering a river to
reproduce. It is intended to use the pooled data model as a means of
estimating the run size on 56 other non-monitored rivers, utilising a
Bayesian model (with a beta-binomial non-informative prior). A number of
independent variables are in the model (e.g. river flow data, angler
effort in days fished, etc.). The rationale is that, as all salmon anglers
must report their annual catch, the percentage of adult fish caught on the
monitored rivers can be derived. This can then be used to estimate the
numbers of fish entering non-counter rivers, given the catch data for each
of those rivers. Essentially it is a panel data estimation problem of the
proportion of fish caught by anglers in frequentist terms, and one I have
already modelled in gretl to compare with the official Bayesian model
output now being proposed.
The bottom line is that, for a given reported catch level on a river
without a fish counter, if the percentage of fish caught on that river can
be estimated then the total number of fish can be derived, i.e. total
estimated fish in river = rod catch / rod catch %. For example, a rod
catch of 500 at an estimated 25% catch rate implies 500/0.25 = 2,000 fish
in the river. So the model dependent variable is the percentage of fish
caught on each of the 12 monitored rivers, logit transformed.
There are a number of underlying issues with the recorded river data that
give me some cause for concern in their model, which pools the full set of
the 12 monitored rivers' data. First, fish counter recording on 2 rainfed
rivers in the pool ceased permanently in 2015, and on another temporarily,
though that counter is now being reinstalled. Second, the hydrological
profiles of 4 of the other rivers are unique, being chalkstream rivers
where much of the flow is from subterranean spring water sources. Half of
the pooled panel units therefore have rainfed river flow regimes, where
salmon only enter those rivers at times of higher river flow levels. It is
clear that rod catch proportions are higher on the chalkstream sample than
on the rainfed rivers; in fact the chalkstreams account for only 4% of the
national salmon rod catch, so they are disproportionately represented in
the pooled sample.
In frequentist terms, several of the exploitation rates in the panel of
monitored rivers are only trend stationary, or AR(1) (hence my question
about recovering residuals from the Bayesian model's fitted estimates).
One of their primary independent variables, fishing effort, exhibits
structural change which passed unrecognised in the official model.
The official model also fails to account for the different rod catch
proportion levels on the different monitored rivers, which an LSDV (least
squares dummy variable) panel data framework readily handles, with robust
errors to cope with heteroskedasticity. In consequence, the official model
simulations based on the Bayesian parameter mean estimates for 52
non-monitored rivers overstate their rod catch proportions and
under-predict the number of salmon entering non-counter rivers.
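For concreteness, my frequentist comparison model in gretl is essentially
the following (a sketch with illustrative series names, not my actual
script):
<hansl>
# rod-catch proportion, logit transformed, regressed on flow and effort;
# per-river fixed effects (LSDV) with robust standard errors
series l_odds = log(catch_prop / (1 - catch_prop))
panel l_odds const flow effort --fixed-effects --robust
</hansl>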
In summary, the issue largely reduces to the constant in the estimated
prediction equation, and to which rivers it is appropriate to include in
the pooled panel dataset. I have estimated the panel model with all 12
rivers, with 10 (excluding the 2 rivers where the counters have
permanently ceased), and with 6 (also excluding the 4 chalkstreams), i.e.
a panel of rainfed rivers only. Across that sequence of panel sizes, the
model errors shrink and the mean predicted rod catch percentages are lower
than the official model estimates. My dataset also drops the initial
sample years of 1994-2001 used in the official model, to avoid including
data from before the 2001 structural break in the rod effort series. My
interest now is to replicate my model specification with Bayesian
estimates, but also to be able to establish whether there is serial
correlation in its residuals, which I naturally suspect will be lurking in
the mean-coefficient fitted version of the official model.
As I chair a government special advisory panel for agricultural economics,
you can see that I need to be clear on the issues of Bayesian estimation
before stirring the pot in the Ministry, which has oversight of and funds
the fisheries science agencies doing the modelling. A one-size-fits-all
model may be administratively convenient, but in my view it is hardly
appropriate.
The government agency has published a paper about their model in a
fisheries science journal, which I intend to comment on formally. Much is
left unsaid in the article regarding post-estimation diagnostic checking
and panel selection. Tiny graphs are presented, usually with log-scale
vertical axes, which makes the model fit and the data points difficult to
see clearly. However, I have photo-enlarged the graphs, only to discover
that, depending on the river panel unit, some 39 to 70 percent of their
actual recorded data points fall outside the 95% (in)credible interval of
the model's fitted mean predicted values for each river.
Cheers
On Tue, 2 Apr 2024, 23:32 Luca Pedini, <luca_pedini(a)live.it> wrote:
> Hi Brian,
>
>
>
> thanks for the message and thank you Sven for the quick reply.
> Unfortunately I’ve been experiencing some issues with the mailing list
> recently: I’m receiving the messages late or not at all! Sorry!
>
>
>
> As for your original questions, the function by default will generate
> non-informative results (frequentist) alongside the informative ones
> (truly Bayesian, obtained via the chosen prior): the idea behind this is
> to have a direct comparison between the two. In the resulting output
> bundle you can access both of them: the sub-bundle non_info_case reports
> all the basic frequentist estimators, while for the informative results
> you can access two different sub-bundles: i) sampled_coeff, which
> contains all the sampled parameters and from which you can recover
> quantiles or any sort of function; and ii) post_summ, which simply
> reports posterior means and standard deviations.
>
>
>
> In the sample script the informative prior is "not so informative"; for
> this reason the results are quite similar to the frequentist ones. But
> if you have to choose which posterior quantity to use, between
> non-informative and informative, always use the informative one.
> Moreover, if you have to compute some function of the posterior
> quantities, I suggest computing it on the sampled parameters and not
> simply on the posterior mean. In this way you can derive a distribution
> and draw more exact conclusions.
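> For example, something along these lines (an untested sketch; take the
> member name of the draws matrix as illustrative, with one sampled
> coefficient vector per row):
>
> matrix draws = out.sampled_coeff.beta_draws  # hypothetical member name
> matrix q05 = quantile(draws, 0.05)  # 5% posterior quantile per coefficient
> matrix q95 = quantile(draws, 0.95)  # 95% posterior quantile per coefficient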
>
>
>
> As for the residuals question: these are not stored, since storing the
> residuals for each sampled parameter vector would quickly drain the
> computer's memory, especially with big datasets. However, you can
> recover them via the sampled coefficients.
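> Something along these lines should work (again untested; it assumes the
> dependent series y, the covariate list X, and the draws matrix named as
> above):
>
> matrix draws = out.sampled_coeff.beta_draws  # hypothetical member name
> matrix U = {y} .- {X} * draws'  # one column of residuals per draw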
>
>
>
> I hope this may help and if you have any other questions or doubts, do not
> hesitate to ask!
>
>
>
> Best regards,
>
>
>
> Luca
>
>
>
> *From: *Brian Revell <bjr.newmail(a)gmail.com>
> *Date: *Monday, 1 April 2024 23:18
> *To: *Gretl list <gretl-users(a)gretlml.univpm.it>
> *Subject: *[Gretl-users] Re: BayTool Package
>
> Thanks, Sven, for the detailed explanation. I had eventually arrived at
> the conclusion that the mean value of the density was the point estimate
> being output, albeit the densities for each parameter were also
> available as an output plot option. It would seem then that the
> residuals from the mean values are not retrievable (yet) when running
> the GUI version, which incidentally has now attached itself permanently
> as an option within the univariate options.
>
>
>
> It would appear that in many applied studies the parameter means with
> credible intervals are employed. In a forecasting or projection context
> with Bayesian estimation of time series data, one might suppose that the
> residuals would clearly still be important for post-estimation diagnosis
> regarding stationarity and serial correlation.
>
>
>
> Brian
>
>
>
> On Mon, 1 Apr 2024, 20:55 Sven Schreiber, <sven.schreiber(a)fu-berlin.de>
> wrote:
>
> On 16.03.2024 at 17:22, Brian Revell wrote:
>
> Hi
>
> I have tried using the BayTool package with some success (in terms of
> getting output), and I have read (perhaps not well enough) the detailed
> paper by Luca Pedini.
>
> Luca has had some email problems lately, perhaps he didn't get this one.
>
> However, I am puzzled why, when simply using the linear model with a
> conjugate prior, I get by default two sets of posterior mean parameter
> and s.e. estimates: NI-post-m and I-post-m, and NI-post-se and I-post-se.
> Looking at the graphs of the Gaussian posterior density estimates, they
> appear to be more closely centred on the I-post mean. Although the
> differences between the NI and I posterior estimates are small, what is
> the basis for selecting one rather than the other?
>
> I believe that NI stands for non-informative and I for informative, which
> typically refers to the prior used. So basically the NI results should be
> purely data-driven, which usually means the max-likelihood or frequentist
> quantities. This is just for comparison, a Bayesian would of course go for
> the informative ones. (Provided she really believes in the prior, which for
> some reason doesn't seem to happen all that often in reality, but that's a
> different story.)
>
>
>
> Second question: how can I retrieve the fitted values and residuals from
> the BayTool package? Hopefully they reside somewhere to be retrieved
> post estimation. Or is it necessary to run the input data through the
> equation to generate them oneself - and if so, which set of posterior
> mean estimates does one use?
>
> Luca may correct me eventually, but it seems those things are not directly
> available. I guess this makes sense from a Bayesian perspective since it
> gives you densities, not point estimates. For example, do you want to
> consider the posterior mean, median or mode as your quasi-point estimate?
> All imply different sets of residuals. In principle they could all be
> stored of course, but it's not clear whether that's the most practical
> thing to do.
>
> Having said that, it looks like for the coefficients (in the sub-bundle
> post_summ) the means are stored, so in that context this arbitrary choice
> was made, apparently.
>
> So for the CASE 1 in the sample script with covariates list X, I guess you
> could try:
>
> lincomb(X, out.post_summ.post_betamean)
>
> to get the fitted values. (Untested.)
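> And from there the residuals would presumably just be (equally untested,
> assuming y is your dependent series):
>
> series uhat = y - lincomb(X, out.post_summ.post_betamean)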
>
>
>
> PS -the BayTool package does not seem to attach permanently to the URL
> "model" list of options.
>
> Sorry, I don't understand. What URL?
>
> cheers
>
> sven
>
> _______________________________________________
> Gretl-users mailing list -- gretl-users(a)gretlml.univpm.it
> To unsubscribe send an email to gretl-users-leave(a)gretlml.univpm.it
> Website:
> https://gretlml.univpm.it/postorius/lists/gretl-users.gretlml.univpm.it/
Storing string-valued series in a database
by Artur T.
Hi all,
For the first time, I am playing around with the option to store a dataset
as a gretl database. This works, as mentioned in the command reference,
for cross-sectional and time-series data but not for panel data (OK so
far).
I tried to store a string-valued series. However, the string values get
lost when importing again: only the numerical values mapping each string
to some discrete positive number are shown.
I could not find any details in the manual or the help text on whether
this is actually supported.
I did the following:
<hansl>
open australia.gdt
series y = 1
stringify(y, defarray("foo"))
</hansl>
Then:
- via the menu File -> Export data -> Store series as gretl DB-format
(.bin),
- delete series y in the active dataset,
- open the database by dragging the .bin file from the file explorer into
the gretl main window,
- import series y.
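In script form, the same steps would be something like this (untested; I
am assuming that a .bin suffix to "store" selects the gretl database
format, as the menu entry suggests):
<hansl>
open australia.gdt
series y = 1
stringify(y, defarray("foo"))
store strtest.bin y    # write y to a gretl database
delete y               # drop it from the active dataset
open strtest.bin       # attach the database
data y                 # reimport: at this point only the numeric codes remain
print y --byobs
</hansl>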
Thanks in advance.
Artur
BayTool Package
by Brian Revell
Hi
I have tried using the BayTool package with some success (in terms of
getting output), and I have read (perhaps not well enough) the detailed
paper by Luca Pedini.
However, I am puzzled why, when simply using the linear model with a
conjugate prior, I get by default two sets of posterior mean parameter
and s.e. estimates: NI-post-m and I-post-m, and NI-post-se and I-post-se.
Looking at the graphs of the Gaussian posterior density estimates, they
appear to be more closely centred on the I-post mean. Although the
differences between the NI and I posterior estimates are small, what is
the basis for selecting one rather than the other?
Second question: how can I retrieve the fitted values and residuals from
the BayTool package? Hopefully they reside somewhere to be retrieved post
estimation. Or is it necessary to run the input data through the equation
to generate them oneself - and if so, which set of posterior mean
estimates does one use?
I expect these are straightforward issues for a Bayesian, but not for a
frequentist.
PS -the BayTool package does not seem to attach permanently to the URL
"model" list of options.
regards
Brian J Revell
Professor Emeritus (Agricultural Economics)
Tel: home 01952 728153 Mobile 07976 538712
Address: Orchard Croft, Vineyard Rd, Homer, Much Wenlock TF13 6NG
Alt. Email bjrevell(a)harper-adams.ac.uk
Bai-Perron Structural Breaks Model with GUI
by Brian Revell
I am attempting the simple task of identifying structural breaks in a
single time series trend.
There is a single dependent variable, t = 1994-2021, and two variables in
X, viz. a constant and a time trend.
The following error message comes up:
The variable X is read-only
*** error in function SB_GUI, line 7
> list X = !nelem(X) ? null : X
Some guidance would be welcome.
Brian J Revell
$error = 1
by Marcin Błażejowski
Hi,
two questions:
1. What does $error = 1 mean?
2. And why does errmsg(1) return an empty string?
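To make the question concrete, a minimal sketch of the accessors involved
(not my actual code):
<hansl>
# catch suppresses the error and sets $error instead
catch open no_such_file.gdt
scalar err = $error
printf "$error = %d, errmsg: '%s'\n", err, errmsg(err)
</hansl>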
Marcin
--
Marcin Błażejowski