May 2014 - Gretl-devel - gretlml.univpm.it

by Sven Schreiber

Hello all panel-interested people,

while using gretl for teaching with panel data (which I hadn't done much before) I noticed the following, let's say, interface nuisances compared to the usual luxury gretl offers for time series:

1: The sample and/or range in the main window (bottom) are given as pure index numbers, even if "panel data marker strings" (cf. user guide p.23) are defined. At least for the time dimension it would be useful to show the sample periods in a human-readable form (through the markers). Also, I noticed that the period numbers shown do not always coincide with the values of the "time" index variable, if subsampling is in effect. (Seen in the CEL.gdt dataset after applying the sample restriction year>1970 for example.)

1b: A slightly more general suggestion, also for non-panel data: The active sample restriction criterion could be shown next to the resulting active sample in the main window. (At least for simple restrictions, maybe not for complex, multiple ones.)

2: Menu Sample -> Set range: Only the group range can be chosen, not the periods. Actually, given the often arbitrary ordering of groups, this is really the less useful dimension to choose a contiguous range from. (I know I can use "set sample based on criterion" for periods, but that's not the point.)

3: About pshrink(): A version that returns a full panel series (with repeated values like pmean() etc.) could be useful -- practical example: in growth regressions one needs the initial value of output-per-worker as a regressor. Also maybe it should be called "pfirst()" or something instead.

4: Time-constant variables: I'm not sure how to create variables that only vary along the cross-section, like it is done with the built-in pmean() etc. functions. Or how to append them (like the user guide p.114 "adding a time series", but along the other panel dimension).

5: Constant in a fixed-effects regression: I don't understand what gretl reports as the global constant term in a fixed-effects model, and it doesn't seem to be defined in the guide. It's also confusing that gretl complains if one wants to discard the constant in the specification dialog (when fixed effects are selected). (But obviously gretl estimates the right thing as the comparison with explicit LSDV regression shows, just the constant is mysterious -- even if it's the average of the fixed effects it's not clear where the standard errors come from.)

6: Lags not showing in model spec dialog when sample is restricted to a single period: If I restrict the CEL.gdt data with year==1985, I cannot include any previously created lags (of y for example) in the regression, because they don't show up in the variable selector. Because the subsampled dataset is now treated as "undated", there's also no "lags..." button in the dialog. -- Actually I don't understand why gretl "temporarily forgets" the panel structure of the dataset when a single period is active. It would seem less problematic to treat even a T=1 sample as a special case of panel data if the underlying dataset has a panel structure; especially in conjunction with point 1 above about showing the selected periods in the sample.

Ok, that was a long post, sorry, but still necessary I think.

Cheers,
Sven
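To make point 3 concrete, here is a minimal Python sketch of what a "pfirst()"-style function could return. It is not gretl code and not an existing gretl function: the name `pfirst`, the stacked-by-unit layout, and the reading of "initial value" as the first non-missing observation per unit are all assumptions made for illustration.

```python
def pfirst(values, n_units, n_periods):
    """Repeat each unit's first non-missing value across all of its periods.

    `values` is a stacked panel series (all periods of unit 1, then unit 2,
    and so on); None marks a missing observation. Hypothetical helper.
    """
    if len(values) != n_units * n_periods:
        raise ValueError("length must equal n_units * n_periods")
    out = []
    for i in range(n_units):
        block = values[i * n_periods:(i + 1) * n_periods]
        # first valid observation for this unit (None if the unit is all-missing)
        first = next((v for v in block if v is not None), None)
        out.extend([first] * n_periods)
    return out


# Two units, three periods each; unit 2 has a missing first period.
y = [1.0, 1.2, 1.5, None, 3.1, 3.4]
print(pfirst(y, n_units=2, n_periods=3))
```

The result has the same length as the input, so it could be used directly as a regressor alongside the other panel series, which is the use case described for growth regressions.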


our handling of daily data

by Allin Cottrell

Sven has raised the question of the handling of daily data in gretl; see the threads starting from http://lists.wfu.edu/pipermail/gretl-users/2014-May/010037.html

I'm glad of that: it's time we clarified what we do now, and what we should do in future. (But please note, I'm mostly talking here about 5-day financial-market data; other sorts of daily data might require different handling.) Sorry, this is long, but I'd encourage those who work with daily data to read on...

First a minor point in relation to Sven's example: I think the Bundesbank is in fact unusual in including blank weekends in business-day data files. At least, that's not the practice of the Federal Reserve, the Bank of England, the Banque de France, the Banca d'Italia, the Sveriges Riksbank... (at which point I got tired of googling).

Anyway, it's (now) easy enough to strip out weekends, which leaves the more interesting question of how to deal with holidays. I think it's fair to say:

(a) most econometricians who wish to apply time-series methods to daily financial market data will, most of the time, want to ignore holidays as well as weekends, treating the data as if these days did not exist and the actual trading days formed a continuous series, but

(b) for some purposes it may be important to be able to recover information on (e.g.) which days were Mondays or which days followed holidays.

How are these needs best supported by econometric software? I can see two possibilities:

(1) The storage for 5-day data includes rows for all Mondays to Fridays (or even all days as per the Bundesbank) -- hence satisfying point (b) automatically -- and the software provides a mechanism for skipping non-trading days on demand when estimating models.

(2) The data storage includes only actual trading days -- hence satisfying point (a) automatically -- but with a record of their calendar dates, and the software provides means of retrieving the information under point (b) on demand.
Currently gretl includes a half-hearted gesture towards approach (1) but de facto mostly relies on approach (2). Let me explain.

When we first introduced support for daily data I initially assumed that we'd want to store 5-day data including rows for all relevant days, with NAs for holidays. So in view of point (a) above I put in place a mechanism for skipping NAs in daily data when doing OLS. But this never got properly documented, and it was never extended to other estimators.

What happened? Well, as we started adding examples of daily data to the gretl package it became apparent that approach (2) is quite common in practice. See for example the "djclose" dataset from Stock and Watson and the Bollerslev-Ghysels exchange-rate returns series (b-g.gdt). Both of these have non-trading days squeezed out of them; let's call this "compressed" daily data.

The Bollerslev-Ghysels dataset is not the best example, as the authors did not record the actual dates of the included observations, only the starting and ending dates. But djclose will serve as a test case: although it excludes non-trading days the date of each observation is recorded in its "marker" string and it's straightforward to retrieve all the information one might want via gretl's calendrical functions, as illustrated below.

<hansl>
/* analysis of compressed 5-day data */
open djclose.gdt
# get day of week and "epoch day" number
series wd = weekday($obsmajor, $obsminor, $obsmicro)
series ed = epochday($obsmajor, $obsminor, $obsmicro)
# maybe we want a dummy for Mondays?
series monday = wd == 1
# find the "delta days" between observations
series delta = diff(ed)
# the "standard" delta days in absence of holidays:
# three for Mondays, otherwise one
series std_delta = wd == 1 ? 3 : 1
# create a dummy for days following holidays
series posthol = delta > std_delta
# take a look...
print wd monday delta posthol --byobs
</hansl>

Here's a proposal for regularizing our handling of daily data.
In brief, it's this: scrap our gesture towards what I called approach (1) above, and beef up our support for approach (2).

Why get rid of the mechanism for automatically skipping NAs in daily data for OLS? Because it's anomalous that it only works for OLS, it would be a lot of work to provide this mechanism for all estimators, and anyway it probably should not be automatic: ignoring NAs when they're present in the dataset should require some user intervention.

By beefing up approach (2) I mean providing easy means of converting between "uncompressed" and "compressed" daily data. We already support both variants, but (a) given an uncompressed daily sequence it should be easy for the user to squeeze out NAs if she thinks that's appropriate for estimation purposes, and (b) it might be useful in some contexts to be able to reconstitute the full calendar sequence from a compressed dataset such as djclose. Such conversion is possible via "low-level" hansl, but not convenient.

I've therefore added the following two things in CVS/snapshots:

(1) If you apply an "smpl" restriction to a daily dataset, we try to reconstitute a useable daily time-series. If it has gaps, we record the specific dates of the included observations. At present this is subject to two conditions, which are open to discussion.

(i) Define the "delta" of a given daily observation as the epoch day (= 1 for the first of January in 1 AD) of that observation minus the epoch day of the previous one. So, for example, in the case of complete 7-day data the delta will always be 1. With complete 5-day data the delta will be 3 for Mondays and 1 for Tuesdays through Fridays. The first condition on converting from "full" data to something like djclose.gdt (dated daily data with gaps) is that the maximum daily delta is less than 10.

(ii) The "smpl" restriction in question may involve throwing away "empty" weekends; this will lose about 2/7 of the observations and preserve about 5/7.
Allowing for this, we then require that the number of observations in the sub-sample is at least 90 percent of the maximum possible. Or in other words we're allowing up to 10 percent loss of observations due to holidays. That's generous -- perhaps too generous?

(The point of these restrictions is to avoid "pretending" that a seriously gappy daily sequence -- much gappier than could be accounted for by trading holidays -- can be treated as if it were a continuous time series for econometric purposes.)

(2) Second thing added: a new trope for the "dataset" command, namely

dataset pad-daily <days-in-week>

This will pad out a dataset such as djclose, adding in NAs for holidays and (if the days-in-week parameter is 7) for weekends too.

I'm not sure if this second thing is worth keeping and documenting, but for now it permits a test of the whole apparatus by round-tripping. Here's an example, supposing we're starting from data on a complete 7-day calendar, but with empty weekends and all-NA rows for holidays (as in Sven's Bundesbank data):

<hansl>
open <seven-day-data>
outfile orig.txt --write
print --byobs
outfile --close
smpl --no-missing --permanent
outfile compressed.txt --write
print --byobs
outfile --close
dataset pad-daily 7
outfile reconstructed.txt --write
print --byobs
outfile --close
string diffstr = $(diff orig.txt reconstructed.txt)
printf "diffstr = '%s'\n", diffstr
</hansl>

So if the round trip is successful, diffstr should be empty. Ah, but with Sven's data it's not quite empty. What's the problem? It's with the logic of --no-missing, which excludes all rows on which there's at least one NA. What we really want, to skip holidays, is to exclude all and only those rows on which all of our daily variables are NA. That's feasible via raw hansl, but not so convenient. So one more modification to "smpl" in CVS: add an option --no-all-missing (the name may be debatable).
Substitute --no-all-missing for --no-missing in the script above and the difference between orig.txt and reconstructed.txt really is null.

If you don't have a handy Bundesbank-style data file (though it's not hard to fake one), here's another round-trip test, in the other direction: we pad out djclose then shrink it again.

<hansl>
open djclose.gdt -q
outfile orig.txt --write
print --byobs
outfile --close
dataset pad-daily 5
outfile padded.txt --write
print --byobs
outfile --close
smpl --no-all-missing --permanent
outfile reconstructed.txt --write
print --byobs
outfile --close
string diffstr = $(diff orig.txt reconstructed.txt)
printf "diffstr = '%s'\n", diffstr
</hansl>

The use of the --permanent option in the round-trip scripts is just to ensure that all vestiges of the original data are destroyed before the reconstruction takes place. In "normal usage" one could just do

<hansl fragment="true">
open <seven-day-data>
smpl --no-all-missing
</hansl>

then carry out econometric analysis without tripping over NAs.

Allin
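For readers who want to experiment with conditions (i) and (ii) outside gretl, the following Python sketch mimics them. It is only an approximation under stated assumptions, not gretl's implementation: the function names are made up, "maximum possible" is read as the number of calendar days in the span that fall within the first `days_in_week` weekdays, and the thresholds 10 and 0.9 are simply the values quoted in the post. Python's `date.toordinal()` uses the same convention as the epoch day described above (1 January of year 1 is day 1).

```python
from datetime import date, timedelta


def deltas(dates):
    """Epoch-day difference between each observation and the previous one."""
    eds = [d.toordinal() for d in dates]
    return [b - a for a, b in zip(eds, eds[1:])]


def usable_as_daily(dates, days_in_week=5, max_delta=10, min_share=0.9):
    """Check the two conditions; assumes at least two sorted dates."""
    span = range(dates[0].toordinal(), dates[-1].toordinal() + 1)
    # weekday() is 0 for Monday, so "< 5" keeps Mon-Fri and "< 7" keeps all days
    possible = sum(1 for e in span
                   if date.fromordinal(e).weekday() < days_in_week)
    share = len(dates) / possible
    return max(deltas(dates)) < max_delta and share >= min_share


# Three business weeks in May 2014, with Monday 26 May removed as a holiday.
start = date(2014, 5, 12)
days = [start + timedelta(days=k) for k in range(19)]
trading = [d for d in days if d.weekday() < 5 and d != date(2014, 5, 26)]

print(deltas(trading))
print(usable_as_daily(trading))
```

In this example the delta is 3 across an ordinary weekend and 4 for the Tuesday after the holiday, and the sample keeps 14 of 15 possible business days, so both conditions hold. A sequence with a gap of several weeks would fail the first condition.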


long and gappy time series

by Sven Schreiber

Hello to all the data wizards out there,

today I hit the limit in the GUI that the earliest year can be set to 1500. But I was looking at the really historic time series from here: http://www.ggdc.net/maddison/maddison-project/orihome.htm, which actually starts at 1 A.D. It worked ok via script, but I think it also should work via the dialog window.

Now let's see if I manage to load that gappy data into the workfile... no, there are problems, and I think some of them are bugs. (This is 1.9.90 on Win7.)

When I start with an empty annual dataset from 1 to 2100 and try to append the Maddison data from an Excel worksheet (where I have named the year column with "date"), the rows/years are not properly matched against the inner years ("inner" in the sense from 'join'). That's because of the (huge) gaps in the source file. Strangely, when I use "obs" instead of "date", gretl says that I must not use this as a variable name.

I also have to rename many, many variables in the xls file before gretl accepts them, and I think this is really not the optimal way to handle this because it's very time-consuming and dull; there should be some automagic "mangling" of the names by gretl, maybe accompanied by a warning message, or the whole mangling could be a user-configurable option.

Then I tried to treat the whole thing as a (country) panel structure -- but I'm noticing (for the first time although it must have been there for ages) that when I choose "new dataset" from the menu, the dialog forces on me the detour to specify the overall number of obs (anybody got a calculator ready?) and only afterwards can I impose the panel structure. Suggestion: why not have radio buttons with cross-section/time-series/panel in that dialog, and in the panel case let the user input numbers for both dimensions right away (plus the periodicity for time series and panel as well).

Another suggestion: why not allow the use of a time index variable for time series the same way that index variables are allowed for panels?

I haven't succeeded so far with the import; the only solution I can think of right now is to add hundreds of empty rows to the source file to remove the gaps. Hm.

cheers,
sven
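The matching behaviour asked for here, where rows are placed by their year value rather than by row position, can be sketched in a few lines of Python. This is an illustration of the idea only, not a description of how gretl's append or 'join' works; the function name and the numbers are placeholders, not Maddison figures.

```python
def align_to_years(first_year, last_year, observed):
    """Place gappy observations on a complete annual index.

    `observed` maps year -> value; years without data become None.
    Hypothetical helper for illustration.
    """
    return [observed.get(year) for year in range(first_year, last_year + 1)]


# Placeholder values for a source file that only has rows for a few years.
source = {1: 10.0, 1000: 20.0, 1500: 30.0}
series = align_to_years(1, 2100, source)

print(len(series))        # one slot per year from 1 to 2100
print(series[0])          # year 1
print(series[999])        # year 1000
print(series[1])          # year 2 has no data
```

Matching on the key makes the manual workaround of adding hundreds of empty rows to the source file unnecessary, since the gaps are filled when the data are aligned.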


gretl's "appdata" file

by Allin Cottrell

We recently got a request from Richard Hughes (his blog: https://blogs.gnome.org/hughsie/ ) to include an "appdata.xml" file with gretl; see http://sourceforge.net/p/gretl/feature-requests/83/ This file serves as a sort of shop window for a program in modern package managers. I've added a draft file in CVS, http://gretl.cvs.sourceforge.net/viewvc/gretl/gretl/gretl.appdata.xml?vie... but suggestions for enhancing it are welcome. The text that describes gretl can be about twice as long as it is right now and still be within the guidelines -- what are the most important/enticing things to say? Also a nicer screenshot would be good. If you'd like to pitch in on this, please read the guidelines first: http://people.freedesktop.org/~hughsient/appdata/ Allin


weekdays (was Re: [Gretl-users] problems with daily data)

by Allin Cottrell

On Wed, 21 May 2014, Riccardo (Jack) Lucchetti wrote:

[Re. Sven's wish to convert 7-day daily data, with nothing but NAs for Saturdays and Sundays, into 5-day data]

> This should do what [Sven wanted]: not the most elegant approach,
> but IMO quite clear and general. [...]
>
> <hansl>
> nulldata 28
> setobs 7 2014-04-01
> x = normal()
> print x -o
>
> /*
> trash weekends
> */
>
> # first, construct a "weekend" dummy series
>
> scalar y1 = $obsmajor[1]
> scalar m1 = $obsminor[1]
> scalar d1 = $obsmicro[1]
> scalar wd1 = weekday(y1, m1, d1)
> series wd = time + wd1 - 1
> series we = (wd%7)==6 || (wd%7)==0
>
> # clear periodicity
>
> setobs 1 1
> smpl we==0 --restrict
[...and then it's pretty simple]
</hansl>

Nicely done! But I note that it's a bit of a struggle since (up till now) the weekday() function has only accepted scalar arguments. In today's CVS I've "upgraded" this via our usual overloading approach: you can now use series instead of scalars with weekday(). The relevant portion of the above could then read:

<hansl fragment="true">
# construct a "weekend" dummy series
series wday = weekday($obsmajor, $obsminor, $obsmicro)
series weekend = wday == 6 || wday == 0
</hansl>

As a general comment, I'd say it's pretty uncommon to have to do this sort of thing: almost all 5-day daily data does not include weekends stuffed with NAs. So while gretl should be able to deal with it, I don't think we have to go to great lengths to make it a unitary ("one stop shopping") operation.

Allin
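The two ways of building the weekend dummy in this thread (index arithmetic from the first observation's weekday, versus computing the weekday of every observation) can be checked against each other with a small Python sketch. It is a cross-check of the logic only, not gretl code; the helper names are invented, and the weekday numbering (0 for Sunday through 6 for Saturday) is the one implied by the weekend tests quoted above.

```python
from datetime import date, timedelta


def weekday_sun0(d):
    """Weekday with 0 = Sunday ... 6 = Saturday."""
    return d.isoweekday() % 7


def weekend_direct(dates):
    """Per-observation approach: compute each date's weekday."""
    return [weekday_sun0(d) in (0, 6) for d in dates]


def weekend_by_index(n_obs, wd1):
    """Index approach: wd = time + wd1 - 1 with a 1-based time index."""
    return [((t + wd1 - 1) % 7) in (0, 6) for t in range(1, n_obs + 1)]


# Same setup as the quoted script: 28 daily observations from 2014-04-01.
start = date(2014, 4, 1)
dates = [start + timedelta(days=t) for t in range(28)]

direct = weekend_direct(dates)
by_index = weekend_by_index(len(dates), weekday_sun0(dates[0]))

# Dropping the flagged rows leaves the 5-day sample.
five_day = [d for d, we in zip(dates, direct) if not we]
print(direct == by_index, len(five_day))
```

Both approaches agree on a complete 7-day calendar. The index approach relies on there being no gaps in the calendar, whereas the per-observation approach does not, which is one reason the series-valued weekday() is the more general tool.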


genr (was Re: [Gretl-users] problems with daily data)

by Allin Cottrell

On Wed, 21 May 2014, Sven Schreiber wrote:

> On 21.05.2014 16:35, Ignacio Diaz-Emparanza wrote:
>> On 21/05/14 16:15, Riccardo (Jack) Lucchetti wrote:
>>> On Wed, 21 May 2014, Sven Schreiber wrote:
>>>
>>>> Agreed (well, maybe deprecated and undocumented would be enough...); but
>>>> there should still be a script way of creating seasonal/periodic
>>>> dummies, and currently there is no alternative, or is there?
>>>
>>> <hansl>
>>> tmp = time % $pd
>>> list DUMS = dummify(tmp)
>>> </hansl>
>>
>> I prefer that the number of each dummy corresponds with the observation:
>>
>> tmp = (time-1)%$pd + 1
>> list DUMS = dummify(tmp)
>>
>
> Aren't you assuming that the workfile/sample actually starts with the
> "right" obs here?
>
> Anyway, thanks for all your suggestions, but what I really meant was a
> function (or command) that mirrors the menu entry like 'genr dummy'
> does, not some clever way to code it...

At present we have 7 "specials" with the form "genr <name>", for <name> = dummy, timedum, unitdum, time, index, unit, weekday.

I haven't checked rigorously but I don't think "genr weekday" is documented (though there is a documented weekday() function).

The "genr unit" special is shadowed by the accessor $unit. We could do the same for "index" if that were thought worthwhile.

As for the ones that add several variables, I don't think a $-accessor would work well. We _could_ have (e.g.) an accessor $dummy that adds a bunch of series and returns a list but that would look a bit weird if you didn't need to assign the list, having "$dummy" by itself on a line of hansl. Maybe a function named "dummies" (that returns a list) with a parameter to handle the panel cases (timedum, unitdum).

Allin
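Sven's objection about the sample starting at the "right" observation can be illustrated with a short Python sketch. It is not gretl code: the function name and the `first_subperiod` parameter are hypothetical, introduced only to show how the modular formula mislabels the dummies when the first observation is not in sub-period 1.

```python
def periodic_dummies(n_obs, pd, first_subperiod=1):
    """Return pd dummy columns; column j (1-based) flags sub-period j.

    `first_subperiod` is the sub-period of the first observation, e.g. 3 for
    quarterly data starting in Q3. Hypothetical helper for illustration.
    """
    sub = [(t - 1 + first_subperiod - 1) % pd + 1 for t in range(1, n_obs + 1)]
    return [[1 if s == j else 0 for s in sub] for j in range(1, pd + 1)]


# Six quarterly observations that actually start in the third quarter.
naive = periodic_dummies(6, 4)                       # assumes a Q1 start
shifted = periodic_dummies(6, 4, first_subperiod=3)  # uses the true start

print(naive[0])    # "Q1" dummy under the naive assumption
print(shifted[0])  # Q1 dummy once the starting quarter is accounted for
```

With `first_subperiod=1` the formula reduces to `(time-1) % pd + 1` as quoted above, so the two versions agree whenever the dataset does begin in the first sub-period.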


Suggestion: set termoption dash

by Logan Kelly

Hello,

I have a suggestion for the GUI for time series plots. Could an option be added to include something like

set termoption dash

in the gnuplot script? It could be a check box in the main tab of the gretl plot control dialog. Plotting with dashed lines can be done easily in hansl, but I don't think dashed lines are easily accomplished in the GUI?

Thanks
Logan


ARMA Interpolation

by GOO Creations

Hi,

I'm not sure if this is possible in gretl. I want to interpolate a gap of samples with ARMA by using the value to the left and right of the gap. If I have data like this:

Time lag:  1    2    3    4    5    6    7    8    9    10   11
Values:    0    0.1  0.2  0.3  *    *    *    0.3  0.2  0.1  0

The values marked with a star are the gap I want to interpolate (at lag 5, 6 and 7: the values should be something like 0.4, 0.5, 0.4 after interpolation).

How would you go about creating a gretl DATASET with these values? And will the get_forecast function work on "predicting" the interpolated samples?

I've always used ARMA for out-of-sample forecasts, but never for interpolation, so I'm not sure if this is possible.

Regards
Chris


new jsonget function

by Allin Cottrell

Re. http://lists.wfu.edu/pipermail/gretl-devel/2014-May/005070.html Hmm, this worked perfectly at the command line but something strange is happening when I try using it in the GUI program: it works fine on the first invocation then hangs on subsequent attempts. I'm investigating. Allin


built-in curl() function

by Allin Cottrell

This is a follow-up to earlier discussions of functions to retrieve data from various servers, such as the BLS. There's now a built-in curl() function in CVS (which uses the libcurl API rather than relying on the presence of a curl executable). It's very similar to the hansl function of the same name that Jack circulated. It's documented in the Function reference but its details are not set in stone at this point, so if anyone has comments/suggestions, please fire away. Allin

