On Thu, 5 Mar 2009, Marcus Marktanner wrote:
First of all I would like to express my sincere gratitude for the
gretl project, which I like very much. I teach a hands-on course in
econometrics and have just switched from NCSS to gretl.
Glad you like gretl!
I would like to ask two questions though:
1. Is there a way to label observations in a scatter plot by,
for example, country codes in order to have a visual tool at
hand to identify outliers?
Yes. The first step is to add "case marker" strings that identify
the observations to the data set. You can do this by preparing a
plain text file that contains a string for each observation, one
per line. Then go to the "Data" menu and select "Add case
markers"; use the file dialog to select the file that contains the
markers; and they will be added.
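If the country codes already exist in some other form, the marker file can be generated rather than typed by hand. The following is only a minimal sketch in Python: the codes, the file name and the helper `write_markers` are all hypothetical, and the only requirement taken from the description above is one string per observation, one per line, in dataset order.

```python
import os
import tempfile

# Hypothetical markers: one string per observation, in the same order
# as the observations appear in the dataset.
country_codes = ["USA", "DEU", "JPN", "BRA"]


def write_markers(path, markers):
    """Write one marker per line and return the number of lines written."""
    with open(path, "w", encoding="utf-8") as f:
        for marker in markers:
            f.write(marker.strip() + "\n")
    return len(markers)


# The file name and location are arbitrary choices for this sketch.
marker_path = os.path.join(tempfile.gettempdir(), "markers.txt")
n_written = write_markers(marker_path, country_codes)
```

The resulting file is what you would then pick in the file dialog. The number of lines should match the number of observations in the dataset.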
Then when you have a scatterplot open, you can "brush" with the
mouse to show the string associated with any given point. There
are three other relevant actions, enabled by clicking the mouse on
the graph:
* If one or more markers are shown, the pop-up menu will contain
the items "Freeze data labels" and "Clear data labels". The first
of these ensures that when you copy the graph to the clipboard,
the labels will come across as you see them on screen. The second
item does what it says it does.
* If you click on the graph and choose "Edit", to bring up the
plot-editing dialog, you have the further option of showing or
un-showing all the data labels.
Note: Until now this facility has been limited to datasets with
120 observations or fewer. I recently decided that this was too
restrictive, and I've raised the limit to 250 observations in
gretl CVS.
2. NCSS has a "Data screening" feature that identifies outliers.
I know that gretl calculates the Mahalanobis distance, but I
have trouble inferring from its value whether a particular
observation is an outlier. Is there a particular procedure for
the identification of outliers in gretl?
Well, this is tricky: what is an outlier? To my mind, this is
relative to a model. A point may be far from the means of the
coordinate variables, yet if it falls close to the regression line
or plane defined by the other variables it's not really an
outlier.
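To make that distinction concrete, here is a small sketch in Python (not gretl code) with made-up numbers. It fits a least-squares line to five ordinary points and then checks a sixth point that lies far from the mean of x but almost exactly on the fitted line. The data and the helper `ols_line` are assumptions for illustration only.

```python
import math


def ols_line(xs, ys):
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical data: five "ordinary" points, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# A sixth point far from the others in x, but in line with the same relationship.
far_x, far_y = 20.0, 39.5

a, b = ols_line(xs, ys)

# Distance of the far point from the mean of x, in sample standard deviations.
n = len(xs)
mean_x = sum(xs) / n
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
z_x = (far_x - mean_x) / sd_x

# Residual of the far point relative to the line fitted on the other points.
residual = far_y - (a + b * far_x)
largest_in_sample = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))
```

Here `z_x` is large while `residual` is no bigger than the in-sample residuals: the point is extreme in terms of the coordinate means, yet unremarkable relative to the model.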
Your options in gretl are:
* Mahalanobis distances
* Prints, plots and numerical display of residuals from a given
model (in the numerical display, unusually large residuals are
flagged).
* The analysis of "influential observations" under the tests
menu in the model window.
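On reading Mahalanobis distances: the sketch below, in Python rather than gretl, computes them for two variables from the sample means and the sample covariance matrix. The data and the function name `mahalanobis_2d` are hypothetical. The cutoff is a common rule of thumb and not something gretl prescribes: if the data are roughly multivariate normal, squared distances are approximately chi-square with as many degrees of freedom as there are variables, and 5.991 is the 95 percent point for 2 degrees of freedom. With so few observations the approximation is crude, so treat the flag as a prompt to look at the point, not as a verdict.

```python
import math


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) pair from the sample mean,
    using the sample covariance matrix (divisor n - 1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in zip(xs, ys):
        dx = x - mx
        dy = y - my
        # Quadratic form with the inverse of [[sxx, sxy], [sxy, syy]].
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(d2))
    return dists


# Hypothetical data: y tracks x closely except for the last observation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 2.0]

dists = mahalanobis_2d(xs, ys)

# Rule-of-thumb cutoff (an assumption, see above): 95% point of chi-square(2).
CUTOFF_SQ = 5.991
flagged = [i for i, d in enumerate(dists) if d * d > CUTOFF_SQ]
most_distant = max(range(len(dists)), key=lambda i: dists[i])
```

In this made-up sample the last observation has the largest distance and is the only one past the cutoff. Note that the distance is measured from the means, so it says nothing about how well a point fits a particular regression; for that, the residual and influence tools listed above are the more direct check.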
Allin Cottrell