Re: [Gretl-users] Outliers and data labeling in scatter plots

Thursday, 5 March 2009

On Thu, 5 Mar 2009, Marcus Marktanner wrote:

...
> First of all I would like to express my sincere gratitude for the
> gretl project, which I like extremely. I teach a hands-on course in
> econometrics and have just switched from NCSS to gretl.

Glad you like gretl!

...
> I would like to ask two questions though:
>
> 1. Is there a way to label observations in a scatter plot by,
> for example, country codes in order to have a visual tool at
> hand to identify outliers?

Yes.  The first step is to add "case marker" strings that identify
the observations to the data set.  You can do this by preparing a
plain text file that contains a string for each observation, one
per line.  Then go to the "Data" menu and select "Add case
markers"; use the file dialog to select the file that contains the
markers; and they will be added.
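
As a rough sketch of preparing such a marker file outside gretl, the
Python snippet below writes one string per line. The function name
write_case_markers, the file name markers.txt and the country codes
are all made up for illustration; the only thing taken from the
description above is the one-marker-per-line layout. It assumes the
markers are listed in the same order as the observations in the
dataset.

```python
import tempfile
from pathlib import Path


def write_case_markers(labels, path):
    """Write one marker string per line and return how many were written.

    Hypothetical helper: it only produces the plain text layout
    described above (one string per observation, one per line).
    """
    cleaned = [str(label).strip() for label in labels]
    if any(not marker for marker in cleaned):
        raise ValueError("every observation needs a non-empty marker")
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Hypothetical country codes, assumed to be in dataset order.
country_codes = ["ARG", "BRA", "CHL", "DEU", "USA"]

# Written to the temp directory here; any location gretl can read works.
marker_file = Path(tempfile.gettempdir()) / "markers.txt"
n_written = write_case_markers(country_codes, marker_file)
print(f"wrote {n_written} markers to {marker_file}")
```

The resulting file is what you would pick in the file dialog.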

Then when you have a scatterplot open, you can "brush" with the
mouse to show the string associated with any given point.  There
are three other relevant actions, enabled by clicking the mouse on
the graph:

* If one or more markers are shown, the pop-up menu will contain
the items "Freeze data labels" and "Clear data labels".  The first
of these ensures that when you copy the graph to the clipboard,
the labels will come across as you see them on screen.  The second
item does what it says it does.

* If you click on the graph and choose "Edit", to bring up the
plot-editing dialog, you have the further option of showing or
hiding all the data labels.

Note: Up till now this facility has been limited to datasets with
120 observations or fewer.  I recently decided that this was too
restrictive, and I've raised the limit to 250 observations in
gretl CVS.

...
> 2. NCSS has a "Data screening" feature that identifies outliers.
> I know that gretl calculates the Mahalanobis distance, but I
> have a problem to infer from its value on whether a particular
> observation is an outlier.  Is there a particular procedure for
> the identification of outliers in gretl?

Well, this is tricky: what is an outlier?  To my mind, this is
relative to a model.  A point may be far from the means of the
coordinate variables, yet if it falls close to the regression line
or plane defined by the other variables it's not really an
outlier.
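
To make that distinction concrete, here is a small Python sketch on
made-up data (nothing in it comes from gretl itself). Ten points lie
close to the line y = 2x, and an eleventh sits far from the means but
exactly on that line. The sketch computes each point's Mahalanobis
distance from the joint means of x and y, and its residual from an
OLS regression of y on x. The function names and the data are
assumptions for illustration only.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with intercept."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def mahalanobis_2d(x, y):
    """Mahalanobis distance of each (x, y) pair from the sample means."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for xi, yi in zip(x, y):
        dx, dy = xi - xbar, yi - ybar
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2 ** 0.5)
    return distances


# Made-up data: ten points near y = 2x, plus one distant point on the line.
x = [float(i) for i in range(1, 11)] + [30.0]
y = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1)
     for i, xi in enumerate(x[:10])] + [60.0]

dist = mahalanobis_2d(x, y)
resid = ols_residuals(x, y)
far = dist.index(max(dist))  # observation farthest from the means
print("largest distance at index", far)
print("its distance:", round(dist[far], 2))
print("its residual:", round(resid[far], 3))
```

In this constructed case the distant point has by far the largest
Mahalanobis distance and the smallest absolute residual of all: far
from the means, but not an outlier relative to the model.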

Your options in gretl are:

* Mahalanobis distances
* Prints, plots and numerical display of residuals from a given
  model (in the numerical display, unusually large residuals
  are flagged).
* The analysis of "influential observations" under the tests
  menu in the model window.
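
For intuition about the last two options, the Python sketch below
computes residuals and leverage (hat) values by hand for a simple
regression on made-up data and flags the unusual observations. It is
not gretl's implementation: the function name, the data, the
2.5-standard-error cutoff for residuals and the 2k/n rule of thumb
for leverage are all assumptions chosen for illustration.

```python
from statistics import mean


def regression_diagnostics(x, y, resid_cutoff=2.5):
    """Residuals, leverage values and flagged indices for y = a + b*x."""
    n, k = len(x), 2  # k = number of estimated coefficients
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Standard error of the regression.
    sigma = (sum(r * r for r in resid) / (n - k)) ** 0.5
    # Diagonal of the hat matrix for a regression with one regressor.
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    big_resid = [i for i, r in enumerate(resid)
                 if abs(r) > resid_cutoff * sigma]
    high_lev = [i for i, h in enumerate(leverage) if h > 2.0 * k / n]
    return resid, leverage, big_resid, high_lev


# Made-up data: ten points near y = 2x, then two unusual observations.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
x += [30.0, 5.5]   # index 10: far out in x; index 11: ordinary x
y += [60.0, 19.0]  # index 10: on the line;  index 11: well above it

resid, leverage, big_resid, high_lev = regression_diagnostics(x, y)
print("large residuals at:", big_resid)
print("high leverage at:", high_lev)
```

The two flags pick out different observations here: the point well
above the line has a large residual but ordinary leverage, while the
point far out in x has high leverage but a small residual.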

Allin Cottrell
