I find it a bit difficult to understand your problem. It would help if you
explained what you are trying to measure or test, rather than just
outlining what you are doing.
You say that you "pull the most recent 3 jobs within the standard deviation
threshold". If your data are normally distributed and contain no anomalous
points, a one-standard-deviation threshold will on average capture only
about 68% of the data. If you widen your limits to two standard deviations
you will capture about 95% of the non-anomalous data, i.e. you will still
miss about 1 in 20 of the valid observations.
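The 68%/95% figures above are easy to check empirically. A minimal sketch (simulated data, not your job times; the mean and SD chosen here are arbitrary):

```python
import random
from statistics import mean, stdev

# Simulate normally distributed "job times" with no anomalies, then see
# what fraction falls within 1 and 2 standard deviations of the mean.
random.seed(42)
times = [random.gauss(2.0, 0.3) for _ in range(100_000)]
m, s = mean(times), stdev(times)

within_1sd = sum(abs(t - m) <= 1 * s for t in times) / len(times)  # close to 0.68
within_2sd = sum(abs(t - m) <= 2 * s for t in times) / len(times)  # close to 0.95
```

So even on perfectly clean data, a one-SD filter discards nearly a third of the valid observations.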
I suspect that you might be better off using a Gamma or a related
distribution to make inferences. Without more knowledge of the process I
cannot be certain that this would make a difference.
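As a first step in that direction, the Gamma parameters can be estimated by the method of moments (shape k = mean²/variance, scale θ = variance/mean). This is only a rough sketch of the fitting step, not the full inference; the data are the job times quoted below:

```python
from statistics import mean, variance

def gamma_moments(times):
    """Method-of-moments estimates for a Gamma distribution:
    shape k = mean^2 / variance, scale theta = variance / mean."""
    m, v = mean(times), variance(times)  # sample mean and sample variance
    return m * m / v, v / m

times = [2.0, 2.5, 1.8, 1.9, 6.7, 2.2, 5.0, 2.3, 1.9, 2.2]
k, theta = gamma_moments(times)  # note k * theta recovers the sample mean
```

A right-skewed Gamma fit would treat occasional long job times as part of the distribution's tail rather than forcing a symmetric (normal) threshold onto skewed data.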
Sample sizes of 3 or 10 are very small for drawing conclusions.
You might seek advice from a statistician or quality control expert if that
is the purpose of the exercise. In any discipline it is often better to
seek the advice of an expert.
Best Regards
John
John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
mailto:frainj@tcd.ie
mailto:frainj@gmail.com
On 2 January 2015 at 20:15, <dts@dagey.com> wrote:
Hello All:
I apologize if this is not a proper use of the list, but you guys seem
like the best resource.
I have a question more related to statistical help than use of gretl. I
believe that I can work out the gretl commands but am unsure about the
statistical use/terminology.
We are a manufacturing facility and are using PHP and MySQL to process
operations through gretl (forecasting, linear regression, etc).
The data set for my current question is a series of job iterations and
associated times, such as the following:
Part # 1234

Job #    Time (hours)
----------------------------------
  1          2.0
  2          2.5
  3          1.8
  4          1.9
  5          6.7
  6          2.2
  7          5.0
  8          2.3
  9          1.9
 10          2.2
What we need is to remove the anomalies from the data set, as we are doing
aggregation and the extraneous data points are throwing us off. They might
be due to rework or training, etc., and we want to calculate an accurate
average either without these anomalies or with their influence minimized.
For example, in the dataset above, the times for job #5 and job #7 should
most likely be excluded, being over twice as much as the next highest time.
I have been using MySQL to calculate the standard deviation. We calculate
it over the last 100 job numbers, then pull the most recent 3 jobs that
fall within the standard deviation threshold and average them. However,
this is only useful **most** of the time. Sometimes the standard-deviation
filter throws out good values that we need to keep, so I am looking for
other options.
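[The procedure Ryan describes (SD over the last 100 jobs, then average the most recent 3 jobs within the threshold) can be sketched as follows; function and parameter names here are illustrative, and the MySQL/gretl specifics are omitted.]

```python
from statistics import mean, stdev

def recent_filtered_average(times, window=100, keep=3, n_sd=1.0):
    """Average the most recent `keep` job times that fall within
    `n_sd` standard deviations of the mean of the last `window` jobs."""
    recent = times[-window:]
    m, s = mean(recent), stdev(recent)
    # Walk from newest to oldest, keeping only values inside the threshold.
    within = [t for t in reversed(recent) if abs(t - m) <= n_sd * s]
    return mean(within[:keep])

times = [2.0, 2.5, 1.8, 1.9, 6.7, 2.2, 5.0, 2.3, 1.9, 2.2]
avg = recent_filtered_average(times)  # jobs #5 (6.7) and #7 (5.0) fall outside 1 SD
```

On this sample the filter excludes 6.7 and 5.0 and averages the three most recent remaining times (2.2, 1.9, 2.3), which matches the behaviour described, including its weakness: a single large outlier inflates the SD itself, so the threshold can behave erratically.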
My question is whether there is a better solution than calculating a
simple standard deviation, and how one might do so in gretl to filter the
dataset and remove anomalies. If another statistical function or operation
would be better suited, what would you suggest and how would we do it in
gretl?
I understand this is a bit unorthodox; I am a developer with limited
statistical experience, so I appreciate any help you can provide.
Thanks,
--
Ryan Dagey
Chief Technology Officer:
www.NeotericSystems
www.NeotericHovercraft.com
www.WorldHovercraft.org
www.DiscoverHover.org
www.hovercrafttraining.com
Email: ryan@neotericsystems.com
Ph: 812-234-1120: 800-285-3761
Fax: 877-640-8507
Mail: 1649 Tippecanoe Street, Terre Haute, IN USA 47807-2394
_______________________________________________
Gretl-users mailing list
Gretl-users@lists.wfu.edu
http://lists.wfu.edu/mailman/listinfo/gretl-users