Hello All:

I apologize if this is not a proper use of the list, but you guys seem like the best resource.

My question is more about statistics than about using gretl itself. I believe I can work out the gretl commands, but I am unsure about the statistical methods and terminology.

We are a manufacturing facility, and we use PHP and MySQL to run analyses (forecasting, linear regression, etc.) through gretl.

The data set for my current question is a series of job iterations and associated times, such as the following:

Part # 1234

Job #    Time (hours)
---------------------
1        2.0
2        2.5
3        1.8
4        1.9
5        6.7
6        2.2
7        5.0
8        2.3
9        1.9
10       2.2

What we need is to remove the anomalies from the data set: we are aggregating the times, and the extraneous data points are throwing us off. They might be due to rework, training, etc., and we want to calculate an accurate average that either excludes these anomalies or minimizes their effect. For example, in the data set above, the times for jobs #5 and #7 should most likely be excluded, as each is at least twice the next highest time (2.5 hours).

I have been using MySQL to calculate the standard deviation. We calculate it over the last 100 jobs, then take the three most recent jobs whose times fall within the standard deviation threshold and average them. However, this works only *most* of the time. Sometimes the standard deviation filter throws out good values that we need to keep, so I am looking for other options.
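
To make the current approach concrete, here is a rough sketch of it in Python (not our actual PHP/MySQL code). The function name, the one-standard-deviation threshold `k`, and the use of the population standard deviation (which is what MySQL's STD() returns) are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def recent_average(times, window=100, k=1.0, keep=3):
    """Average the most recent `keep` times that lie within
    k standard deviations of the mean of the last `window` jobs.

    `k` and `keep` are illustrative defaults, not settled values.
    """
    recent = times[-window:]
    mu = mean(recent)
    sigma = pstdev(recent)  # population std dev (assumed, as in MySQL STD())
    # Keep only the jobs whose time is inside mean +/- k * sigma.
    in_range = [t for t in recent if abs(t - mu) <= k * sigma]
    # Average the last `keep` surviving jobs (fewer if not enough survive).
    return mean(in_range[-keep:])

# Sample data for part # 1234 from the table above.
times = [2.0, 2.5, 1.8, 1.9, 6.7, 2.2, 5.0, 2.3, 1.9, 2.2]
print(round(recent_average(times), 3))
```

On the sample data this drops jobs #5 and #7 and averages jobs #8 to #10. Note that the mean and standard deviation are themselves computed with the anomalies included, which may be part of why the filter sometimes misbehaves.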

My question is whether there is a better solution than calculating a simple standard deviation, and how one might filter the data set and remove anomalies in gretl. If another statistical function or operation would work better, what would you suggest, and how would we do it in gretl?
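
As an example of the kind of alternative I mean, one commonly described robust rule replaces the mean and standard deviation with the median and the median absolute deviation (MAD). The sketch below is in Python rather than gretl and is only illustrative: the function name, the 3.5 cutoff, and the 0.6745 scaling constant are conventional choices assumed here, and I do not know whether this rule is appropriate for our data.

```python
from statistics import mean, median

def mad_filter(times, cutoff=3.5):
    """Return the times whose modified z-score is at most `cutoff`.

    The modified z-score is 0.6745 * |t - median| / MAD, where MAD is
    the median of the absolute deviations from the median.
    """
    med = median(times)
    mad = median([abs(t - med) for t in times])
    if mad == 0:
        # At least half the values equal the median; nothing to scale by,
        # so this sketch simply returns the data unfiltered.
        return list(times)
    return [t for t in times if 0.6745 * abs(t - med) / mad <= cutoff]

# Sample data for part # 1234 from the table above.
times = [2.0, 2.5, 1.8, 1.9, 6.7, 2.2, 5.0, 2.3, 1.9, 2.2]
kept = mad_filter(times)
print(kept, round(mean(kept), 3))
```

On the sample data this also removes only jobs #5 and #7. Unlike the standard deviation approach, the median and MAD are barely affected by the anomalies themselves.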

I understand this is a bit unorthodox. I am a developer with limited statistical experience, so I appreciate any help you can provide.

Thanks,

--
Ryan Dagey
Chief Technology Officer:
www.NeotericSystems
www.NeotericHovercraft.com
www.WorldHovercraft.org
www.DiscoverHover.org
www.hovercrafttraining.com
Email: ryan@neotericsystems.com
Ph: 812-234-1120:  800-285-3761
Fax: 877-640-8507
Mail: 1649 Tippecanoe Street, Terre Haute, IN USA 47807-2394