On Fri, 6 Oct 2017, Allin Cottrell wrote:
> On Thu, 5 Oct 2017, Ignacio Diaz-Emparanza wrote:
>
>> I just found this strange boxplot, shown in the attachment. The help says
>> that for each boxplot both whiskers should be 1.5 times the interquartile
>> range, but I can see clearly that some of the lower whiskers are longer
>> than that. Is this possible, or is it a bug?
>
> The doc is not entirely accurate: the "whiskers" are supposed to extend to
> a _maximum_ of 1.5 times the interquartile range (IQR), but never beyond
> the range of the actual data. So the upper and lower whiskers may be of
> different lengths.
>
> That said, it would seem that, if the upper whiskers are correct in
> Ignacio's plot, then some of the lower whiskers are too long.
> I'll look into it.
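Read as a formula, the whisker rule in the reply above amounts to the following, where x_min and x_max denote the smallest and largest observations (this restates the description, it is not taken from gretl's source):

```latex
\begin{aligned}
\text{upper whisker end} &= \min\bigl(Q_3 + 1.5\,\mathrm{IQR},\; x_{\max}\bigr),\\
\text{lower whisker end} &= \max\bigl(Q_1 - 1.5\,\mathrm{IQR},\; x_{\min}\bigr),
\end{aligned}
\qquad \mathrm{IQR} = Q_3 - Q_1 .
```

For instance, with Q1 = 10, Q3 = 20 and x_min = 8, the lower whisker stops at 8, two units below the box, while the upper one may extend as far as 35.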
Having looked at our code, my second reaction is: let's see the data. The
definitions of the quartiles and the IQR can be questionable if the sample
sizes are very small and/or the data are discrete. At this point I'm not
convinced there's anything wrong with what we're doing -- though admittedly
there might be.
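To illustrate how much the quartile definition can matter for a small, discrete sample, here is a minimal Python sketch using the two conventions offered by the standard library's statistics.quantiles. The sample is invented for illustration (it is not Ignacio's data), and gretl's own quartile definition may differ from both conventions shown.

```python
import statistics

# A small, discrete sample invented for illustration (not Ignacio's data).
sample = [1, 1, 2, 2, 3, 5, 9]


def quartile_summary(data, method):
    """Return (Q1, Q3, IQR, upper fence) under one quartile convention."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, iqr, q3 + 1.5 * iqr


for method in ("inclusive", "exclusive"):
    q1, q3, iqr, upper = quartile_summary(sample, method)
    above = [x for x in sample if x > upper]
    print(f"{method}: Q1={q1} Q3={q3} IQR={iqr} upper fence={upper} points above={above}")
```

With "inclusive" quartiles Q3 is 4.0 and the upper fence is 7.75, so the value 9 is flagged; with "exclusive" quartiles Q3 is 5.0 and the fence moves out to 11.0, so nothing is flagged.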
No, sorry, third take: on looking more carefully at our code, we've
been trying to do something clever but getting it wrong on the low
side.
The idea is this: we extend the whiskers to a maximum of 1.5 * IQR
beyond Q1 and Q3, but we limit the extent of the whiskers to the
most extreme actual data points that fall within those limits; and
any points further out we show as outliers. I think we were getting
this right above the median, but wrong below -- in such a way as to
extend the lower whisker incorrectly. If my diagnosis is correct
this should now be fixed in git, with snapshots to follow tomorrow.
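A minimal Python sketch of the rule just described, for anyone who wants to check a plot by hand. The helper name box_whiskers is hypothetical (this is not gretl's code), and it assumes the "inclusive" quartiles from Python's statistics module, which may not match gretl's definition. The assert encodes the property that a wrongly extended lower (or upper) whisker would violate.

```python
import statistics


def box_whiskers(data, k=1.5):
    """Hypothetical helper: whisker ends and outliers for one boxplot.

    Fences sit k * IQR beyond Q1 and Q3; each whisker stops at the most
    extreme actual observation inside its fence, and anything beyond a
    fence is returned as an outlier.
    """
    xs = sorted(data)
    # Assumption: "inclusive" quartiles; gretl may define Q1 and Q3 differently.
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr

    # Snap each whisker to a real data point within its fence.
    lo_whisker = min(x for x in xs if x >= lo_fence)
    hi_whisker = max(x for x in xs if x <= hi_fence)
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]

    # Sanity check: neither whisker may reach past its fence, on either side.
    assert lo_fence <= lo_whisker and hi_whisker <= hi_fence
    return lo_whisker, hi_whisker, outliers


print(box_whiskers([-10, 1, 1, 2, 2, 3, 5, 9]))
```

For this sample it prints (1, 5, [-10, 9]): both fences lie 3.75 (1.5 times an IQR of 2.5) beyond the box, yet each whisker stops at the nearest real observation inside its fence, and -10 and 9 are shown as outliers.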
The point remains that things _could_ get strange with very small
sample sizes and discrete data, but that may not be the case in
Ignacio's data.
Allin