[Gretl-users] Re: Gretl: means by categorical variable

Thursday, 23 May 2024

On 23.05.2024 at 08:24, Artur T. wrote:
> ...
> what you ask for is called an aggregation. Gretl has a built-in
> function called aggregate. ...

Thanks, Artur, but I think "g s" / Gene asked for a menu-based solution.
...

On 22.05.24 at 23:17, g s wrote:
> ...
> I'm looking to add gretl, using the menu version.

Hi, thanks very much for your initiative there!
...
>
> I'd like to do a means table, means of a continuous variable for each 
> category in a categorical variable. I haven't found how to do that yet.
>
>
> I have a csv version of the data set here
> https://drive.google.com/file/d/11WIET3s4eMHsB6JQ6hOQBx0SKquGRKKw/view?us... 
>
>
> One variable is population, which is population by country. Another 
> variable is region, that is, world region. All of the countries are 
> included in some region. How do I get mean population by world region?

That's a very useful and concrete example. I'm attaching the dataset in
gretl's gdt format (import went painlessly).

One menu-based thing that's closely related is a factorized boxplot: Go 
to View / Plot specified variables / Factorized Boxplot. Then select 
"population" to be plotted and "AreaRegion" as the discrete factor.
You get ten different boxplots side by side. However, you cannot really
read off the mean values.
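
For anyone who does not mind a one-line script, the same plot should also be available through the "boxplot" command. This is only a minimal sketch, assuming the dataset is already open and the series are called population and AreaRegion as above:

```hansl
# factorized boxplot: population, split by the discrete series AreaRegion
# (--output=display asks gretl to show the plot on screen)
boxplot population AreaRegion --factorized --output=display
```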

Not quite menu-based but straightforward is to type the command "summary 
population --by=AreaRegion". You need to type that command in the gretl 
console or put it in a script window and execute the one-line script. 
While the "summary" command is also available via the menu (View / Basic 
statistics [or similar, I'm retranslating from German here]), I don't 
think you can activate the --by option in the GUI. I guess it would be 
good if this factorization possibility were added to the menu. (You 
already have the factorization option for scatter plots.)
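
As a small script, this could look roughly as follows. The file name "countries.gdt" is only a placeholder for the attached dataset, and the second half uses the aggregate function that Artur mentioned. Passing the function name as a quoted string is an assumption that fits recent gretl versions; older ones may expect the bare name.

```hansl
# "countries.gdt" is a hypothetical name for the attached dataset
open countries.gdt

# summary statistics of population, separately for each value of AreaRegion
summary population --by=AreaRegion

# alternative via aggregate(); assumption: function name given as a string
# columns of m: region code, number of observations, mean of population
matrix m = aggregate(population, AreaRegion, "mean")
print m
```

The summary command prints a full block of statistics per region, while aggregate returns just the requested statistic as a matrix, which is handier if the means are needed for further computation.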

For the sake of completeness let me also mention the contributed 
function package "summary_xy" by Yi-Nung Yang, although that is not 
fully maintained anymore. There you can specify two grouping categories. 
Other loosely related contributed packages by other people are 
"fdensity" (Factorized kernel density estimation), "PairPlot" 
(Scatterplot matrix with factor separation), "PandasPort" (already 
mentioned by Artur), and possibly others I'm not aware of.

Finally, it is possible in principle --but quite tedious I guess-- to do 
the following via the menus:

1. Right-click on the series AreaRegion, and in the context menu select 
"Dummify", and select "encode all values". You get ten new dummy 
variables in your dataset, on the pattern DAreaRegion_1 ... DAreaRegion_10.

2. Go to Sample / Restrict by condition. In the dialog window you have a 
clickable option "use dummy variable". There you select one of your 
created area dummies.

3. Select the population series, right-click and select "Basic 
statistics". There you have among others your mean value within the 
currently active region.

4. Go to Sample / Restore full range.

5. Repeat steps 2 through 4 for all the other nine regions.
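
For comparison, the steps above can be sketched as a short script that restricts the sample region by region instead of creating dummies. Again, "countries.gdt" is a placeholder name, and the script reports the numeric codes of AreaRegion rather than the region labels:

```hansl
# "countries.gdt" is a hypothetical name for the attached dataset
open countries.gdt

# distinct numeric codes of the region variable
matrix codes = values(AreaRegion)
scalar n = rows(codes)

loop i=1..n
    scalar code = codes[i]
    # restrict the sample to one region, replacing any earlier restriction
    smpl AreaRegion == code --restrict --replace
    printf "region code %g: mean population = %g\n", code, mean(population)
endloop

# restore the full sample (menu step 4)
smpl full
```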

I said it's tedious! But it is menu-only. Then again, I guess it wouldn't
hurt to add the factorized summary option to the menus.

Don't hesitate to ask about other stuff you might need for your comparison.

cheers

sven
