Hi Sven

Thanks very much! Would it be possible to revise gretl so that the summary statistics menu offers a drop-down option "by some other variable"? Or to add a drop-down option for means by some categorical variable?

Thanks

Gene



On Thursday, May 23, 2024 at 05:07:58 AM EDT, Sven Schreiber <sven.schreiber@fu-berlin.de> wrote:


On 23.05.2024 at 08:24, Artur T. wrote:
> what you ask for is called an aggregation. Gretl has a built-in function called aggregate. ...
Thanks, Artur, but I think "g s" / Gene asked for a menu-based solution.
>
> On 22.05.24 at 23:17, g s wrote:
>> ...
>> I'm looking to add gretl, using the menu version.
Hi, thanks very much for your initiative there!

>
> I'd like to do a means table, means of a continuous variable for each category in a categorical variable. I haven't found how to do that yet.
>
> I have a csv version of the data set here
> https://drive.google.com/file/d/11WIET3s4eMHsB6JQ6hOQBx0SKquGRKKw/view?usp=sharing
>
> One variable is population, which is population by country. Another variable is region, that is, world region. All of the countries are included in some region. How do I get mean population by world region?


That's a very useful and concrete example. I'm attaching the dataset in gretl's gdt format (import went painlessly).

One menu-based thing that's closely related is a factorized boxplot: Go to View / Plot specified variables / Factorized Boxplot. Then select "population" to be plotted and "AreaRegion" as the discrete factor. You get ten different boxplots side by side. However, you cannot really read off the mean values.

Not quite menu-based but straightforward is to type the command "summary population --by=AreaRegion". You need to type that command in the gretl console or put it in a script window and execute the one-line script. While the "summary" command is also available via the menu (View / Basic statistics [or similar, I'm retranslating from German here]), I don't think you can activate the --by option in the GUI. I guess it would be good if this factorization possibility were added to the menu. (You already have the factorization option for scatter plots.)
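For illustration, the one-line script could be embedded like this. This is only a minimal sketch: the file name population.gdt is a placeholder I'm assuming for wherever you saved the attached dataset, while the series names are the ones from your data.

```
# placeholder file name (an assumption): adjust to where you saved the attached gdt file
open population.gdt

# summary statistics of population, reported separately for each value of AreaRegion
summary population --by=AreaRegion
```

Paste this into a script window (File / Script files / New script, or similar) and run it; the output window should then list one block of statistics, including the mean, per region.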

For the sake of completeness let me also mention the contributed function package "summary_xy" by Yi-Nung Yang, although that is not fully maintained anymore. There you can specify two grouping categories. Other loosely related contributed packages by other people are "fdensity" (Factorized kernel density estimation), "PairPlot" (Scatterplot matrix with factor separation), "PandasPort" (already mentioned by Artur), and possibly others I'm not aware of.

Finally, it is possible in principle --but quite tedious I guess-- to do the following via the menus:

1. Right-click on the series AreaRegion, and in the context menu select "Dummify", and select "encode all values". You get ten new dummy variables in your dataset, on the pattern DAreaRegion_1 ... DAreaRegion_10.

2. Go to Sample / Restrict by condition. In the dialog window you have a clickable option "use dummy variable". There you select one of your created area dummies.

3. Select the population series, right-click and select "Basic statistics". There you have among others your mean value within the currently active region.

4. Go to Sample / Restore full range.

5. Repeat steps 2 through 4 for all the other nine regions.
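For comparison, the five steps above could be sketched roughly as the following script. Treat it as an untested sketch under assumptions: the file name population.gdt is a placeholder, and I'm assuming the command version of dummify produces the same names DAreaRegion_1 ... DAreaRegion_10 as the GUI does.

```
# placeholder file name (an assumption): adjust to your copy of the dataset
open population.gdt

# step 1: create one dummy per region (all values encoded by default)
dummify AreaRegion

# steps 2 to 5: restrict the sample to one region at a time
loop i=1..10
    # step 2: keep only observations where the i-th region dummy equals 1
    smpl DAreaRegion_$i --dummy
    # step 3: basic statistics (including the mean) for the active region
    summary population
    # step 4: restore the full sample before moving to the next region
    smpl full
endloop
```

The loop simply automates the repetition in step 5, so the output is ten separate summary blocks rather than one compact table.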

As I said, it's tedious, but it is menu-only. Again, I guess it wouldn't hurt to add the factorized summary option to the menu.

Don't hesitate to ask about other stuff you might need for your comparison.

cheers

sven