On 23.05.2024 at 08:24, Artur T. wrote:
what you ask for is called an aggregation. Gretl has a built-in
function called aggregate. ...
Thanks, Artur, but I think "g s" / Gene
asked for a menu-based solution.
On 22.05.24 at 23:17, g s wrote:
> ...
> I'm looking to add gretl, using the menu version.
Hi, thanks very much for
your initiative there!
>
> I'd like to do a means table, means of a continuous variable for each
> category in a categorical variable. I haven't found how to do that yet.
>
>
> I have a csv version of the data set here
>
https://drive.google.com/file/d/11WIET3s4eMHsB6JQ6hOQBx0SKquGRKKw/view?us...
>
>
> One variable is population, which is population by country. Another
> variable is region, that is, world region. All of the countries are
> included in some region. How do I get mean population by world region?
That's a very useful and concrete example. I'm attaching the dataset in
gretl's gdt format (import went painlessly).
One menu-based thing that's closely related is a factorized boxplot: Go
to View / Plot specified variables / Factorized Boxplot. Then select
"population" to be plotted and "AreaRegion" as the discrete factor.
You get ten different boxplots side by side. However, you cannot really
read off the mean values.
Not quite menu-based but straightforward is to type the command "summary
population --by=AreaRegion". You need to type that command in the gretl
console or put it in a script window and execute the one-line script.
While the "summary" command is also available via the menu (View / Basic
statistics [or similar, I'm retranslating from German here]), I don't
think you can activate the --by option in the GUI. I guess it would be
good if this factorization possibility were added to the menu. (You
already have the factorization option for scatter plots.)
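
As a minimal sketch of the script route just described: the file name
countries.gdt is made up here, so use whatever name you saved the
attached dataset under. Only the summary line comes from the text above.

```hansl
# assumption: the attached dataset was saved as countries.gdt (hypothetical name)
open countries.gdt
# summary statistics of population, computed separately for each region
summary population --by=AreaRegion
```

The mean is one of the statistics printed for each region.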
For the sake of completeness let me also mention the contributed
function package "summary_xy" by Yi-Nung Yang, although that is not
fully maintained anymore. There you can specify two grouping categories.
Other loosely related contributed packages by other people are
"fdensity" (Factorized kernel density estimation), "PairPlot"
(Scatterplot matrix with factor separation), "PandasPort" (already
mentioned by Artur), and possibly others I'm not aware of.
Finally, it is possible in principle --but quite tedious I guess-- to do
the following via the menus:
1. Right-click on the series AreaRegion, and in the context menu select
"Dummify", and select "encode all values". You get ten new dummy
variables in your dataset, on the pattern DAreaRegion_1 ... DAreaRegion_10.
2. Go to Sample / Restrict by condition. In the dialog window you have a
clickable option "use dummy variable". There you select one of your
created area dummies.
3. Select the population series, right-click and select "Basic
statistics". There you have among others your mean value within the
currently active region.
4. Go to Sample / Restore full range.
5. Repeat steps 2 through 4 for all the other nine regions.
I said it's tedious! But it is menu-only. Then again, I guess it
wouldn't hurt to add the factorized summary option.
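
For comparison, the restrict / summarize / restore cycle of steps 2 to 4
could be sketched as a short script loop. This is only a sketch: it
assumes the series names population and AreaRegion from the dataset
above, and it restricts the sample directly on the values of AreaRegion
instead of using the dummies from step 1.

```hansl
# sketch only; assumes the dataset with population and AreaRegion is already open
# distinct region codes found in AreaRegion
matrix vals = values(AreaRegion)
loop i=1..rows(vals)
    # like step 2: keep only the observations of one region
    smpl AreaRegion == vals[i] --restrict --replace
    # like step 3: the mean of population within that region
    printf "region code %g: mean population = %g\n", vals[i], mean(population)
endloop
# like step 4: go back to the full sample
smpl full
```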
Don't hesitate to ask about other stuff you might need for your comparison.
cheers
sven