I cannot forbear to respond to Allin's comment.
Stata's collapse command - and the expand command (which replicates
observations) - is incredibly useful if you are doing any large amount of
data processing involving the manipulation of cross-section or panel
datasets with mixed periodicity or very different sources of
data. Consider adding regional statistics to state data. I am using
Stata for a large study based on cross-country panel data from a
whole variety of sources. My Stata programs probably have 50 or more
uses of collapse in one context or another. It is much more
cumbersome to write such code using matrix commands, especially when
datasets get to the limits of storage capacity.
I suspect that the key is "horses for courses". No program can do
everything equally well. I don't think that Gretl can or should
attempt to be a Swiss penknife for data manipulation or for analysing
very large datasets. Stata is expensive unless you can use it via an
academic site license, but most kinds of data manipulation can be
managed in Excel (or Gnumeric if you want to stick with open source
software).
It is worth noting that collapse (like many Stata commands) is
implemented via an ado-file. My experience is that writing ado-files
is a horrible process because of the cumbersome way of dealing with
variables - the reasons for which I understand but still don't
like. I think the real lesson for Gretl is to promote the use and
sharing of script functions with a reasonable balance between
generality and ease of use.
Gordon Hughes
On Thu, 10 Sep 2009, Irwin, James R wrote:
> Hi. Wondering if anyone can point me toward how to get Gretl to
> do the equivalent of STATA's collapse command. For example, I
> have a data set with about 1,000 observations with YEAR X2 and
> X3 (where YEAR is an integer with values from 1760 to 1880).
> I want to get a data set that is the count and average of X2 by
> YEAR. In STATA I write
>
> collapse (count) num2=X2 (mean) avg2=X2, by(YEAR)
>
> and I get a data set that is YEAR and the counts and means of
> the variable X2 for each year.
>
> From what I've seen of Gretl it seems this should be a trivial
> exercise but I seem to be stumped.
> Thanks for your consideration. -- jim irwin (economic historian,
> trying to migrate from STATA).
Welcome to the gretl list, and I hope we can help you to migrate
without too much pain!
I have to admit that Stata's "collapse" command seems oddly
specific to me -- I mean, I wouldn't have thought that such an
apparently specialized operation would merit a command to itself.
But maybe that just shows lack of imagination on my part!
Anyway, yes, gretl can do this sort of thing but you have to roll
your own "collapse". My approach below is to create a matrix
containing the "collapsed" values, then substitute this matrix for
the current dataset.
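
For reference, the target of the operation can be pinned down with a
small sketch in Python (not gretl): one output row per distinct YEAR,
holding the year, the count of X2 and the mean of X2. The function name
collapse_count_mean and the sample rows are made up for illustration,
and treating None as a missing value that is left out of both the count
and the mean is an assumption of the sketch.

```python
from collections import defaultdict


def collapse_count_mean(rows, by, var):
    """Return one (key, count, mean) tuple per distinct value of `by`.

    Hypothetical helper: None stands for a missing value and is excluded
    from both the count and the mean (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for row in rows:
        values = groups[row[by]]      # registers the group even if var is missing
        if row[var] is not None:
            values.append(row[var])

    collapsed = []
    for key in sorted(groups):
        values = groups[key]
        mean = sum(values) / len(values) if values else None
        collapsed.append((key, len(values), mean))
    return collapsed


# Made-up sample rows standing in for the YEAR / X2 data set.
data = [
    {"YEAR": 1760, "X2": 2.0},
    {"YEAR": 1760, "X2": 4.0},
    {"YEAR": 1761, "X2": 5.0},
    {"YEAR": 1761, "X2": None},
    {"YEAR": 1761, "X2": 7.0},
]

result = collapse_count_mean(data, "YEAR", "X2")
for year, num2, avg2 in result:
    print(year, num2, avg2)
```

Each tuple corresponds to one row of the "collapsed" matrix: the
grouping value first, then the two statistics.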