On 06.01.2014 20:14, Allin Cottrell wrote:
> I've now done some testing for speed of "skip-padding" and actually the
> improvement is not all that great. It turns out that with a big and
> severely unbalanced panel it takes quite a while to count the padding
> rows. I'm showing my results below. In all cases the simulated datasets
> had 1200 variables and about 70 percent of the rows were filled with NAs
> to represent panel padding; the cases differ by NT (the number of rows in
> the balanced panel) and the compression level used.
> Overall I'm seeing a slight gain in (compress + write) speed, and a more
> substantial but still not dramatic gain in (read + decompress) speed.
> While the size of the gdt file on disk is smaller with skip-padding, when
> compression is enabled the difference is not as great as one might
> imagine. Evidently zlib does a pretty good job of shrinking the padding
> rows to next-to-nothing.
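
The size effect described above can be reproduced in a rough way outside gretl. The following is only a sketch under stated assumptions: it uses Python's zlib module on a simplified plain-text row layout (not the actual gdt XML format), a much smaller dataset than the 1200-variable one tested, and hypothetical helper names (make_rows, serialize, compare_sizes). It writes the same simulated panel once with all-NA padding rows and once without them, then compares raw and compressed sizes.

```python
import random
import zlib


def make_rows(n_rows, n_vars, pad_share, seed=0):
    """Simulate a padded panel: None marks a padding row, else a list of floats."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        if rng.random() < pad_share:
            rows.append(None)  # padding row (all NA)
        else:
            rows.append([rng.gauss(0, 1) for _ in range(n_vars)])
    return rows


def serialize(rows, n_vars, skip_padding):
    """Write rows as text lines; padding rows are either skipped or written as NAs."""
    lines = []
    for row in rows:
        if row is None:
            if skip_padding:
                continue
            lines.append(" ".join(["NA"] * n_vars))
        else:
            lines.append(" ".join(f"{x:.6g}" for x in row))
    return "\n".join(lines).encode("ascii")


def compare_sizes(n_rows=2000, n_vars=50, pad_share=0.7, level=6):
    """Return raw and zlib-compressed byte counts with and without padding rows."""
    rows = make_rows(n_rows, n_vars, pad_share)
    full = serialize(rows, n_vars, skip_padding=False)
    skipped = serialize(rows, n_vars, skip_padding=True)
    return {
        "raw_full": len(full),
        "raw_skipped": len(skipped),
        "z_full": len(zlib.compress(full, level)),
        "z_skipped": len(zlib.compress(skipped, level)),
    }


if __name__ == "__main__":
    for name, size in compare_sizes().items():
        print(f"{name:12s} {size:10d} bytes")
```

In this toy setting the padding rows account for a large share of the raw bytes but only a small share of the compressed bytes, which is in line with the observation that the on-disk gain from skip-padding shrinks once compression is enabled.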
Let me check that I have been paying enough attention: skip-padding
takes time, is currently enabled in CVS, and cannot be switched off.
Compression, on the other hand, is user-configurable and does
essentially the same job as skip-padding. So it sounds like a good idea
to undo skip-padding in CVS, no? Then I could switch off compression to
get the fastest (but of course also biggest) result.
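
To make the speed-versus-size trade-off concrete, here is a minimal sketch that times zlib at several compression levels on a padded payload. The assumptions are mine: zlib level 0 (stored, no compression) stands in for "compression switched off", which may not be what gretl actually does at its own level 0, and the data layout and function names (padded_payload, level_tradeoff) are made up for illustration.

```python
import random
import time
import zlib


def padded_payload(n_rows=1000, n_vars=40, pad_share=0.7, seed=1):
    """Build a text payload in which roughly pad_share of the rows are all NA."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_rows):
        if rng.random() < pad_share:
            lines.append(" ".join(["NA"] * n_vars))
        else:
            lines.append(" ".join(f"{rng.gauss(0, 1):.6g}" for _ in range(n_vars)))
    return "\n".join(lines).encode("ascii")


def level_tradeoff(payload, levels=(0, 1, 6, 9)):
    """Map each zlib level to (compressed size in bytes, seconds spent compressing)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    data = padded_payload()
    print(f"raw size: {len(data)} bytes")
    for level, (size, secs) in level_tradeoff(data).items():
        print(f"level {level}: {size} bytes, {secs * 1000:.2f} ms")
```

Level 0 only wraps the data in stored blocks, so its output is slightly larger than the input; the timings depend on the machine and are meant only to show the direction of the trade-off.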
thanks,
sven