On Fri, 22 Jan 2016, Sven Schreiber wrote:
On 21.01.2016 at 16:10, Allin Cottrell wrote:
> Your example shows that recursion is a _lot_ faster in julia; so now we
> want a case where recursion is actually needed.
>
One more thought on this: what about the "omit --auto" command? I think
it can be viewed as a recursive procedure, as in this pseudocode:
function <reduced-equation> omit_one_by_one(<estimated-equation>)
    if min(<signif>) < threshold
        eliminate(coeff_where_min(<signif>))
        return omit_one_by_one(<equ_reduced_by_one>)
    else
        return <estimated-equation>
    endif
end function
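The pseudocode above can be turned into running code; here is a minimal sketch in Python, using numpy-based OLS and absolute t-statistics as the significance measure. The names (`omit_one_by_one`, `t_min`) and the threshold convention are illustrative assumptions, not gretl's actual implementation of "omit --auto":

```python
# Minimal sketch of recursive backward elimination, assuming plain OLS
# and |t|-statistics; illustrative only, not gretl's "omit --auto" code.
import numpy as np

def t_stats(X, y):
    """Return OLS estimates and their absolute t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, np.abs(beta / se)

def omit_one_by_one(X, y, cols, t_min=2.0):
    """Recursively drop the regressor with the smallest |t| until every
    surviving |t|-statistic is at least t_min; return surviving names."""
    if X.shape[1] == 0:          # safety: nothing left to test
        return cols
    _, t = t_stats(X, y)
    worst = int(np.argmin(t))
    if t[worst] >= t_min:
        return cols              # base case: nothing more to omit
    keep = [j for j in range(X.shape[1]) if j != worst]
    return omit_one_by_one(X[:, keep], y, [cols[j] for j in keep], t_min)

# Demo: y depends on x1 but not x2, so x2 should typically be eliminated.
rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1, x2])
kept = omit_one_by_one(X, y, ["const", "x1", "x2"])
```

Note that the recursion here is trivially rewritten as a loop, which is part of the point under discussion: nothing in the algorithm requires a function calling itself.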
If somebody is already proficient enough in Julia (or some other
JIT-compiled language), I think it would be interesting to compare the
speed to gretl's performance there.
Many things are such that they _can_ be done by recursion (in the
sense of a function calling itself), or they can be done by a
non-recursive iteration (as in gretl's "omit --auto"), or possibly
by a simple closed-form calculation (Fibonacci numbers).
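To make the three routes concrete, here they are for the Fibonacci case, sketched in Python (function names are illustrative):

```python
# Three ways to compute Fibonacci numbers: recursion, iteration, closed form.
import math

def fib_rec(n):
    """Direct recursion: exponential time without memoisation."""
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)

def fib_iter(n):
    """Non-recursive iteration: linear time."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """Binet's closed form: constant time, exact up to float precision."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n / math.sqrt(5))
```

All three agree for small n (e.g. fib_rec(12) == fib_iter(12) == fib_closed(12) == 144), but only the first exercises a language's function-call machinery heavily, which is presumably where julia's advantage would show up.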
I was suggesting that we might try to think of calculations relevant
to econometrics that are _best_ solved via recursion, given julia's
huge advantage in that area; I kinda doubt whether auto-omission
falls in that category, though if anyone cares to try that would be
nice.
Another thing to consider: julia is amazingly fast at "general
computation" (almost as fast as C) but once you start using packages
-- such as GLM for regression -- you pay a big cost in set-up time,
and the package code may not be anything like as efficient as the
built-in functions. Here's a trivial example, compounded of examples
from the julia GLM documentation:
<julia>
using GLM, RDatasets
form = dataset("datasets","Formaldehyde")
lm1 = fit(LinearModel, OptDen ~ Carb, form)
cycle = dataset("datasets", "LifeCycleSavings")
fm2 = fit(LinearModel, SR ~ Pop15 + Pop75 + DPI + DDPI, cycle)
</julia>
Running this on my i7 machine takes around 5.8 seconds (the "real"
value from the unix "time" program). Then here's the gretl
equivalent (after having used R to write out the two datasets as
.dta files):
<hansl>
open formaldehyde.dta -q
ols optden 0 carb
open lifecycle.dta -q
ols sr 0 pop15 pop75 dpi ddpi
</hansl>
Running time: 0.017 seconds, or 340 times faster.
We may suppose that there's a big fixed cost in the julia case, so
my next step was to wrap each estimation function/command in a
loop of 100000 replications (and eliminate the printing). That gave:
julia: 30.928s
gretl: 0.747s
OK, so now gretl is only 40 times as fast. What about a million
replications?
julia: 4m21.023s
gretl: 0m6.138s
Still roughly 40× faster, so it's by no means all to do with a fixed
set-up cost.
Once again, I don't doubt there _are_ computations we could
outsource to julia with advantage, but it seems clear that running
regressions via GLM is not one of them.
Allin