continuations to optimize code

5 messages

continuations to optimize code

Ulf Wiger-4

I've done some work optimizing the syntax highlighter in
CCviewer. It's basically a parser, but written by hand in plain
Erlang (a questionable decision, but it does have some nice
properties).

I realised that a major rewrite was in order when I found that it
took 4.5 hours to process megaco_text_parser.erl -- 22,000 lines
of code, essentially only two functions.

The first rewrite made the parser reentrant, reading and scanning
the file in chunks, instead of all at once. This brought the time
down from 4.5 hours to ca 15 minutes -- joy!
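A minimal sketch of what "reentrant, chunked" reading can look
like (this is not the actual CCviewer code; scan_partial/1 and
the 8 KB chunk size are invented for illustration -- here it just
splits off complete lines and carries the trailing partial line
over to the next chunk):

```erlang
%% Reentrant, chunked scanning: read a few KB at a time and carry
%% any incomplete trailing data over to the next read.
scan_file(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try scan_chunks(Fd, <<>>, [])
    after file:close(Fd)
    end.

scan_chunks(Fd, Leftover, Acc) ->
    case file:read(Fd, 8192) of
        {ok, Bin} ->
            {Lines, Rest} = scan_partial(<<Leftover/binary, Bin/binary>>),
            scan_chunks(Fd, Rest, [Lines | Acc]);
        eof ->
            lists:append(lists:reverse([[Leftover] | Acc]))
    end.

%% Keep complete lines; hand the trailing partial line back to the
%% caller so it can be prepended to the next chunk.
scan_partial(Bin) ->
    Parts = binary:split(Bin, <<"\n">>, [global]),
    {Complete, [Partial]} = lists:split(length(Parts) - 1, Parts),
    {Complete, Partial}.
```

The point is only the shape: the scanner never needs the whole
file in memory at once, just the current chunk plus the leftover.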

Then I stumbled across another module that happened to call
mnesia:transaction(Fun) with an inlined fun consisting of 230
lines of code (!). This caused BEAM to bail out when it couldn't
allocate space for a 1 GB process heap -- no joy.

I came to the conclusion that the problem was that I was using a
construct like:

parse([{atom,Pos1,Name},{symbol,Pos2,'('}|Ts], ...) ->
   %% function call
   {Arity, Output, RestTs, ...} = function_arguments(Ts, ...),
   output([html_anchor(Name, Arity), "(", Output], ...),
   parse(RestTs, ...).

I.e. not tail-recursive.

I rewrote it in the following style:

parse([{atom,Pos1,Name},{symbol,Pos2,'('}|Ts], ...) ->
   %% function call
   Fun =
      fun(Arity, Output, RestTs, ...) ->
         output([html_anchor(Name, Arity), "(", Output], ...),
         parse(RestTs, ...)
      end,
   function_arguments(Ts, ..., Fun).

(The function function_arguments() then calls Fun(Arity, Output,
RestTs, ...) instead of returning a tuple.)
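To make the shape concrete, here is a minimal, self-contained
sketch of the same transformation (the token format, names and
accumulator are invented; this is not the real parser). The
continuation closes over everything the caller still needs, so
both functions stay tail-recursive:

```erlang
%% Hypothetical token stream: look for {atom,...},{'(',...} pairs
%% and collect the arguments up to the matching ')'.
parse([{atom, _, Name}, {'(', _} | Ts], Acc) ->
    Fun = fun(Arity, Args, Rest) ->
              %% Everything that used to happen after the call to
              %% function_arguments/4 now lives in the closure.
              parse(Rest, [{call, Name, Arity, Args} | Acc])
          end,
    function_arguments(Ts, 0, [], Fun);
parse([_ | Ts], Acc) ->
    parse(Ts, Acc);
parse([], Acc) ->
    lists:reverse(Acc).

%% Instead of returning {Arity, Args, Rest}, call the continuation
%% in tail position -- no frame is left waiting for a return value.
function_arguments([{')', _} | Ts], Arity, Args, Fun) ->
    Fun(Arity, lists:reverse(Args), Ts);
function_arguments([T | Ts], Arity, Args, Fun) ->
    function_arguments(Ts, Arity + 1, [T | Args], Fun).
```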

This brought the processing time of megaco_text_parser.erl down
again from 15 minutes to 100 ms -- ...!!!


Of course, it isn't exactly news that non-tail-recursive
functions are slower than tail-recursive ones, but the
difference in this case was rather extreme. I take it this is
because the data volume was large. The new code isn't quite as
readable, but it's still possible to follow.


After flogging myself for writing such slow code to begin with,
I then started thinking that the rewrite above could possibly be
automated, esp. using a profiling-based optimization tool like
the one described in http://www.erlang.se/euc/01/Lindgren2001.ps.

I remember that Thomas's tool did "higher-order function
removal". This would be rather the opposite: "higher-order
function exploitation", or something like that.
Basically, if a function is called non-recursively, and the cost
of the call is high enough that the overhead of passing a
continuation is insignificant by comparison, a
non-tail-recursive function could be made tail-recursive quite
easily.

Comments, anyone?

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson Telecom AB, ATM Multiservice Networks



continuations to optimize code

Thomas Lindgren-3

> This brought the processing time of megaco_text_parser.erl down
> again from 15 minutes to 100 ms -- ...!!!

Congratulations on an interesting performance hack!

Those are very interesting timing results, because your CPS-transformation
builds a closure instead of pushing a stack frame. As far as I
can see from the example, the savings then lie in avoiding the
tuple. If so, that's remarkable.

(A similar technique for returning multiple values is used in
SML compilers.)

And to those interested, yes, I'm actually working on doing
something along those lines in OM :-)

-- Thomas




continuations to optimize code

Ulf Wiger-4
On Wed, 5 Dec 2001, Thomas Lindgren wrote:

>Congratulations on an interesting performance hack!
>
>Those are very interesting timing results, because your
>CPS-transformation builds a closure instead of pushing a stack
>frame. As far as I can see from the example, the savings then
>lie in avoiding the tuple. If so, that's remarkable.

Of course, there's the tiny little chance that I unknowingly
fixed some obscure bug in the process of rewriting, and that it
was that bug that accounted for the overhead...  (:

It would be interesting to see if anyone could verify (or refute)
my theory that CPS-transformation can actually yield significant
speedups (they don't have to be in the order of 9000x to be
significant.)

/Uffe
--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson Telecom AB, ATM Multiservice Networks



continuations to optimize code

Björn Gustavsson-3
Was that a deep recursion?

Here is a theory that might partially explain your speedups.

In the recursive case, you presumably get a big stack. At every garbage collection,
all of the stack is scanned looking for live data. As the stack grows towards
the heap, you could also get more frequent garbage collections than in the
non-recursive case, and you would get fullsweep GCs (forcing copying of all
live data).

In the non-recursive case, the root set for the GC is probably small (since the
stack is presumably small). Each GC is presumably fast, since the root set is small
and your continuations are already on the older generation heap. Perhaps no
fullsweeps are done, as your output function produces garbage but little or
no new live data. Therefore the continuation data will stay on the older
generation heap and not be copied very often.
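One way to test this theory would be to run each version in a
fresh process and compare its final stack size, heap size and GC
counts -- a rough sketch (measure/1 is invented for illustration,
not part of any existing tool):

```erlang
%% Run a fun in a fresh process; when it finishes, report its
%% stack/heap sizes and garbage-collection statistics.
measure(F) ->
    {Pid, Ref} =
        spawn_monitor(fun() ->
                          F(),
                          exit(erlang:process_info(
                                 self(),
                                 [stack_size, heap_size,
                                  garbage_collection]))
                      end),
    receive
        {'DOWN', Ref, process, Pid, Info} -> Info
    end.
```

If the theory holds, the body-recursive run should show a much
larger stack_size and more minor collections than the CPS run.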

Just a bunch of guesses, based on unverified assumptions.

/Bjorn

Ulf Wiger <etxuwig> writes:

> On Wed, 5 Dec 2001, Thomas Lindgren wrote:
>
> >Congratulations on an interesting performance hack!
> >
> >Those are very interesting timing results, because your
> >CPS-transformation builds a closure instead of pushing a stack
> >frame. As far as I can see from the example, the savings then
> >lie in avoiding the tuple. If so, that's remarkable.
>
> Of course, there's the tiny little chance that I unknowingly
> fixed some obscure bug in the process of rewriting, and that it
> was that bug that accounted for the overhead...  (:
>
> It would be interesting to see if anyone could verify (or refute)
> my theory that CPS-transformation can actually yield significant
> speedups (they don't have to be in the order of 9000x to be
> significant.)
>
> /Uffe
> --
> Ulf Wiger, Senior Specialist,
>    / / /   Architecture & Design of Carrier-Class Software
>   / / /    Strategic Product & System Management
>  / / /     Ericsson Telecom AB, ATM Multiservice Networks
>

--
Björn Gustavsson            Ericsson Utvecklings AB
bjorn      ÄT2/UAB/F/P
                            BOX 1505
+46 8 727 56 87    125 25 Älvsjö


continuations to optimize code

Ulf Wiger-4
On 5 Dec 2001, Bjorn Gustavsson wrote:

>Was that a deep recursion?

Yes, it could be.

Your theory sounds quite plausible, and offers an explanation of
why a closure could be so much cheaper than a stack frame in
this case.

BTW, as the parser generates HTML as it goes along, quite a lot
of garbage is produced.

/Uffe

>Here is a theory that might partially explain your speedups.
>
>In the recursive case, you presumably get a big stack. At every
>garbage collection, all of the stack is scanned looking for
>live data. As the stack grows towards the heap, you could also
>get more frequent garbage collections than in the non-recursive
>case, and you would get fullsweep GCs (forcing copying of all
>live data).
>
>In the non-recursive case, the root set for the GC is probably
>small (since the stack is presumably small). Each GC is
>presumably fast, since the root set is small and your
>continuations are already on the older generation heap. Perhaps
>no fullsweeps are done, as your output function produces
>garbage but little or no new live data. Therefore the
>continuation data will stay on the older generation heap and
>not be copied very often.
>
>Just a bunch of guesses, based on unverified assumptions.
>
>/Bjorn

--
Ulf Wiger, Senior Specialist,
   / / /   Architecture & Design of Carrier-Class Software
  / / /    Strategic Product & System Management
 / / /     Ericsson Telecom AB, ATM Multiservice Networks