LC to CL, Semantically

Compilers abandoned bracket abstraction long ago in favour of generating custom combinators for each particular program, known as supercombinators.

We’ll buck this trend, partly for notoriety, but also for novelty: a bracket abstraction algorithm by Oleg Kiselyov breathes new life into the old approach.

Rather than treating lambda terms as a syntax to be rewritten as combinators, Kiselyov defines the meaning of a lambda term using combinators. The formal semantics can be viewed as a bracket abstraction algorithm.

Much of our previous compiler remains the same. We have standard definitions:

pair x y f = f x y;
just x f g = g x;
foldr = @Y \r c n l -> l n (\h t -> c h(r c n t));
lsteq = @Y \r xs ys a b -> xs (ys a (\u u -> b)) (\x xt -> ys b (\y yt -> x(y @=) (r xt yt a b) b));

Parser combinators library:

pure x inp = just (pair x inp);
bind f m = m @K (\x -> x f);
ap x y = \inp -> bind (\a t -> bind (\b u -> pure (a b) u) (y t)) (x inp);
fmap f x = ap (pure f) x;
alt x y = \inp -> (x inp) (y inp) just;
liftaa f x y = ap (fmap f x) y;
many = @Y \r p -> alt (liftaa @: p (r p)) (pure @K);
some p = liftaa @: p (many p);
liftki = liftaa (@K @I);
liftk = liftaa @K;
sat f inp = inp @K (\h t -> f h (pure h t) @K);
char c = sat (\x -> x(c @=));

Scott encoding of a data structure we use to hold lambda terms:

-- data LC = R (String -> String) | V String | A LC LC | L String LC
lcr s   = \a b c d -> a s;
lcv v   = \a b c d -> b v;
lca x y = \a b c d -> c x y;
lcl x y = \a b c d -> d x y;

Parser:

com = liftki (char #-) (liftki (char #-) (liftki (many (sat (\c -> @C (c(#
@=))))) (char #
)));
sp = many (alt (char # ) (alt (char #
) com));
spc f = liftk f sp;
spch = @B spc char;
var = spc ( some (sat (\x -> (#z(x @L)) (x(#a @L)) (@K @I) )));
anyone = fmap (@:) (spc (sat (@K @K)));
pre = ap (alt (fmap (@K @I) (char #@)) (fmap (@B @B @:) (char ##))) anyone;
lam r = liftki (spch #\) (liftaa (@C (foldr lcl)) (some var) (liftki (char #-) (liftki (spch #>) r)));
atom r = alt (fmap lcv var) (alt (liftki (spch #() (liftk r (spch #)))) (alt (fmap lcr pre) (lam r)));
apps = @Y \f r -> alt (liftaa @T (atom r) (fmap (\vs v x -> vs (lca x v)) (f r))) (pure @I);
expr = @Y \r -> liftaa @T (atom r) (apps r);
def = liftaa pair var (liftaa (@C (foldr lcl)) (many var) (liftki (spch #=) expr));
program = liftki sp (some (liftk def (spch #;)));

Finally, something new; conversion to De Bruijn notation:

-- data DB = Ze | Su DB | Pass LC | La DB | App DB DB
ze   = \    a b c d e -> a;
su   = \x   a b c d e -> b x;
pass = \x   a b c d e -> c x;
la   = \x   a b c d e -> d x;
app  = \x y a b c d e -> e x y;

debruijn = @Y (\r n e -> e
  (\s -> pass (lcr s))
  (\v -> foldr (\h m -> lsteq h v ze (su m)) (pass (lcv v)) n)
  (\x y -> app (r n x) (r n y))
  (\s t -> la (r (@: s n) t))
  );

And Kiselyov’s bracket abstraction algorithm from Section 4 of the paper:

closed = \t a b c -> a t;
need   = \x a b c -> b x;
weak   = \x a b c -> c x;

lclo = \r d y -> y
  (\dd -> closed (lca d dd))
  (\e -> need (r (closed (lca (lcr (@:#B@K)) d)) e))
  (\e -> weak (r (closed d) e))
  ;

lnee = \r e y -> y
  (\d -> need (r (closed (lca (lcr (@:#R@K)) d)) e))
  (\ee -> need (r (r (closed (lcr (@:#S@K))) e) ee))
  (\ee -> need (r (r (closed (lcr (@:#C@K))) e) ee))
  ;

lwea = \r e y -> y
  (\d -> weak (r e (closed d)))
  (\ee -> need (r (r (closed (lcr (@:#B@K))) e) ee))
  (\ee -> weak (r e ee))
  ;

babsa = @Y (\r x y -> x
  (\d -> lclo r d y)
  (\e -> lnee r e y)
  (\e -> lwea r e y)
  );

babs = @Y (\r t -> t
  (need (closed (lcr (@:#I@K))))
  (@B weak r)
  closed
  (\t -> r t
    (\d -> closed (lca (lcr (@:#K@K)) d))
    @I
    (babsa (closed (lcr (@:#K@K)))))
  (\x y -> babsa (r x) (r y))
  );
nolam x = babs (debruijn @K x) @I @? @?;

That leaves the code generator and the main function tying everything together:

rank ds v = foldr (\d t -> lsteq v (d @K) (\n -> @B (@:#@) (@:n)) (@B t \n -> #0(#1 @-)(n @+))) @? ds # ;
shows f = @Y \r t -> t @I f (\x y -> @B (@B (@:#`) (r x)) (r y)) @?;
dump tab = foldr (\h t -> shows (rank tab) (nolam (h (@K @I))) (@:#;t)) @K tab;
main s = program s (@:#?@K) (@B dump (@T @K));

The grammar is identical, but the generated code is far smaller.


Ben Lynn blynn@cs.stanford.edu 💡