cl - A Combinatory Logic Interpreter

Bruce Ediger
bediger@stratigery.com
Last update: 2008-11-22

This document pertains to version 1.1 of cl. You can also download version 1.6 of cl; version 1.6 is substantially similar to v1.1, but I wrote a separate document specifically for v1.6.

Table of Contents

  1. Synopsis
  2. Command Line
  3. Introduction
  4. Building and Installing
  5. The Language Interpreted
  6. The Interpreter
  7. Bracket Abstraction Algorithms
  8. Modifying the Interpreter
  9. Performance
  10. Software Engineering
  11. Future Work
  12. Other Implementations
  13. Combinatory Logic Bibliography
  14. Technical References

Synopsis

cl (Combinatory Logic) interprets a simple programming language similar to Haskell Curry's Combinatory Logic formal system. It includes enough built-in combinators to use {S, K, I}, {B, C, K, W} or {B, M, T, K} as bases.

The programming language has no input or output facilities, depending solely on the user's ability to attach meaning to normal forms of Combinatory Logic expressions.

You could use cl to understand or experiment with Combinatory Logic. It may help you with Raymond Smullyan's book To Mock a Mockingbird.

Command Line

Flags

-B alg       Choose a default bracket abstraction algorithm, one of "curry", "turner", "tromp", "grz", "btmk"
-C N         Treat N (one of S, K, I, B, C, W, M, T) as a non-primitive combinator
-d           Print debug output during graph reduction
-e           Print elaborate output during graph reduction
-L filename  Load and interpret a file named filename
-m           On exit, print a memory usage summary on stderr
-N number    Perform no more than number reductions, then return to the top level prompt
-p           Do not prompt for user input. By default, cl prompts.
-s           Single-step reductions
-T number    Evaluate input expressions for number seconds, then return to the top level prompt
-t           Trace reductions

The -L flag allows a user to "pre-load" one or more files full of cl code before the interpreter starts its main interactive loop.

Using a -C flag with an argument of one of "S", "K", "I", "B", "C", "W", "M" or "T" turns off the use of the respective letter as a built-in primitive. If the letter in question appears in a context where it comprises an identifier, it gets recognized as a non-primitive "combinator", and doesn't do any reductions. You can use more than one -C flag on a given command line to turn off more than one combinator.

The -B flag allows you to choose a default bracket abstraction algorithm. Ordinarily, the plain Curry-Feys algorithm serves as the default, but using -B with "tromp", "turner", "grz" or "btmk" selects that algorithm for bracket abstractions where no algorithm appears explicitly.

Nearly all of the flags have a corresponding interpreter command that allows the users of cl to invoke that behavior from inside the interpreter.

Introduction

This document describes an interpreted computer programming language closely modeled on Combinatory Logic. It also describes the design and implementation of the interpreter.

This document ends up as a conflation of man page, README and INSTALL files, ex post facto design document, and bibliography. At least it's all in one place.

Motivation

The world needs another programming language like it needs a hole in the head. Why on earth did I choose to write this?

During late 2005 and the first half of 2006, I wrote an interpreter for a language close to untyped lambda calculus. I know many people have written lambda calculators, and made them available. Most of those implementations have crippling bugs, and I wanted the understanding of lambda calculus that I believed would come from writing an interpreter for it.

My lambda calculator ran fairly slowly. In looking around for ways to speed it up, I discovered graph reduction. A desire to understand graph reduction led me to write this interpreter.

I intend to get the lambda calculator in a form I don't feel ashamed to give away. At that point, I will make its source code publicly available. This interpreter will have to do for now.

Licensing

I chose to license cl under the GNU General Public License (GPL).

Building and Installing

If you want to use cl, you have to download the source code, compile an executable, and (possibly) copy it somewhere.

Download source code

Download the source code (v1.1 of the software).

Download the source code (v1.4 of the software). You should probably use v1.4, as it has some bug fixes and some user interface improvements.

Compile and install

cl source code will probably only compile under Solaris, Linux and BSD operating systems. You may have some luck under other operating systems. cl will probably not build or work under any "Windows" variant.

You have to compile cl from its source. I do not provide any pre-compiled packages.

To do this you will need the following command line programs installed: gunzip and tar (to unpack the archive), make, a C compiler (gcc, lcc, tcc, or Sun's cc), lex (flex), and yacc (bison or byacc).

Directions

  1. Unpack the software archive: gunzip -c cl-1.4.tar.gz | tar xf -
  2. Change directory into the newly-created source code: cd cl-1.4
  3. Issue a make command. The makefile contains targets for several compilers (gnu, cc, lcc, tcc). In general, make cc or make gnu should work on almost everything. If, under Solaris, you have Sun's C compiler installed, make cc will probably work. You will need to install lcc and/or tcc to get those two targets to work.
  4. Run the unit tests: ./runtests (Tests 45 and 46 run 30 seconds each.)
  5. Copy the executable, cl, where ever you like. Or, you can use it in-place. Your choice.

The unit test inputs, in directory tests.in, actually contain some amusing examples, including Church numerals, Scott numerals, and several examples of Y combinators.

The Language Interpreted

The interpreted language has similarities to Combinatory Logic formal systems, at least the "non-illative" part. This constitutes the part analogous to lambda calculus. It does not have quantifier operators like Ξ, or implication operators like P, so the interpreted language does not contain first order predicate calculus.

I wrote "systems" above, as not every author describes exactly the same thing. Usually, "Combinatory Logic" (or sometimes "Combinator Calculus") consists of a set of "combinators" (for example the set {S, K}) and an applicative structure where adjacent combinators get treated as either operators or operands. Individual combinators have some set number of arguments, and once that number of arguments appear, the combinator acts by deleting, duplicating or rearranging the arguments. See Michael J. O'Donnell's The SKI Combinator Calculus a universal formal system for an example of a formal definition, together with an informal definition.

Different authors choose different "bases", or sets of combinators to use. Modern authors tend to use {S, K} or {S, K, I} as a basis. In the past, {B, C, K, W}, {B, T, M, K} or even {I, J} got used. Authors also differ in allowing non-basis combinators to appear in expressions. Some disallow non-basis combinators on strictly formal grounds: Combinatory Logic does not need them. Some authors allow them even formally. Other authors disallow non-basis combinators formally, but use them informally as abbreviations or place holders.

Interactive Session

I intend people to use cl interactively. It does read from stdin and write to stdout, so you could use it as a "filter", but I didn't really design it to work that way. In the following example session, what the user types appears in a bold typeface.

The interpreter prints a CL> prompt when waiting for the user's input.

4:06PM 5 % ./cl
CL> S I I x
S I I x     Interpreter prints parsed expression
x x        Interpreter prints normal form of expression
CL> W I x
W I x
x x
CL> def myT (C I)
CL> myT a b
C I a b     Abbreviation expanded
b a
CL> Ctrl-D     End-of-file causes interpreter to exit
4:07PM 6 %

Note that each of the CL expressions the user enters gets reduced to normal form right away, and the interpreter prints that normal form. The user must assign "meaning" to an expression and its normal form. No input or output facilities exist in the language interpreted.

Syntax elements

The language interpreted consists of lines of ASCII-encoded textual input. Each line gets subdivided into tokens. Tokens consist of identifiers (including the primitive combinators), parentheses, square brackets (for bracket abstraction), keywords (def, define, reduce, and the interpreter commands), numbers, and double-quoted file names.

Input expressions can contain as many matched pairs of parentheses as the user deems necessary, with the condition that parentheses pairs have to contain at least two combinators. The parser gives syntax errors on expressions like (S).

Each input expression can contain pre-defined names. The interpreter inserts a pre-constructed parse tree for each such abbreviation. This makes an input expression into a "context", as the denotational semantics people call it.

The interpreter parses an input expression, usually a single line of text, delimited by a "return" or newline. Users can back-slash the newline and have a multi-line expression. Parsing the expression creates a (possibly enormous) binary tree internally. At the receipt of a newline, the interpreter destructively "reduces" the parse tree, using an algorithm referred to as "graph reduction".

Graph reduction changes the parse tree, eliminating, duplicating or re-associating pieces of the parse tree. When no more reductions can take place, the interpreter deems the parse tree in normal form, and prints a nominally human-readable version of it.

Some "interpreter commands" exist, like timer, timeout, etc. These commands, described in a later section, do not trigger a graph reduction, and do not require a Combinatory Logic input expression.

Identifiers

Lexer code recognizes identifiers according to this pattern: [a-zA-Z][a-zA-Z_0-9]*. Identifiers consist of a letter (upper or lower case) followed by any number of letters, digits or underscores. Lexically, identifiers look like C or C++ or Java variable names.

Semantically, the class of identifiers consists of primitive combinators (S, K, I, B, C, W, M, T) and non-primitive combinators. Application of combinators associates to the left.

Some identifiers constitute built-ins, or primitives. They have special properties.

Primitives

The interpreted language has eight primitive combinators, special terms that do things other than just sit there. It has the usual S, K and I combinators. It has B, C, W, M and T combinators as well. Since this interpreter started off as a test bed for graph reduction, it made sense to include B and C combinators, so as to allow for Turner's bracket abstraction.

I put in W to allow users to experience Curry's favorite basis, {B, C, K, W}.

I put in M and T combinators after reading To Mock a Mockingbird. I wanted to have the ability to work with the {B, T, M, K} basis.

The combinators rewrite terms according to the following rules:

  I x → x
  K x y → x
  S x y z → x z (y z)
  B x y z → x (y z)
  C x y z → x z y
  W x y → x y y
  M x → x x
  T x y → y x

Note that all the combinators require a particular number of arguments before they "activate". For example, K p q → p, but K p remains K p

Internally to the interpreter, upon finding a particular combinator, the parse tree (or "graph") gets modified. The S, W and M combinators don't copy the duplicated terms, but rather add a reference to the duplicated sub-tree and increase its reference count.

Bracket Abstraction

The language interpreted by cl can include "bracket abstractions", notations that cause the interpreter to create Combinatory Logic expressions that (given correct arguments) will eventually reduce to a specified expression. Refer to the section below for algorithm details.

Actual bracket abstractions have syntax like this:

CL> [x] (x x x)
S (S I I) I
S (S I I) I
CL>

The [x] notation means something like "calculate an expression free of variable x, using the default algorithm". cl calculates that combinatory logic expression, and uses it in the current reduction. Above, cl prints the expression free of variable x, reduces that expression, then prints its normal form.

You can nest bracket abstractions:

CL> [x] [y] x y
S (S (K S) (S (K K) I)) (K I)
S (S (K S) (S (K K) I)) (K I)
CL> 

You can store the expression resulting from a bracket abstraction:

CL> def D [x] x x
CL> D a
S I I a
a a
CL> 

In short, a bracket abstraction can appear anywhere a single combinator can appear.

You can use different abstraction algorithms. The following shows what the Grzegorczyk algorithm gives as an expression for the S combinator's action:

CL> [p]grz [q]grz [r]grz p r (q r)
B (B W) (B B C)
B (B W) (B B C)
CL> 

cl has Curry-Feys, Tromp, Turner, Grzegorczyk and BTMK bracket abstraction algorithms. By default, you get the Curry-Feys algorithm if you don't specify which algorithm an [x] abstraction operator should use. The -B <algorithm> command line flag allows a user to choose a default algorithm at interpreter start up.

Abbreviations

The cl interpreter does allow you to use abbreviations: single-identifier names for which the interpreter substitutes previously-defined expressions.

define

The keywords "define and "def" (for human convenience) allow a cl user to create symbolic names for parse trees. These names, when used at the top-level of the interpreter, cause substitution of the previously created parse tree for the name.

Typing in define omega (S I I)(S I I)
at the CL> prompt causes the interpreter to keep an unreduced parse tree. Using the identifier omega in another expression causes the interpreter to put a copy of the unreduced parse tree in the location where omega would appear in the parse tree that results from the other expression.

Substitution for a previously-defined abbreviation only happens during parsing. This means that a def can't change a previously defined abbreviation. For example:

CL> def X S m m r   user defines "X"
CL> def m (C K K)   user defines "m", which appears in previous def
CL> X
S m m r             interpreter prints what previously got defined
m r (m r)           interpreter prints normal form
CL> m x             user types new expression with "m" in it
C K K x             Interpreter prints expression, with the definition substituted for "m"
x                   Interpreter prints normal form
CL> 

reduce

The reduce keyword causes the interpreter to perform a graph reduction on the parse tree of the expression immediately following it. reduce some expression returns a parse tree, so the keyword-and-expression can appear anywhere a combinator can appear. In particular, it can appear in a def or define line. define doesn't ordinarily cause a reduction, which is usually what a user wants, but occasionally one wants an expression reduced to normal form before storing it against an identifier.

This means that reduce constitutes a part of the interpreted language. You can put the reduce keyword in front of any combinatory logic expression:

CL>reduce S (reduce I) (reduce I) x
x x           note that the parsed expression already appears in normal form
x x
CL> 

I included reduce in the interpreter to allow a user to store the normal form of an expression:

CL> def twoX (reduce S I I x)
CL>    
CL> twoX
x x           what the interpreter stored - normal form
x x           normal form identical to what got stored
CL> 

Judicious use of reduce can actually pervert the reduction process, changing it from "normal order" to "applicative order".

You can use reduce inside a bracket abstraction's specified expression. The reduced expression gets bracket-abstracted.

Top Level Loop

cl has the usual sort of functional-language-interpreter read-eval-print loop. Identifiers in the input that lexically match previously defined "abbreviations" get replaced by copies of the parse tree stored against that "abbreviation". This makes the top-level loop into a "context", as the functional programming people say. The parse tree for the abbreviation gets substituted during parsing, not during evaluation, so abbreviations really only exist at the top level.

The top level loop performs this series of steps repeatedly:

  1. Read and parse an expression, substituting for abbreviations.
  2. Print the parsed and substituted-for expression.
  3. Reduce the expression to its normal form.
  4. Print the normal form of the expression.

You can interrupt (usually control-C on the keyboard) the top level loop to exit the interpreter, or you can give it an end-of-file (usually control-D). No explicit "exit the interpreter" command exists.

Due to the location of the magic error production in the yacc grammar, a syntax error can cause what the human user perceives as a single expression to get parsed as two expressions. Two reductions get performed, which can confuse the user.

Some expressions (like W W W or M M) do not have a normal form. The graph reduction algorithm will not terminate given those expressions as input. The timeout or count interpreter commands give the graph reduction algorithm a finite amount of time to work or a fixed number of reductions to perform, interrupting it at some point. In the case of an interrupt, the interpreter will not print an expression's normal form. It will print whatever the graph reduction algorithm has created so far.

Interpreter Commands

step

The step interpreter command causes cl to "single step" through a graph reduction. Each "step" consists of a built-in primitive performing its action. cl does not consider examination of an application node of the parse tree as a "step".

trace, debug and elaborate keywords control what output occurs during single-stepping with trace producing the least-detailed output. Confusingly, cl will give no output during single-stepping unless you tell it how much output you want with one of these three keywords.

The step command causes the interpreter to print a prompt (continue?) before every reduction takes place. What the interpreter does depends on the user's input:

trace, debug, elaborate

trace has the most utility when you want to see how a particular expression gets reduced. debug has more utility when trying to figure out what went wrong with the graph reduction algorithm, and elaborate can help figure out reference counting problems. step along with trace or debug probably has the most use when trying to watch a large expression get reduced.

You can issue these three commands without an accompanying step command, and you will see the same output you would during single-stepping.

abstraction

The abstraction <algorithm> interpreter command allows a user to switch default bracket abstraction algorithms during the lifetime of an interpreter. "curry", "turner", "tromp", "grz" and "btmk" constitute all the valid values of algorithm.

timer

After invoking this command, the interpreter will print an elapsed time for all graph reductions and bracket abstraction calculations.

timeout

Set a time limit on how long the reduction of an expression can proceed. Prevents the interpreter from running forever reducing an expression with no normal form.

count

Allow the interpreter to perform only a finite number of combinator reductions before terminating graph reduction. This can keep the interpreter from running forever on expressions with no normal form.

The count command requires a number as an argument:

CL> count 100
CL> M M
M M
Reduction limit
M M
CL> count 0

Setting the "maximum reduction count" to 0 (zero) turns counted reduction limiting off.

load

The -L command line flag causes the interpreter to read in and interpret some outside file full of cl input. More than one -L flag can appear: the files get read and interpreted in their order of appearance on the command line. Syntax errors in the input do not cause the interpreter to exit, nor do they "roll back" what a file's input does: once a def or reduce gets interpreted, its effect lasts.

A user can access the same function via the load "filename" command during interaction with the interpreter. At any CL> prompt, a user can enter a command like:

CL> load "tests.in/test.048"

You must double-quote the file name. Non-absolute file names get opened relative to the interpreter's current working directory.

The Interpreter

General Design

Since actual parsing of input text makes up such a small part of the run-time of any given reduction, I used old-time compiler construction favorites lex (actually flex) and yacc (actually bison or byacc), to tokenize and parse respectively. Using flex or lex limits the interpreter to ASCII input.

The yacc productions structure the input; the code in the productions' actions essentially constructs a binary tree representing the parsed input expression.

Flex (or lex) does the hard part of reading input bytes. Both lexer-generators create code that understands the difference between "human user input from a keyboard" and "input from a file", and buffers input appropriately.

Grammar

I chose the grammar to make the language as much like textbook Combinatory Logic as possible: combinators as single letters, with whitespace separating token names. The grammar allows flexible parenthesizing. Since non-primitive combinators can appear and can have more than one character, whitespace or parentheses must separate tokens.

You can't redefine key words or use them as plain combinators. This constitutes the biggest wart in the "programming language", outside of its complete lack of I/O facilities.

Comments

Comments consist of a '#' (octothorpe) and any number of characters, to the end of the line on which the octothorpe appears. Lexer code recognizes comments, and treats them as whitespace. Nothing about comments appears in the Yacc productions below because of that.

Yacc productions

Some productions exist to handle errors. Some productions exist to make it easier to type things in.

  1. program → stmnt
  2. program → program stmnt
  3. program → error
  4. stmnt → expression end-of-line
  5. stmnt → "def"|"define" IDENTIFIER expression end-of-line
  6. stmnt → interpreter_command
  7. stmnt → end-of-line
  8. interpreter_command → "timer" end-of-line
  9. interpreter_command → "elaborate" end-of-line
  10. interpreter_command → "debug" end-of-line
  11. interpreter_command → "trace" end-of-line
  12. interpreter_command → "step" end-of-line
  13. interpreter_command → "load" "file name" end-of-line
  14. interpreter_command → "timeout" number end-of-line
  15. interpreter_command → "count" number end-of-line
  16. interpreter_command → "abstraction" algorithm name end-of-line
  17. expression → application
  18. expression → term
  19. expression → "reduce" expression
  20. expression → bracket_abstraction abstraction_algorithm expression
  21. abstraction_algorithm → algorithm name
  22. abstraction_algorithm → ε
  23. application → term term
  24. application → application term
  25. bracket_abstraction → '[' IDENTIFIER ']'
  26. term → constant
  27. term → IDENTIFIER
  28. term → '(' expression ')'
  29. constant → PRIMITIVE

Lexer code says that the usual single letters constitute combinators (S, K, I, B, C, T, M, W), and that Yacc-generated code should consider them to have a PRIMITIVE token type. See production 29. Command line flags can "turn off" the treatment of those single letters as combinators, in case you want to ensure that you only use S and K combinators, or some such asceticism.

The grammar includes explicit "end-of-line" tokens so that the lexer can pass that token when it reads a real newline character, or when it finds an octothorpe-prefixed comment. The lexer consumes backslashed-real-newlines and does not pass an end-of-line to the parser. A semicolon character could also cause an end-of-line, which would allow multiple expressions per physical line of input.

Production 1 allows recognition of the first statement a user inputs, production 2 recognizes every other statement. Production 3 allows Yacc or Bison to write out code to "unwind" to a stmnt when a syntax error occurs. Production 7 allows empty lines to occur without giving a syntax error.

Production 5 allows you to save a parse tree to the environment, an implicit dictionary of named parse trees. The code in the action for Production 27 looks up all IDENTIFIER tokens in the environment. When the code finds an IDENTIFIER in the environment, a copy of the previously-saved parse tree gets used as a term non-terminal.

The code in the actions for productions 4 and 19 passes a parse tree to the graph reduction function. This occurs when the lexer sends an end-of-line token back to the Yacc-generated code. The keyword "reduce" (production 19) exists to trigger a graph reduction before the "def"/"define" keywords save the parse tree to the environment.

Bracket abstraction (production 20) actually gives back a parse tree, so a bracket-abstraction expression can appear anywhere a regular term can.

Productions 21 and 22 exist to allow the use of other than the default bracket abstraction algorithm during run time. Production 16 allows you to change the default algorithm during run time.

Productions 17, 18, 23, 24, 26 through 29 define an "applicative structure" where juxtaposition implies the application operation, and application associates to the left.

Single Stepping

Single-stepping, time outs and reduction counting all leave the graph reduction algorithm by way of non-local jumps: one setjmp() call establishes the return point, and three siglongjmp() calls return to it. These calls share the use of a single jmp_buf structure.

Single-stepping gives the cl user the choice of bailing out of the current graph reduction at every step. If the user chooses to quit reduction, a longjmp() returns the user to the interpreter's top level loop.

Time outs on graph reduction happen via a call to the alarm() system call. When the time out expires, the operating system gives the process a SIGALRM, involving an "upcall" to the signal handler function. The signal handler calls longjmp() to return to the interpreter's top level loop.

This ties the graph reduction function to the Yacc-generated parsing code via signal handling.

Algorithms and Data Structures

Parse Tree/Graph

The yacc-generated parser ends up composing a binary tree of struct node elements.

struct node {
    int sn;
    enum nodeType typ;
    enum combinatorName cn;
    const char *name;
    struct node *left;
    struct node *right;
    struct node **updateable;
    int branch_marker;
    int examined;
    int refcnt;
};

struct node instances make up both the interior and the leaf nodes of a parse tree. When traversing the graph, the interpreter makes a distinction based on the value of the enum nodeType typ element: the value types defined for the Yacc grammar, rather than the combinator types, prevent the use of different C structs for "leaf" and "interior" nodes.

enum nodeType { UNTYPED, APPLICATION, COMBINATOR };

APPLICATION instances should appear only as interior nodes of the parse tree, while COMBINATOR nodes should appear only as leaf nodes. UNTYPED exists to allow struct node instances on a free list to have a type that should not appear in a correct parse tree. Any appearance of an UNTYPED node in a parse tree constitutes an error.

In accordance with the usual nomenclature of Combinatory Logic, all terms, whether they have names of built-in combinators (S, K, I, B, C, T, M, W) or not have type COMBINATOR. Mathematical writings about Combinatory Logic seem split over the issue of including non-built-in-combinators in the official language describing Combinatory Logic. Some authors allow non-built-in-combinators, some limit the language to S and K only, some allow I as well. Practically, every author who deals with Combinatory Logic uses non-combinators (with names like 'P', 'Q', 'R', 'x', 'y', etc) to at least denote arbitrary Combinatory Logic terms. So, the enum combinatorName cn field of the struct gets used to determine if the combinator in question does a reduction or not.
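
The contents of enum combinatorName don't appear in this document. As a sketch of how the typ and cn fields might gate reduction, with the enum member names (other than the COMB_L mentioned under "Adding a new combinator" below) assumed rather than copied from node.h:

/* Sketch only: member names other than COMB_L are assumptions. */
enum combinatorName { COMB_NONE, COMB_S, COMB_K, COMB_I, COMB_B,
                      COMB_C, COMB_W, COMB_M, COMB_T };

/* A node triggers a rewrite only if it is a COMBINATOR leaf whose cn
 * field names a built-in; non-primitives carry COMB_NONE and just sit. */
int node_reduces(struct node *n)
{
    return n->typ == COMBINATOR && n->cn != COMB_NONE;
}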

The three fields struct node **updateable, int branch_marker and int examined all get used during graph reduction. See spine stack notes.

The field int refcnt comprises a typical reference count: a nominal "freeing" of one of these structs decreases the reference count. When the reference count goes to zero, the code puts the struct on a free list, for reuse later. This constitutes extremely explicit custom memory management.

The field int sn contains a "serial number", immutable through a particular reduction. It exists solely to aid debugging. Except for the memory address of a struct, no other way exists to tell one I combinator from another, which can confuse the human doing the debugging.

Dummy Root Nodes

The way that I and K combinators rewrite expressions drives the use of a "dummy root node".

Expressions like I a b have a parse tree like this:

[Figure: parse tree of I a b, unreduced]

Arrows represent the left and right elements of struct node instances.

The reduction of I a to a just changes the left-hand pointer of the application node to the a leaf node. Reference counts change as well, ultimately looking like this:

[Figure: parse tree of I a b, reduced]

If a node's reference count falls to zero, the reference counts of any left and right elements get decremented. A zero-reference-count node gets put on the struct node free list.

The left element of the struct node that comprises application node 1 in the image above changes. Instead of containing a pointer to the I a application node, it contains a pointer to the a combinator node.

For an expression like I a, a dummy root node has to exist so that the code does not have to check every time whether it must change a pointer in a parent node, or otherwise treat the (non-dummy) root node specially.

Expressions like K a b have the same problem as I a.

As I read Philip Koopman's work, I think he left extra I nodes around, so as to not have to deal with any irregularities like dummy nodes during reduction. The expression printing routine does have to deal with the dummies. I did not find dummy root nodes especially onerous.

Strings as Atoms

This interpreter uses the "Strings as Atoms" concept from David Hanson's book C Interfaces and Implementations.

The lexer code looks up all input strings in a hash table. The hash table lookup returns a pointer to a permanent (malloc'ed) copy of the string. Should a lexically-identical string already exist in the hash table, the hash table insert returns a pointer to the string already in the table. This is analogous to Java's String.intern().

No strcmp() calls appear anywhere; the "==" arithmetic equality test works on two potentially lexically equal "Atoms". The code does not have to explicitly deallocate strings except in the hash table deallocation. The code does have to pay attention to the read-only nature of "Atoms".

The main benefit of this occurs in the bracket abstraction algorithms. Checking a given combinator for lexical equality with an abstracted variable only involves an arithmetic equality test.
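
The hash table interface from the cl source isn't reproduced here. The following sketch, with the assumed names intern() and hash_djb2() (the latter sketched in the Hash function section below), shows the intern-style lookup the paragraphs above describe:

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024                         /* illustration only; the real table grows */

struct atom { const char *string; struct atom *next; };
static struct atom *bucket[NBUCKETS];

extern unsigned long hash_djb2(const char *s);   /* see the DJB2 sketch below */

/* Return the single permanent copy of str, creating it on first sight.
 * After interning, pointer equality (==) decides string equality. */
const char *intern(const char *str)
{
    unsigned long h = hash_djb2(str) % NBUCKETS;
    struct atom *a;

    for (a = bucket[h]; a != NULL; a = a->next)
        if (strcmp(a->string, str) == 0)      /* the only place strcmp appears */
            return a->string;

    a = malloc(sizeof *a);
    a->string = strdup(str);                  /* permanent copy */
    a->next = bucket[h];
    bucket[h] = a;
    return a->string;
}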

Hash table

I used a C implementation of Per-Åke Larson's dynamically expandable hash table. You can find a description of this hash table in: "Dynamic Hash Tables" by Per-Åke Larson, Communications of the ACM, April 1988, volume 31, number 4.

A single instance of this hash table holds both strings (as Atoms) and parse trees (which have pointers to struct node as roots) named by strings. Strings which never get used in a define appear in the hash table without an associated parse tree.

This structure constitutes a single-chaining hash table with a power-of-two number of chains ("buckets"). A power-of-two count makes it really easy to determine which chain to start looking in, but it has the disadvantage of doubling the number of buckets whenever the number of items per chain gets too high.

Hash function

I used Dan Bernstein's DJB2 hashing algorithm. The hash table uses NUL-terminated C strings as keys, so DJB2 seems like a good choice.
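
For reference, here's DJB2 as it usually gets written; the function name is an assumption, not necessarily the one in the cl source:

/* Bernstein's djb2: hash = hash * 33 + c, starting from 5381. */
unsigned long hash_djb2(const char *s)
{
    unsigned long hash = 5381;
    int c;

    while ((c = (unsigned char)*s++) != 0)
        hash = ((hash << 5) + hash) + c;      /* hash * 33 + c */
    return hash;
}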

Memory Management

Memory management comprises one of those areas of programming that causes consternation. I ended up with a three-level memory management scheme. On the first level, the interpreter uses good ol' malloc and free to obtain heap memory. On the second level, it uses a stack of "arenas", (at least) page-sized blocks of memory. After every graph reduction, the interpreter "resets" the arenas. Arenas give out smaller allocations of memory, but you never bother freeing them: you just reset the arenas. On the third level, the most-frequently-allocated C-language data structure, struct node has reference counts, and its own free-list based reuse. struct spine_stack, possibly expensive-to-allocate structures, have their own free-list based reuse.

Arenas

struct memory_arena {
    char *first_allocation;
    char *next_allocation;
    int sz;                     /* max free size */
    int remaining;              /* how many bytes remain in allocation */
    struct memory_arena *next;
    int usage_info;
    int sn;                     /* serial number: malloc order */
};

"Heap" memory gets allocated on demand into (multiples of) one-page sized "arenas" as per the David Hanson paper Fast Allocation and Deallocation of Memory Based on Object Lifetimes, rather than the implementation given in his later book C Interfaces and Implementations, chapter 6. I had read both, and didn't have either on hand when I wrote the arena code, so I ended up with a paper-like, rather than book-like implementation.

The list of arenas gets reset at the conclusion of every read-eval-print loop. Note the use of "reset" and not "deallocated". Once a cl interpreter allocates a page-sized block of heap memory, it retains a reference to it until the interpreter exits. The cl interpreter keeps a list of arenas. When the interpreter needs a new chunk of memory, it walks the list of arenas in order of their allocation, searching for an unused piece of arena. Should the search fail, the interpreter allocates a new arena using malloc().

Each arena keeps a pointer to the unused remainder of its allocation (the next_allocation field), and two numbers: the size of the allocation (the sz field) and the number of bytes still unused (the remaining field). A "search" involves comparing the requested amount of memory (in bytes) with the free space the arena has left. This implementation makes no attempt at "packing" chunks of memory based on requested size. It gives back requested memory first-come, first-served.
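
A minimal sketch of that first-come, first-served allocation, using the struct memory_arena fields shown above. The function names, the page-size constant, and the omission of the usage_info and sn bookkeeping are all simplifications, not the cl source:

#include <stdlib.h>

#define ARENA_SIZE 4096                        /* "one page", for illustration */

/* Hand out size bytes from the first arena with room; malloc() a new
 * arena only when every existing arena is too full. */
void *arena_allocate(struct memory_arena **arena_list, int size)
{
    struct memory_arena *a;
    void *r;

    for (a = *arena_list; a != NULL; a = a->next) {
        if (a->remaining >= size) {
            r = a->next_allocation;
            a->next_allocation += size;
            a->remaining -= size;
            return r;
        }
    }

    a = malloc(sizeof *a);
    a->sz = size > ARENA_SIZE ? size : ARENA_SIZE;
    a->first_allocation = malloc(a->sz);
    a->next_allocation = a->first_allocation + size;
    a->remaining = a->sz - size;
    a->next = *arena_list;                     /* simplified: the real code keeps malloc order */
    *arena_list = a;
    return a->first_allocation;
}

/* "Reset", not "free": after a reduction every arena becomes reusable. */
void arena_reset(struct memory_arena *arena_list)
{
    for (; arena_list != NULL; arena_list = arena_list->next) {
        arena_list->next_allocation = arena_list->first_allocation;
        arena_list->remaining = arena_list->sz;
    }
}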

Free Lists

Instances of struct node and struct spine_stack get allocated and discarded almost constantly during graph reduction. The pairs of functions that allocate and deallocate those two structs cooperate by using a free list.

During input parsing and graph reduction, the functions that give back "new" instances of struct node and struct spine_stack actually try to use a previously-allocated instance from a free list. Should that struct's free list not have any entries, the functions get an allocation from an arena (for struct node) or malloc() (for struct spine_stack). Should the search through all available arenas not turn up a suitable amount of previously unused memory, the interpreter allocates a new arena using malloc.

Deallocation functions for these data structures just push a "free" structure onto a simple, linked-list based, LIFO stack. The function free_node(struct node *) decrements reference counts, pushing the "free" structure onto the LIFO stack only when its reference count reaches zero. If the struct node instance in question has type APPLICATION, its left and right child nodes get "freed".
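
A sketch of the free-list discipline for struct node follows. The text above names free_node(); new_node(), the free-list variable, and the reuse of the left pointer as the free-list link are assumptions made for illustration, and reference-count details beyond the described behavior are simplified:

static struct node *node_free_list = NULL;            /* name assumed */
extern struct memory_arena *node_arena;               /* assumed: arena used for nodes */
extern void *arena_allocate(struct memory_arena **, int);   /* from the arena sketch above */

/* Give out a node, preferring the free list to a fresh arena allocation. */
struct node *new_node(void)
{
    struct node *n;

    if (node_free_list != NULL) {
        n = node_free_list;
        node_free_list = n->left;                     /* left doubles as free-list link (assumption) */
    } else {
        n = arena_allocate(&node_arena, sizeof *n);
    }
    n->typ = UNTYPED;                                 /* caller sets the real type */
    n->left = n->right = NULL;
    n->refcnt = 1;
    return n;
}

/* A nominal "free" just drops a reference; only a zero count pushes the
 * node onto the LIFO free list, releasing any children first. */
void free_node(struct node *n)
{
    if (n == NULL || --n->refcnt > 0)
        return;
    if (n->typ == APPLICATION) {
        free_node(n->left);
        free_node(n->right);
    }
    n->typ = UNTYPED;
    n->left = node_free_list;
    node_free_list = n;
}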

Graph Reduction

Parsing input produces a binary tree, with all interior nodes representing function application, and all leaf nodes representing combinators, either built-in primitives or arbitrary non-primitives. To calculate the normal form of an expression, the cl interpreter destructively re-writes the binary tree, a process known as "graph reduction".

I implemented graph reduction in a single non-recursive function, void reduce_graph(struct node *graph_root);. The keys to this function lie in the following data structure:

Spine Stack
struct spine_stack {
    struct node **stack;
    int top;
    int maxdepth;
    int size;
    int sn;
    struct spine_stack *prev;
};

The struct node **stack, top and size elements comprise a last-in, first-out "spine stack". Graphically and informally, the left-hand side of a parsed Combinatory Logic expression makes up the "spine" of the expression. The cl interpreter iteratively pushes application-type struct node instances onto the "spine stack", following the left pointer of each application-type node. The interpreter changes the graph when it has pushed all the application-type nodes on the stack and has reached a combinator-type node.

After performing whatever reduction the combinator-type node specifies, the graph reduction "pops" one or more nodes off the spine stack. An insufficient spine stack depth may disallow a given reduction, and the depth check also prevents "popping" more nodes than the stack holds.

The top and size elements of struct spine_stack allow resizing of the memory pointed to by the stack element. This gets done in the pushnode() function, when adding a new element to the spine stack would cause an overflow. Function pushnode() allocates heap memory for stack using malloc() and realloc(), standard C library functions. pushnode() uses these library functions to guarantee contiguous blocks - stacks can get very deep.
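
A sketch of a resizing pushnode() along those lines; the doubling growth factor and the lack of error handling are simplifications, not the cl source:

#include <stdlib.h>

/* Push a node onto the spine stack, doubling the contiguous array when full. */
void pushnode(struct spine_stack *ss, struct node *n)
{
    if (ss->top >= ss->size) {
        ss->size *= 2;                                         /* growth factor assumed */
        ss->stack = realloc(ss->stack, ss->size * sizeof *ss->stack);
    }
    ss->stack[ss->top++] = n;
    if (ss->top > ss->maxdepth)
        ss->maxdepth = ss->top;                                /* high-water mark */
}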

The prev element performs two functions:

  1. A pointer to the previous stack in a stack-of-stacks, during graph reduction
  2. Free list. Since struct spine_stack instances get allocated using malloc(), the free list keeps them around for more than the reduction of a single expression.

The free list for struct spine_stack instances lasts the entire life of a cl process. This differs from the free list for struct node instances, which lasts for the reduction of a single expression. I suspect that a malloc() based allocation scheme has a greater run-time cost than the arena-based allocation scheme used for nodes in the parse tree. Keeping the spine stack free list around for an entire interpreter run amortizes this run-time cost over many graph reductions, just as the arenas' allocation time gets amortized.

examined, branch_marker and updateable fields

The examined, branch_marker and updateable fields of struct node only make sense in the context of stack-of-spine-stacks. Consider this expression: S (I a). An instance of struct node with typ == APPLICATION has the address of the application node of the (I a) subexpression as the value of its right element.

Since the S combinator has only one argument (the (I a) subexpression), cl cannot perform graph reduction using S. It can reduce (I a). Pushing right on the spine stack won't work: the I reduction would end up updating the left element of the application node to point to a.

Enter the use of examined, branch_marker and updateable fields of struct node.

  1. Create a new spine stack
  2. Mark the left side of the application node as already examined, by setting a bit in examined.
  3. Set node->updateable = &node->right
  4. Push the new spine stack on the stack-of-spine-stacks
  5. Proceed with graph reduction as usual.

Any reduction will then install the new sub-graph through stack_top->updateable.

Setting node->updateable = &node->left happens in a normal, left-node push onto the spine stack. Setting node->updateable = &node->right happens when the graph reduction creates a new spine stack to traverse the tree pointed to by a right element of an application-type node.
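
As a minimal sketch (not the cl source) of the write-through mechanism those two assignments set up:

/* Sketch only: the push code records which child pointer to overwrite,
 *     node->updateable = &node->left;     ordinary spine descent
 *     node->updateable = &node->right;    new spine stack for a right subtree
 * and a later reduction installs its result through that pointer. */
void install_result(struct node *parent, struct node *new_subgraph)
{
    *parent->updateable = new_subgraph;   /* reference-count bookkeeping omitted */
}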

Reduction of shared subexpressions

Because the S, M and W combinators create shared subexpressions rather than copies, cl can perform a single reduction that shows up in multiple subexpressions.

Tracing the reduction of S I I (M I I) shows that the M reduces once while appearing in two subexpressions, and the (I I I) likewise reduces only once while appearing twice.

CL>  S I I (M I I)
S I I (M I I)
I (M I I) (I (M I I))     (M I I) subexpression shared, not duplicated by S-reduction
M I I (I (M I I)) 
I I I (I (I I I))         M-reduction shows up in both apparent subexpressions
I I (I (I I))             (I I I) reduces to (I I) in two places as well
I (I (I I)) 
I (I I) 
I I 
I 
I
CL> 

Bracket Abstraction Algorithms

Curry-Feys

The standard, obvious algorithm.

  1. [x] x → I
  2. [x] N → K N    x does not appear in N
  3. [x] M N → S ([x] M) ([x] N)
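
A sketch (not the cl source) of how these three rules might look as a recursion over struct node parse trees; the helper names new_application(), new_term() and contains_variable() are assumptions for illustration, not necessarily the names used in bracket_abstraction.c. Reference counting and arena bookkeeping are omitted.

/* Hypothetical helpers, assumed for this sketch. */
extern struct node *new_application(struct node *lhs, struct node *rhs);
extern struct node *new_term(const char *name);             /* S, K, I or a variable */
extern int contains_variable(struct node *tree, const char *var);

/* [var] tree, by the three Curry-Feys rules above. */
struct node *curry_abstract(struct node *tree, const char *var)
{
    if (tree->typ == COMBINATOR) {
        if (tree->name == var)                               /* rule 1: atoms compare with == */
            return new_term("I");
        return new_application(new_term("K"), tree);         /* rule 2, single term */
    }
    if (!contains_variable(tree, var))                       /* rule 2, whole subtree free of var */
        return new_application(new_term("K"), tree);
    return new_application(                                   /* rule 3: S ([x]M) ([x]N) */
        new_application(new_term("S"), curry_abstract(tree->left, var)),
        curry_abstract(tree->right, var));
}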

Turner

I believe that I have only implemented a subset of the real Turner algorithm. I found hints in the literature about a more elaborate Turner algorithm, but I have no access to back issues of the (expensive) journal Software: Practice & Experience to verify this.

  1. [x] x → I
  2. [x] N x → N              x not appearing in N
  3. [x] N → K N              x not appearing in N
  4. [x] M N → C ([x]M) N       x appears only in M, not in N
  5. [x] M N → B M ([x] N)       x appears only in N, not in M
  6. [x] M N → S ([x]M) ([x]N)   x appears in both M and N

The existence of B and C combinators in this algorithm prompted me to include them as distinct primitives in cl.

Tromp

John Tromp created a 9 rule algorithm which optimizes for the minimum number of S and K combinators in the abstracted expression. Tromp wants to get a small self-interpreter, and he confines himself to just S and K combinators. For example, the "S K M → S K" rule could change to "S K M → I", if you have an I primitive. Also, the "(M L)(N L) → S M N L" rule undoes a reduction that will get re-done at reduction-time anyway. This seems like a classic time versus space tradeoff in the small, to me.

  1. [x](S K M) → S K (For all M)
  2. [x] M → K M (x not appearing in M)
  3. [x] x → I
  4. [x] M x → M (x not appearing in M)
  5. [x] x M x → [x] (S S K x M)
  6. [x] (M (N L)) → [x] (S ([x] M) N L) (M, N combinators)
  7. [x] ((M N) L) → [x] (S M ([x] L) N) (M, L combinators)
  8. [x] ((M L) (N L)) → [x](S M N L) (M, N combinators)
  9. [x] M N → S ([x] M) ([x] N)

I don't entirely understand what phrases like "M, N combinators" mean in rules 6, 7 and 8. I chose to interpret those phrases as "M and N represent single, built-in primitives". I also implemented rule 8 by doing a depth-first traversal of parts of parse trees to determine if the potential "L" portions of the rule really are equal. A faster method of doing that may exist.

Grzegorcyzk "g" algorithm

cl includes an implementation of M. Price and H. Simmons's reconstruction of Grzegorczyk's "g" algorithm, using only B, C, K, W, I combinators.

  1. [x] x → I
  2. [x] Z → K Z          x not appearing in Z
  3. [x] Q x → Q           x not appearing in Q
  4. [x] Q P → B Q ([x] P)     x appears only in P, not in Q
  5. [x] Q P → C ([x]Q) P     x appears only in Q, not in P
  6. [x] Q P → W((B(C([x]Q)))([x]P))     x appears in both P and Q

If you consider W K as a replacement for I, you can get away with only using B, C, K and W.

BTMK algorithm

I put in an algorithm that only uses B, T, M and K combinators. I devised it, but presumably someone else has come up with this same algorithm in the past. The {B, T, M, K} basis is not unknown.

  1. [x] x → B (T M) K
  2. [x] Z → K Z          x not appearing in Z
  3. [x] Q x → Q           x not appearing in Q
  4. [x] Q P → B Q ([x] P)     x appears only in P, not in Q
  5. [x] Q P → B (T P) ([x] Q)     x appears only in Q, not in P
  6. [x] Q P → B (T (B (T [x]P) (B B [x]Q))) (B M (B B T))      x appears in both Q and P

Modifying the Interpreter

Adding a new combinator

Perhaps after reading To Mock a Mockingbird, you want to add the L combinator. Steps to follow:

  1. Add an element COMB_L to enum combinatorName, which appears in node.h
  2. Add a case COMB_L to new_combinator(), which appears in node.c
  3. Add code to switch in reduce_graph(), appearing in file graph.c
  4. Add code to lexer (file lex.l), so that an input string "L" gets recognized as a built-in. Don't forget to add a reference to an extern variable L_as_combinator.
  5. Add variable L_as_combinator to file grammar.y. Fix up command line parsing and usage function, too.
  6. Recompile.
  7. Test.

Of these steps, (2), adding code to the switch, will cause the most work. The L combinator reduces like this:

L x y → x (y y)

Just like W, L can only cause a reduction if the spine stack has a depth greater than 3. In fact, the code for L should look similar to the code that exists for W. The code for the new COMB_L block will have to allocate application nodes, and de-allocate the L x y parse tree. The y node will end up with an incremented reference count, as the action of L duplicates it.
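
For illustration only, here's roughly the shape of the work the new case must do. The function wrapper, the redex_slot parameter, and the new_application() helper are placeholder assumptions, not the cl source; the real switch case in reduce_graph() must also pop the spine stack and free the old L x y subtree.

extern struct node *new_application(struct node *lhs, struct node *rhs);   /* placeholder */

/* Rewrite L x y into x (y y), installing the result through the
 * updateable pointer recorded on the spine stack (redex_slot). */
static void reduce_L(struct node **redex_slot, struct node *x, struct node *y)
{
    y->refcnt++;                                              /* y appears twice in x (y y) */
    *redex_slot = new_application(x, new_application(y, y));
}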

Adding a bracket abstraction algorithm

Steps to follow:

  1. Add a function or functions to file bracket_abstraction.c that implements the algorithm.
  2. Add an element to struct afmp[]: a string naming the algorithm, for use in the interpreted language, and a pointer to the function that serves as the algorithm's entry point.
  3. Fix up the file lex.l to recognize the string naming the algorithm, and return TK_ALGORITHM_NAME when it recognizes the string.

Every new "algorithm name" becomes a keyword in the interpreted language, so have a care in selecting names.

How to derive a bracket abstraction algorithm

If you just find an expression for the action of S with your chosen basis and use it in the Curry-Fey algorithm, you end up with grossly inefficient bracket abstraction: the resulting expression will contain vastly more primitives than it needs to. You want to take advantage of the primitives in your chosen basis to do special things, like B and C in the Turner bracket abstraction algorithm.

Note that this section merely explains how to derive one; it does not describe an algorithm. The "working backwards" part is not algorithmic.

  1. Accept that you will write a recursive algorithm.
  2. Decide what to do when the algorithm has to abstract x from x. This usually ends up as [x] x → I.
  3. Decide what to do when the algorithm has to abstract x from N, an expression that does not have x in it. This usually ends up as [x] N → K N.
  4. Decide what to do when the algorithm has to abstract x from (P Q), where x doesn't appear in P. Your chosen basis may have an efficiency advantage here.
  5. Decide what to do when the algorithm has to abstract x from (P Q), where x doesn't appear in Q. More efficiency may appear here.
  6. Decide what to do when the algorithm has to abstract x from (P Q), where P and Q might comprise applications themselves. This case constitutes the heart of the matter, and you may end up with cases where P contains x, but Q does not, and vice versa. Recursion usually shows up in this case, as [x]P and [x]Q show up in the right-hand-side of the transformation rule(s).

    You want to figure out some expression R so that R ([x]P) ([x]Q) x ends up reducing to ([x]P) x (([x]Q) x). Note that the S combinator does this exact reduction: S P Q x → P x (Q x).

    The best way I've discovered to do this: start with the expression P x (Q x) and "work backwards" to some expression that contains P and Q, and, when applied to x reduces to P x (Q x).

Turner's algorithm and Tromp's algorithm show that handling a few special cases can make a big difference. In Turner's case, he used the B and C combinators to make the different sub-cases of term application more efficient.

Chapter 11, "Birds Galore" of To Mock a Mockingbird and section 2.2 of Combinatory Logic in Programming give an explanation of "working backwards". The mental sensation of working backwards bears kinship to solving ordinary differential equations, or deciding what form of integration to use.

As an example of "working backwards", here's my example of getting to Grzegorcyzk's bracket abstraction for applications.

  1. P x (Q x)         initial expression
  2. C P (Q x) x       prefix with C and reverse its action
  3. (C P) (Q x) x     make the parenthesizing explicit
  4. B (C P) Q x x     prefix with B and reverse its action
  5. (B (C P) Q) x x   add explicit parentheses
  6. W (B (C P) Q) x   prefix with W and reverse its action

When you get to the expression W (B (C P) Q) x, you can get to the bracket abstraction rule (#6, above) by using [x]P for plain P, and [x]Q for plain Q. Note that merely putting in S expressed in terms of B, W, C, K would give you a substantially less efficient bracket abstraction, at least for applications where x appears in both terms.
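
Running the result forward confirms the derivation; applying W (B (C P) Q) to x takes three reductions to get back to the starting expression:

W (B (C P) Q) x
B (C P) Q x x       W-reduction
C P (Q x) x         B-reduction
P x (Q x)           C-reduction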

You only add combinators as prefixes. You must explore parenthesizing alternatives in light of the goal. In the example above, I have the goal of moving both appearances of x in the initial expression to the end of the expression. Once that happens, figure out how to consolidate the occurrences into a single x.

Given the limited number of options available at any one juncture, this might be a good place to try some genetic programming type of thing, or a tree-search.

Performance

I performed all of the following testing on a 1350 MHz AMD Athlon based motherboard, with 512 MB of PC133 non-ECC memory and 2931852 KB of swap space, running Linux kernel 2.6.22.1. Compiler versions appear in the Compiler Comparison section below.

Algorithm improvements

I used gcc -I. -g -O -Wall to compile six (CVS-tagged) versions:

  1. Sep182006a - last version before beginning to strive for performance, node allocation from arena.
  2. IterativeReduce3 - working iterative graph reduction.
  3. NoncopyingIandK1 - dummy root node.
  4. ReferenceCounting1 - free list of struct node.
  5. ResizeableSpineStack1 - pushnode() as function, not macro.
  6. Version1_0 - first release.

I did tests based on adding Church Numerals, represented in Combinatory Logic. I did not use the usual "add" function, which cleverly does not require recursion. Rather, I wrote an addition function that recurses using a fixed-point combinator.

The calculations performed for the tests:

  A   Add Church numeral for 63 to Church numeral for 0
  B   Add Church numeral for 0 to Church numeral for 63
  C   Add Church numeral for 63 to Church numeral for 63
  D   Add Church numeral for 126 to Church numeral for 0
  E   Add Church numeral for 126 to Church numeral for 126



cl version timings

Version  A      B      C      D       E
1        0.070  0.000  2.503  11.213  > 1200
2        0.157  0.002  4.556  22.167  > 130
3        0.102  0.001  3.335  19.630  > 150
4        0.055  0.000  0.057  0.392   0.383
5        0.060  0.000  0.061  0.429   0.425
6        0.065  0.001  0.065  0.455   0.447

The elapsed times don't constitute reliable statistical indicators: they represent only a single run. Nevertheless, the consistency exhibited does give me some confidence.

All elapsed times in seconds. I lost patience on test E with version 1, 2 and 3, and interrupted them from the keyboard.

Three things stand out:

  1. 1D vs 2D - a performance decrease occurs. This doesn't match my recollection at the time.
  2. 3E vs 4E - the free lists make a huge difference.
  3. 3E vs 6E - performance decrease must come from increased code size (added M, T combinators) in reduce_graph() maybe causing cache misses.

I'd have to say that the iterative graph reduction probably has a bigger constant than the recursive graph reduction, so any performance benefits probably don't show up until it does a huge graph reduction like test E. I found iterative graph reduction far easier to get correct than recursive graph reduction, so I'd take it even with some performance loss.

The free-lists seem like the biggest performance increase. Arena allocation actually has benefits in preventing memory leaks, even though it doesn't seem like a big performance booster. Since spine stacks and parse-tree nodes have different free-lists, and spine stacks get allocated with malloc(), I could do some more work here to decide which caused the biggest performance gain.

It appears that iterative graph reduction caused a performance decrease. I found it far easier to get the iterative graph reduction correct than to get the recursive graph reduction correct, so despite the performance hit, I'd take iterative graph reduction.

Compiler Comparison

I compiled the version tagged Version1_0 with three different compilers, using three sets of options for gcc:

  1. tcc version 0.9.23
  2. lcc $Id: lcc.c,v 4.33 2001/06/28 22:19:58 drh Exp $
  3. gcc (GCC) 4.1.2, -g, no optimization
  4. gcc (GCC) 4.1.2, -g -O
  5. gcc (GCC) 4.1.2, -g -O2

The row for "gcc -O" should compare directly with row 6 in the "cl version timings" table above.

           A      B      C      D      E
tcc        0.145  0.002  0.144  1.028  1.025
lcc        0.118  0.002  0.116  0.886  0.825
gcc        0.156  0.002  0.158  1.105  1.123
gcc -O     0.072  0.001  0.073  0.504  0.543
gcc -O2    0.063  0.001  0.063  0.436  0.431

Compilers do seem to produce different quality code, with respect to execution speed on this set of tests.

Pure Reduction Speed

The algorithm improvements section shows how cl got better/worse with changes in the underlying algorithm. Showing pure reduction speed has some merit when comparing the executables produced by various compilers. Since cl has a -N <number> option, I can time a certain number of reductions of an expression. The easiest way to time a large number of reductions uses an expression with no normal form, but also one that cycles through a number of non-normal forms.

Naively, expressions like S I I (S I I) might seem to work. They don't, as the second I combinator doesn't get reduced in pure normal-order evaluation:

S I I (S I I)
I (S I I) (I (S I I)) 
S I I (I (S I I)) 
I (I (S I I)) (I (I (S I I))) 
I (S I I) (I (I (S I I))) 
S I I (I (I (S I I))) 
I (I (I (S I I))) (I (I (I (S I I)))) 
I (I (S I I)) (I (I (I (S I I)))) 
I (S I I) (I (I (I (S I I)))) 
S I I (I (I (I (S I I)))) 
...

Not only does it not have a normal form, it increases in size during "reduction". Increasing in size means that cl spends time allocating new arenas for the increasing number of nodes in the expression.

I want a combinatory logic expression that uses all the primitives possible, and cycles through one or more intermediate expressions. The following fit that description to a lesser or greater degree:

Cycling expressions, with the period of each cycle and a shorthand pattern name:

  1. M M   (period 1, pattern M)
  2. W W W   (period 1, pattern W)
  3. W I (W I)   (period 2, pattern WI)
  4. W T (W T)   (period 2, pattern WT)
  5. B I M (B I M)   (period 3, pattern BIM)
  6. W (W K) (W (W K))   (period 3, pattern WWK)
  7. W (C K K) (W (C K K))   (period 3, pattern WCK)
  8. S T I (S T I)   (period 4, pattern STI)
  9. S T (I I) (S T (I I))   (period 4, pattern STII)
  10. W (B (T M) K) (W (B (T M) K))   (period 4, pattern WBTK)
  11. B (T M) K (M (B (B (T M) K) M))   (period 5, pattern BBTKM)
  12. B (K (S K K) y) (K M z) (B (K (S K K) y) (K M z))   (period 6, pattern BKSKKM)
  13. W (B ((C (W K)) M) K) (W (B ((C (W K)) M) K))   (period 6, pattern WBCWKK)
  14. C (S (C C) (C C)) (C (S (C C) (C C))) (C (S (C C) (C C)))   (period 6, pattern CSCCCC)
  15. B (K (S K K) y) (K (I M) z) (B (K (S K K) y) (K (I M) z))   (period 7, pattern BKSKKIM)
  16. C C (S (C C) (C C)) (C C) (C C (S (C C) (C C)) (C C)) (C C (S (C C) (C C)) (C C))   (period 9, pattern CCCCSCCCC)
  17. B W (W (B (B (C (W K))))) (B W (W (B (B (C (W K)))))) (C K K)   (period 10, pattern BWWBBCWKCK)
  18. B B C (C C) (C C) (C C (B (B W) (B B C) (C C) (C C)) (C C)) (C C (B (B W) (B B C) (C C) (C C)) (C C)) (C C (B (B W) (B B C) (C C) (C C)) (C C))   (period 14, pattern BBCCCCCCCCCBBW)
  19. W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C)))) (C (C (C C) (C (C C) (C C))) (W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C)))))) (C (C (C C) (C (C C) (C C))) (W (B (C (C C) (C (C C) (C (C C) (C C)))) (C (C C) (C (C C) (C C))))))   (period 30, pattern WBCCCCCCCCCCCCCCCCCCCCCCCCCCCC)

Time in seconds to perform 500,000 reductions (expressions 1-9)

Expression  1      2      3      4      5      6      7      8      9
Period      1      1      2      2      3      3      3      3      4
tcc         0.275  0.271  0.242  0.304  0.285  0.279  0.306  0.323  0.323
lcc         0.218  0.211  0.194  0.251  0.231  0.232  0.261  0.267  0.243
gcc         0.289  0.286  0.260  0.328  0.320  0.296  0.330  0.341  0.311
gcc -O      0.136  0.120  0.118  0.152  0.138  0.131  0.146  0.154  0.141
gcc -O2     0.106  0.108  0.098  0.125  0.119  0.111  0.124  0.131  0.117


Time in seconds to perform 500,000 reductions (expressions 10-19)

Expression  10     11     12     13     14     15     16     17     18     19     Mean
Period      4      5      6      6      6      7      9      10     14     30
tcc         0.317  0.329  0.329  0.310  0.390  0.295  0.385  0.329  0.366  0.376  0.318
lcc         0.258  0.268  0.255  0.248  0.307  0.246  0.310  0.263  0.294  0.300  0.256
gcc         0.345  0.360  0.344  0.330  0.396  0.343  0.389  0.346  0.383  0.381  0.336
gcc -O      0.153  0.163  0.147  0.149  0.181  0.146  0.189  0.164  0.175  0.175  0.151
gcc -O2     0.132  0.134  0.129  0.124  0.156  0.120  0.153  0.137  0.164  0.147  0.128

It appears that for cl, unoptimized gcc produces worse code than lcc or tcc do. Both lcc and tcc compile cl far faster than gcc with any set of options I found, so I tended to use them during development to shorten the edit-compile-test-think cycle. The gdb debugger does not work on executables produced by tcc or lcc, so I did have to revert to compiling with gcc on occasion.

A general trend of increasing time with period of the cycling expression exists, probably due to the increased proportion of pointer chases with the larger expressions that have longer periods.

The I combinator appears cheaper to reduce than T: cycling expressions 3 and 4 differ only in whether they execute I or T. Reducing a T requires creating a new application node in the parse tree, while I only involves swinging a pointer. Comparing expressions 8 and 9 reinforces this: 9 reduces a higher proportion of I combinators than 8, and cl does 500,000 reductions of expression 9 faster than of expression 8.

The difference between compilers seems more dramatic than I would have thought, the slowest cl doing about 1.5 million reductions per second, and the fastest about 3.9 million.

Software Engineering Aspects

Choice of Programming Language

I wrote the interpreter in ANSI C. Why use ANSI C in 2007, the post-object-oriented-paradigm era? First, C implementations of anything typically have at least decent, and usually good, performance. Second, many quality implementations of C exist. Third, pointer arithmetic helps you keep your chops up. Fourth, I always have fun programming in C.

Finally, this interpreter does not consist of all that many files or lines of code. It does not fit the "programming in the large" paradigm. I don't want to deal with all the "object-oriented sit-ups" that using Java or even C++ would mandate. Essentially, I've traded off whatever benefits data hiding, modularization, inheritance and exception handling might confer for the absence of abstraction-induced complexity.

Code reuse

Despite my assertion above that this project does not constitute programming in the large, I did find several opportunities for code reuse. The hash table implementation, code for abbreviations, and the strings-as-atoms code all came virtually unchanged from an earlier interpreter I wrote. I find this particularly gratifying.

Autotools

I chose not to use GNU autotools. Even at the beginning of this project, I knew I wanted to use non-GNU tools, like yacc and byacc, and compilers like lcc and tcc. Once you put software into the "autotools" paradigm, you make the whole thing more portable, but only to systems with GNU tools and compilers. The code for cl does not do anything fancy, so using "autotools" would constitute overkill.

What CVS tells us

I gave statCVS a try, to see what information hides in version numbering and check-ins.

statCVS output - not terrifically informative about bugs fixed, because I don't track that during development; I don't have a specification to count a bug against, after all. It remains interesting nonetheless. Note the "step function" appearance of the lines-of-code graph: I only checked in sets of changes that passed all my unit tests. Also, I rarely get to work more than 15 minutes at a time on my personal projects.

SLOCCount

The following data was generated using David A. Wheeler's 'SLOCCount':

ANSI C     1770  (65.05%)
yacc        447  (16.43%)
lex         283  (10.40%)
perl        140   (5.15%)
makefile     57   (2.09%)
sh           24   (0.88%)
Total Physical Source Lines of Code (SLOC)                = 2,721
Development Effort Estimate, Person-Years (Person-Months) = 0.57 (6.87)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.43 (5.20)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 1.32
Total Estimated Cost to Develop                           = $ 77,287
 (average salary = $56,286/year, overhead = 2.40).

SLOCCount says that development took 6.87 mythical man-months over 5.2 calendar months. statCVS says that it took me 17 calendar months.

Branch Coverage

I used lcov to generate coverage output. The script-driven testing executes 69% of the code.

After a few easy, interactively-driven tests, lcov shows 80% coverage. Only the built-in testing code, which would only trigger on an as-yet-undiscovered bug, and the malloc-failure handling code remain uncovered.

Careful examination of the lcov output during development revealed a few blocks of dead code. It also revealed the need for unit tests to trigger previously-untested blocks, and an actual bug introduced by changing Yacc productions.

lcov, SLOCCount and statCVS differ on the size of the code:

        lcov   SLOCCount   statCVS
Lines   1452   2721        5125

This implies a comment ratio of (5125 - 2721)/5125 = 0.47, which isn't too shabby, except that it includes the wordy GPL copyright notice in every file, plus test inputs and validated outputs.

Methodology

I developed this using no particular methodology or process. I didn't have a "design spec" or a schedule to meet, and only one person did the coding.

I don't particularly believe in the efficacy of process, but I do believe in empirically improving the way I program. The following sections describe the things I actually did.

Tools

Just for the sake of explicitly enumerating them, I present here the software tools I used. I also feel that "methodology" gets too much emphasis, while tools get too little.

For the most part, I developed cl under Slackware Linux, on an x86 CPU. I did test early versions on Solaris 9/SPARC, a big-endian system. I compiled and ran cl as both 32- and 64-bit executables, with identical results. I tested nearly-final versions under Red Hat Enterprise Linux 3.0 running on a little-endian 64-bit AMD Opteron platform. I seriously doubt that any endianness, alignment or pointer-size assumptions exist in the code.

Parsing

I used flex/lex to recognize tokens, and yacc/byacc/bison to parse those tokens into grammatical forms. I suspect that one could write the textbook Combinatory Logic grammar as some kind of recursive descent parser, rather than using yacc/byacc/bison. This would nominally increase the speed of the interpreter, but it would probably qualify as misplaced effort.
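For the curious, a recursive-descent recognizer for the textbook grammar (expression := term {term}; term := combinator | '(' expression ')') really does fit in a few dozen lines of C. The sketch below is hypothetical, not cl's parser: it only recognizes input, building no parse tree and handling no interpreter commands:

#include <ctype.h>
#include <stdio.h>

static const char *input;          /* cursor into the text being parsed */

static int parse_expression(void);

static void skip_spaces(void)
{
    while (*input == ' ')
        ++input;
}

/* term := combinator-or-identifier | '(' expression ')' */
static int parse_term(void)
{
    skip_spaces();
    if (*input == '(') {
        ++input;
        if (!parse_expression())
            return 0;
        skip_spaces();
        if (*input != ')')
            return 0;              /* unbalanced parenthesis */
        ++input;
        return 1;
    }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input))
            ++input;               /* a combinator or non-primitive identifier */
        return 1;
    }
    return 0;
}

/* expression := term { term }, application associating to the left */
static int parse_expression(void)
{
    if (!parse_term())
        return 0;
    for (;;) {
        skip_spaces();
        if (*input == ')' || *input == '\0')
            return 1;
        if (!parse_term())
            return 0;
    }
}

int main(void)
{
    input = "S (K I) (B W B)";
    printf("%s\n", parse_expression() ? "parsed" : "syntax error");
    return 0;
}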

Lexing and parsing take little CPU time and run quickly compared to graph reduction. Using yacc/byacc/bison allowed easy addition of "interpreter commands" (like timeout, load, etc.) as they occurred to me. Adding these to a hand-written, recursive-descent parser would have cost extra work, whether done ad hoc or not.

The tradeoff involves abstraction-induced complexity, again. Using yacc/bison gives you the ability to change the grammar easily. It takes away the ability to easily emit error messages that refer to locations in the input, and it makes it harder to do anything beyond lexing and parsing a single file or standard input. The tradeoff worked out favorably for me this time around.

Interpreter's memory management

I arrived at the interpreter's memory management design after discovering that simple malloc()/free() memory management cost a huge amount of run time. This discovery led me to try an arena (slab-style) allocation scheme. Using arena allocation, I found I could easily implement explicit reference-counted management of struct node instances, which further increased speed. Performance concerns pushed me toward ever more elaborate memory management.

As a side effect, situations which would cause malloc()/free()-style memory management to leak (timeouts, keyboard interrupts, etc.) only cause problems for the built-in checking of free list consistency.

In essence, I created a "CustoMalloc"-type allocator by gradual evolution. The struct node and struct spine_stack free lists bear a direct analogy to CustoMalloc's "front end allocator". The arena bears a relationship to CustoMalloc's "back end allocator".
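A sketch of that front-end/back-end split, under assumed names (cl's real struct node, struct memory_arena and allocation functions differ in detail):

#include <stdlib.h>

#define ARENA_SIZE 4096                     /* roughly one memory page */

struct node {
    struct node *left, *right;
    int refcount;
};

struct arena {
    struct arena *next;
    size_t used;
    char bytes[ARENA_SIZE];
};

static struct arena *arena_list = NULL;     /* retained for the program's lifetime */
static struct node *free_list = NULL;       /* "front end": recycled nodes */

static struct node *node_alloc(void)
{
    struct node *n;

    if (free_list) {                        /* cheap case: pop a recycled node */
        n = free_list;
        free_list = n->left;
        return n;
    }
    if (!arena_list || arena_list->used + sizeof(struct node) > ARENA_SIZE) {
        struct arena *a = malloc(sizeof *a);    /* "back end": a whole new arena */
        a->next = arena_list;
        a->used = 0;
        arena_list = a;
    }
    n = (struct node *)(arena_list->bytes + arena_list->used);
    arena_list->used += sizeof(struct node);
    return n;
}

static void node_free(struct node *n)
{
    n->left = free_list;                    /* push onto the free list; the arena */
    free_list = n;                          /* itself never gets handed back */
}

Most allocations never touch malloc() at all, which accounts for the speed difference over plain malloc()/free() management.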

My experience and the measurements above seem to directly contradict the findings of Berger, Zorn and McKinley's Reconsidering Custom Memory Allocation. cl may indeed consume a lot of extra memory, given that a single struct node can trigger the allocation of an entire, memory-page-sized arena, and cl retains arenas throughout its lifetime.

Testing

I did script-driven regression testing after compiling new code. This helps prevent "regressions" from creeping in.

I used 3 different compilers and 2 different implementations of "yacc". Not only do different compilers give different error messages; using different compilers should also prevent compiler dependencies from slipping in and promote standards conformance.

I ran the regression tests on builds from every compiler, to ensure that no compiler dependencies slipped in.

I also ran valgrind in memory-leak-checking mode on all test cases used in the script-driven regression testing, and on interactive sessions as well. This uncovers memory leaks outside the arena allocation and free list use.

I used randomly-generated combinatory logic expressions to do "fuzz" testing of graph reduction and bracket-abstraction algorithms.
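The generator amounts to a few lines of C; the sketch below shows the idea (the choice of primitives, depth and seed are illustrative, not cl's actual test harness), with its output piped into cl's standard input:

#include <stdio.h>
#include <stdlib.h>

/* Print a random combinatory logic expression of bounded depth. */
static void random_expression(int depth)
{
    static const char *primitives[] = { "S", "K", "I", "B", "C", "W", "M", "T" };

    if (depth <= 0 || rand() % 3 == 0) {
        fputs(primitives[rand() % 8], stdout);  /* leaf: a primitive combinator */
    } else {
        putchar('(');                           /* internal node: an application */
        random_expression(depth - 1);
        putchar(' ');
        random_expression(depth - 1);
        putchar(')');
    }
}

int main(void)
{
    srand(12345);               /* fixed seed, so a failing case reproduces */
    random_expression(6);
    putchar('\n');
    return 0;
}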

Built-in Testing

In general, I think it helps any program to have some built-in "sanity" testing, rather than relying solely on black-box unit tests. In programs that compose large trees of data structures, or have algorithms that recurse mercilessly, I regard built-in testing as a necessity.

Code that traverses parse trees for output checks for "single-ply" looping: the addresses of the child nodes (named by the left and right elements of struct node) should never equal the address of the parent application node. The traversal code prints an error message and refuses to follow the looping pointers. This finds simple errors in reduction algorithms, but still fails on "loops" more than one element deep.
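The check itself is simple; here is a sketch using a simplified, hypothetical node layout and printing routine rather than cl's real traversal code:

#include <stdio.h>

struct node {
    const char *name;           /* non-NULL for a combinator or variable */
    struct node *left, *right;  /* children, when this is an application */
};

static void print_expression(struct node *n)
{
    if (n->name) {
        fputs(n->name, stdout);
        return;
    }
    if (n->left == n || n->right == n) {
        /* an application node must never name itself as its own child */
        fprintf(stderr, "print_expression: node %p loops to itself\n", (void *)n);
        return;                 /* refuse to follow the looping pointer */
    }
    putchar('(');
    print_expression(n->left);
    putchar(' ');
    print_expression(n->right);
    putchar(')');
}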

The code for allocation and deallocation of struct node and struct spine_stack counts the total numbers of allocations and deallocations. After reducing an expression to normal form, the interpreter can check whether the two counts match. This simple and cheap form of built-in test rapidly uncovers problems with the reference counting and free list implementations.
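In outline, the bookkeeping looks like the following sketch (the counter and function names are assumptions, and the real code wraps the arena allocator rather than malloc()/free()):

#include <stdio.h>
#include <stdlib.h>

static unsigned long allocations, deallocations;

void *counted_alloc(size_t size)
{
    ++allocations;
    return malloc(size);
}

void counted_free(void *p)
{
    ++deallocations;
    free(p);
}

/* Called after an expression reaches normal form. */
void check_allocation_counts(void)
{
    if (allocations != deallocations)
        fprintf(stderr, "allocation mismatch: %lu allocated, %lu freed\n",
                allocations, deallocations);
}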

Allocation and deallocation counts may not match after some nominally "normal" occurrences, like syntax errors and timeout interruptions.

The allocation and deallocation functions for struct node also warn when a struct with a zero reference count gets deallocated.

The arena-allocation code uses the first few bytes of the arena allocation itself as a header, an instance of struct memory_arena. Each struct memory_arena has a "serial number" field, and the implementation keeps the arenas chained in order of increasing serial number.

Freeing an arena's contents (which does not deallocate the arena itself) checks that the arena serial numbers stay in order. This alerts one to the grosser forms of buffer overflow. The same function checks for one-element "loops" in the chain of arenas by verifying that the serial number increments by one each time it follows a pointer.
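A sketch of that consistency check, with assumed field names (cl's real struct memory_arena lays things out differently):

#include <stddef.h>
#include <stdio.h>

struct memory_arena {
    int serial_number;            /* assigned in increasing order at creation */
    struct memory_arena *next;    /* chain kept in increasing serial-number order */
    size_t used;
    /* ... the rest of the memory page holds the allocations themselves ... */
};

/* Reset every arena's contents without deallocating any arena. */
void free_arena_contents(struct memory_arena *head)
{
    struct memory_arena *a;
    int previous_serial = 0;

    for (a = head; a != NULL; a = a->next) {
        if (a->serial_number != previous_serial + 1)
            fprintf(stderr, "arena chain corrupt: expected serial %d, found %d\n",
                    previous_serial + 1, a->serial_number);
        previous_serial = a->serial_number;
        a->used = 0;              /* empty the arena, but keep it allocated */
    }
}

A gross buffer overflow that tramples an arena header shows up as an out-of-order serial number, and a one-element loop shows up as a serial number that fails to increment.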

Development time line

Date vs Lines-of-code

The above graph should give some sense of the development time line. It has a "step" appearance because I have a habit of only checking in file sets that actually work, which means they pass the script-driven regression testing.

The long fallow period from March to December of 2007 represents the time I spent writing this document.

The Road Not Taken: Design Options

Yacc as the reduction engine

The built-in primitives have single-step reductions that look almost exactly like yacc productions. Why not just write 7 or so yacc productions that mimic reductions and be done with it, letting yacc generate efficient code to do reductions?

As attractive as this seems initially, I don't think you can do it without a great deal of Yacc (or Bison) jiggery-pokery. The first problem lies in the implicit "type" of a token given to Yacc. You can certainly have flex/lex give tokens of type "combinator" back to Yacc. A given combinator's reduction doesn't care what "type" its arguments have, but Yacc's reductions definitely do care what type lex gave a token. So, you end up needing multiple Yacc productions for each (logically simple) combinator reduction. For example, the I combinator's single reduction would end up as 8 Yacc productions:

COMB_I COMB_S: COMB_S;
COMB_I COMB_K: COMB_K;
COMB_I COMB_I: COMB_I;
COMB_I COMB_B: COMB_B;
COMB_I COMB_C: COMB_C;
COMB_I COMB_W: COMB_W;
COMB_I COMB_T: COMB_T;
COMB_I COMB_M: COMB_M;

The S combinator would end up with a huge number of Yacc productions: 8 token types for each of its three arguments gives 8^3 = 512. Handling parenthesized subexpressions would cause an even greater explosion of productions.

Another problem involves getting Yacc to recognize normal forms. It may prove impossible to write Yacc productions to do this. I didn't get past the explosion in Yacc productions caused by the typing issue.

Abbreviation Alternative

Willem van der Poel's article, The Mechanization of Lambda Calculus, contains a description of a combined lambda calculus and Combinatory Logic interpreter. van der Poel substitutes for abbreviations at the time the abbreviation gets reduced, not during parsing, as cl does.

Substitution at reduction time has the advantage of leaving abbreviations in the human-readable, printed form of the expression. It also seems to have the disadvantage of allowing "free variables" in the abbreviation itself, with all the attendant problems they cause. If an abbreviation contains an identifier unbound at the time of abbreviation, does binding that identifier later cause the reduction-time substitution to use the later binding, or does the identifier stay free?

Future Work

Other Implementations of Combinatory Logic Interpreters

Another implementation might serve you better. I found these other Combinatory Logic interpreters:

Combinatory Logic Bibliography

Unfortunately, the standard reference volume on Combinatory Logic probably went out of print in the early or mid 60s. A copy of Haskell Curry's Combinatory Logic fetches about $300 now, and it bears a 1958 copyright. Those without access to many a quaint and curious volume of forgotten lore have to make do with mostly on-line resources. Strangely, only two of Haskell Curry's original papers appear on-line, as far as I can tell.

Technical References

I used the following during development and testing. They represent some fundamentals of programming in the meta-language (C, flex, Yacc) rather than in Combinatory Logic itself.