The conser programming language

Introduction

The conser programming language is a language in which the only data type is sets of pairs (of sets). It is possible to represent many things with sets of pairs--lists, numbers, not to mention sets. conser is a declarative, lazily evaluated, esoteric programming language. conser programs tend to require immense amounts of memory and processing time to perform very simple calculations.

Creating a pair is called consing, after the lisp function, cons. Since a program that uses any data (other than the empty set) needs to cons that data, it's a conser. Also, conser is sort of a combination of Conway and surreal, which is significant, since pairs of sets (of pairs) can represent surreal numbers.

Surreal numbers were invented by John H Conway, and named by Donald E Knuth, who wrote a novelette about two characters making discoveries about surreal numbers. A good introduction to surreal numbers on the web is at http://www.tondering.dk/claus/surreal.html.

A conser program consists of a set of function definitions, which may be mutually recursive. The execution begins with the evaluation of the main function, which must be defined to take zero or one argument. If main takes one argument, the argument contains the program's input. The result of evaluating main will be the program's output. Evaluation is done lazily.

Examples

The canonical example:

"hello: prints Hello world!"
main =
"01001000" ((,(((,(((
"01100101" ((,(,(((,((,
"01101100" ((,(,((,(,((
"01101100" ((,(,((,(,((
"01101111" ((,(,((,(,(,(,
"00100000" (((,(((((
"01110111" ((,(,(,((,(,(,
"01101111" ((,(,((,(,(,(,
"01110010" ((,(,(,(((,(
"01101100" ((,(,((,(,((
"01100100" ((,(,(((,((
"00100001" (((,(((((,
"00001101" (((((,(,((,
"00001010" (((((,((,(
(,)
,)),)),),),),)
),))),),),),)
),),),),)),),)
,),)),),))),)
,),))),))),)
,)),),)))),)
)))),))),)
))),)))),)
,),),),),)),),)
)))),))),)
,),))),))),)
,),))),))),)
),)),),))),)
,),),)),),)),).

An example with input:

"cat: copies stdin to stdout."
main a = a.

Implementation of basic definitions of surreal numbers:

"Evaluates to () for valid surreal numbers.  Doesn't evalute to () otherwise."
number? a = (gt>a<a number?<a number?>a).

"Evaluates to () if a > b, (,) otherwise."
gt a b = ((,)!gt b<a gt>b a).

"Addition."
+ a b = (+<a b + a<b,+>a b + a>b).

"Inverse under addition."
- a = (->a,-<a).

"Multiplication."
* a b = (+ + *<a b * a<b - *<a<b  + + *>a b * a>b - *>a>b,
         + + *<a b * a>b - *<a>b  + + *>a b * a<b - *>a<b).

Lexical description

Whitespace separates tokens. It is otherwise ignored.

Comments begin and end with double quotes. Comments cannot contain double quotes. Comments are treated as whitespace.

The characters =.<>(),&! are special tokens.

Unbroken strings of characters other than whitespace, double quotes, or any of the special token characters form identifier tokens.

Syntax

program       ::= definition { definition }
definition    ::= function-name parameters "=" function-body "."
function-name ::= identifier
parameters    ::= { identifier }
function-body ::= expression
expression    ::= "<" expression |
                  ">" expression |
                  "(" { expression } ")" |
                  "(" { expression } "," { expression } ")" |
                  "(" { expression } "&" { expression } ")" |
                  "(" { expression } "!" { expression } ")" |
                  identifier { expression }

Expressions

Predefined functions

<expr: (aka car) evaluates to the union of left sets of the pairs in expr.
>expr: (aka cdr) evaluates to the union of right sets of the pairs in expr.
(expr-list): (aka union) evaluates to the union of expr-list.
(expr-list1,expr-list2): (aka cons) evaluates to the single member set containing the pair with the left set being the union of expr-list1 and the right set being the union of expr-list2.
(expr-list1&expr-list2): (aka intersection) evaluates to the intersection of the union of expr-list1 and the union of expr-list2.
(expr-list1!expr-list2): (aka difference) evaluates to the set difference of the union of expr-list1 and the union of expr-list2.

<((a,b) (c,d)) => (a c)
>((a,b) (c,d)) => (b d)
(a b b) => (a b)
(a b & b) => b
(a b ! b) => (a!b)

Function parameters

Function parameters in expressions have the usual meaning.

identity a = a.

Functions

A function name followed as many expressions as number of parameters the function invokes that function. Since all the arguments are sets, the meaning of invoking a function is subject to these rules:

If any argument is (), the result is (), which is just a special case of the second rule.
```
f ()   => ()
g a () => ()
g () a => ()
```
When function arguments that are sets with more than one element, the result of the function is equivalent to the union of the result of invoking the function with each element of the sets as one-element sets.
```
f (a b)       => (f a  f b)
g (a b) c     => (g a c  g b c)
g a (b c)     => (g a b  g a c)
g (a b) (c d) => (g a c  g a d  g b c  g b d)
```

So, operationally, a function is only invoked with one-element sets as arguments.

Lazy evaluation

Expressions are essentially lazily evaluated. When evaluation is forced, function arguments are evaluated in parallel, so evaluation that later gets thrown away can be done. That means, for functions, if any argument evaluates to (), the function evaluates to () without having to complete the calculations for the rest of the arguments. Additionally, <, >, (&), and (!) can return results without completely evaluating their arguments.

Example:

"List of prime numbers."
primes = primes' (((,),),) n* (((,),),).
primes' a nonprimes = ?:(notin a nonprimes,((a,primes' (a,) (nonprimes n* a)),primes' (a,) nonprimes)).
n* a = n*' a + a a.
n*' a b = (b,n*' a + a b).
+ a b = ?:(<a,(b,+<a(b,))).  "[->+<]"
?: a = (?(<a,<>a)2<a>>a).
? a = (>a!2<a>a).
2 a b = b.
gt a b = ?:(<a,(b,gt<a<b)).
notin a l = ?:((a!<l),(a,?(gt a<l,in a>l))).

Execution

Input and output

The input, the contents of stdin, is the argument to the main function, and the result is the output, which will be written to stdout.

The input and output are nested pair structures suggested by Ben Rudiak-Gould. If the car of the pair is empty, then the first bit is 1, and the rest of the bits are in the cdr, otherwise, if the cdr if empty, the first bit is 0, and the rest of the bits are in the car. Since they are lazily evaluated, interactive programs are possible. Any implementation that buffers its output should flush its buffer before each time it reads input, but is not required to, since, for example, if the native output is in bytes, then buffered partial bytes need not be flushed.

If the result of main is an invalid structure, i.e. a pair with both the car and cdr nonempty, or with a car or cdr with more than one element, then what happens is undefined. The current implementation chooses whichever pair it happens upon first as defining the output. An alternative could have the output splitting into multiple output streams, one for each pair, though, due to lazy evaluation, there could be extraneous splitting if unresolved pairs later resolve to be identical.

Implementations

My original interpreter

Written in java. Input and output are surreal integers rather than nested pairs. Often generates incorrect output. Doesn't do lazy evaluation. Has a repl (read-evaluate-print-loop) mode. ~1150 lines of hackware. (,), (&), (!) often return incorrect results.

Ben Rudiak-Gould's interpreter

Written in Haskell. Doesn't support input. Does lazy evaluation. Set comparisons depend on the order of the elements, which could cause (!) and (&) to give wrong answers. Only ~250 lines of nice looking code.

My second interpreter

Written in ocaml. Input and output are nested pair structures. Does parallel lazy evaluation. Has a repl mode with numerous features. Extended to support infix syntax. Extended to allow specifying input files, the contents of which become the second, third, etc. arguments of main. Supports numerous useless optimizations and one useful optimization (available with the -O cond option). There are a number of things that it doesn't do quite correctly:

It's not completely lazy. It ``speculatively'' forces subexpressions, the results of which may be later thrown away. The main effect is that truly interactive programs can't really be made, since the speculative forcing of the input can cause evaluation to block on input before the input is actually needed. For example, when it runs the rot13.cons program, the output trails the input by about 4 characters. It may be possible to fix this by being very careful and forcing expressions involving input only after everything else is sufficiently resolved. However, it's not worth the trouble even to determine if it's possible.
Function calls and intersection don't incrementally yield results as early as they could. In particular, function calls don't yield results until all of its arguments have been resolved to be finite sets, and intersection doesn't yield results until either the left set or the right set has resolved to be a finite set. The upshot is that some expressions involving infinite sets that could terminate won't, such as (1 & (1 2 3 ... & 1 3 5 ...)). Fixing function calls to incrementally return results for potentially infinite sets would be too expensive in terms of memory and stack usage to be worth it. Fixing intersection to incrementally return results for potentially infinite sets is feasible, though.

Extensions

Infix syntax

Infix syntax makes some functions look syntactically nicer. However, they make parsing more complicated. They also provide opportunities for obfuscation. Infix functions are purely syntactic, and are semantically identical to regular functions.

To define an infix function, enclose its name with parentheses in its definition with its left arguments to its left, and its right arguments to its right. If there are no left arguments, it is the same as regular function. An infix function invocation is the same as a regular function invocation except that its left arguments are precede the function name.

"Surreal addition using infix notation."
a (+) b = (<a + b a + <b, >a + b a + >b).

Combining infix functions with other functions can cause confusion as to which takes precedence. The rule will be left associativity. If in doubt, avoid infix functions entirely. For example, if + takes one left argument and one right argument, a b + + c is equivalent to b + (a + c).

For Forth fans:

"Surreal addition using postfix notation."
a b (+) = (<a b + a <b +, >a b + a >b +).

Additional inputs

If the main function is defined with more than one argument, the first argument has the contents of stdin as usual, and the additional arguments have contents of additional input streams, determined in an implementation-dependent way. For example, on Unix and Unix-like systems, the second argument could be read from file descriptor 3, the third from file descriptor 4, etc.