As I’ve discussed previously, there are a number of good reasons why Haskell is not suitable for teaching introductory functional programming. Chief among these is laziness, which in the context of a pure functional language has fatal side effects. First, Haskell suffers from a paucity of types. It is not possible in Haskell to define the type of natural numbers, nor the type of lists of natural numbers (or lists of anything else), nor any other inductive type! (In Carrollian style there are types called naturals and lists, but that’s only what they’re called, it’s not what they are.) Second, the language has a problematic cost model. It is monumentally difficult to reason about the time, and especially space, usage of a Haskell program. Worse, parallelism arises naturally in an eager, not a lazy, language; for example, computing every element of a finite sequence is fundamental to parallel computing, yet is not compatible with the ideology of laziness, which specifies that we should only compute those elements that are required later.
The arguments in favor of laziness never seem convincing to me. One claim is that the equational theory of lazy programs is more convenient; for example, beta reduction holds without restriction. But this is significant only insofar as you ignore the other types in the language. As Andrzej Filinski pointed out decades ago, whereas lazy languages have products, but not sums, eager languages have sums, but not products. Take your pick. Similarly, where lazy languages rely on strictness conditions, eager languages rely on totality conditions. The costs and benefits are dual, and there seems to be no reason to insist a priori on one set of equations as being more important than the other.
Another claim is that laziness supports the definition of infinite data types, such as infinite sequences of values of some type. But laziness is not essential, or even particularly useful, for this purpose. For example, the type nat->nat is a natural representation of infinite sequences of natural numbers that supports many, but not all, of the operations on finite sequences (not, for example, reverse, which makes no sense in the infinite case). More generally, there is no inherent connection between laziness and such infinitary types. Noam Zeilberger has developed an elegant theory of eager and lazy types based on distinguishing positive from negative polarities of type constructors, the positive including the inductive and the negative including the coinductive. Coinductive types are no more about laziness than inductive types are about pointers.
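To make the point concrete, here is a small sketch of this representation in Standard ML; the names (seq, nth, tail, mapSeq, evens) are mine, and int stands in for nat.

(* infinite sequences of (natural) numbers as functions; no laziness involved *)
type seq = int -> int
fun nth (s : seq) (n : int) : int = s n             (* the n-th element *)
fun tail (s : seq) : seq = fn n => s (n + 1)        (* drop the first element *)
fun mapSeq (f : int -> int) (s : seq) : seq = f o s (* apply f elementwise *)
val evens : seq = fn n => 2 * n                     (* 0, 2, 4, 6, ... *)
(* reverse : seq -> seq has no sensible definition: an infinite sequence has no end *)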
I wish to argue that laziness is important, not for pure functional programming, but rather only in conjunction with effects. This is the Kahn-MacQueen Principle, introduced in the 1970s by Gilles Kahn and David MacQueen in their seminal paper on recursive networks of stream transducers. Dan Licata and I have emphasized this viewpoint in our lectures on laziness in our new course on functional programming for freshmen.
Let’s use streams as a motivating example, contrasting them with lists, with which they are confused in lazy languages. A list is an example of a positive type, one that is defined by its membership conditions (constructors). Defining a function on a list amounts to pattern matching, giving one case for each constructor (nil and cons), and using recursion to apply the function to the tail of the list. A stream is an example of a negative type, one that is defined by its behavioral conditions (destructors). Defining a stream amounts to defining how it behaves when its head and tail are computed. The crucial thing about lists, or any positive type, is that they are colimits; we know as part of their semantics how values of list type are constructed. The crucial thing about streams, or any negative type, is that they are limits; we know as part of their semantics how they behave when destructed.
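To make the contrast concrete, here is a small sketch of my own (not the course code): a function on a list is defined by cases on its constructors, whereas a stream-like value is defined only by how it behaves when its head and tail are demanded.

(* positive: a list is given by its constructors, so a function on lists is
   defined by pattern matching, one case per constructor *)
fun sum nil = 0
  | sum (x :: xs) = x + sum xs

(* negative: a behavior is given by its destructors; all one can do is ask
   for the head and the rest (the type behav is illustrative only) *)
datatype 'a behav = Behav of unit -> 'a * 'a behav
fun head (Behav b) = #1 (b ())
fun rest (Behav b) = #2 (b ())
val nats : int behav =
  let fun from n = Behav (fn () => (n, from (n + 1))) in from 0 end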
Since we have no access to the “inside” of a stream, we should think of it not as a static data structure, but as a dynamic process that produces, upon request, successive elements of the stream. Internally, the stream keeps track of whatever is necessary to determine successive outputs; it has its own state that is not otherwise visible from the outside. But if a stream is to be thought of as given by a process of generation, then it is inherently an ephemeral data structure. Interacting with a stream changes its state; the “old” stream is lost when the “new” stream is created. But, as we have discussed previously, ephemeral data structures are of limited utility. The role of memoization is to transform an ephemeral process into a persistent data structure by recording the successive values produced by the process so that they can be “replayed” as necessary to permit the stream to have multiple futures. Thus, rather than being a matter of efficiency, memoization is a matter of functionality, providing a persistent interface to an underlying ephemeral process.
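Here is a tiny illustration of such an ephemeral process, using a made-up counter whose state is a private reference cell:

(* each interaction advances the internal state; the "old" process is gone *)
fun counter () : unit -> int option =
  let val n = ref 0
  in fn () => (n := !n + 1; SOME (!n)) end

val p = counter ()
val a = p ()   (* SOME 1 *)
val b = p ()   (* SOME 2: there is no way to replay SOME 1 from p itself *)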
To see how this works in practice, let’s review the signatures PROCESS and STREAM that Dan Licata and I developed for our class. Here’s a snippet of the signature of processes:
signature PROCESS = sig
  type 'a process = unit -> 'a option
  val stdin  : char process
  val random : real process
end
A process is a function that, when applied, generates a value of some type, or indicates that it is finished. The process stdin represents the Unix standard input; the process random is a random number generator. The signature of streams looks essentially like this:
signature STREAM = sig
  type 'a stream
  datatype 'a front = Nil | Cons of 'a * 'a stream
  val expose : 'a stream -> 'a front
  val memo   : 'a Process.process -> 'a stream
  val fix    : ('a stream -> 'a stream) -> 'a stream
end
The type 'a front is the type of values that arise when a stream is exposed; such a value either terminates the stream, or presents an element together with another stream. The memo constructor creates a persistent stream from an ephemeral process that generates its elements. The fix operation is used to create recursive networks of streams. There are other operations as well, but these illustrate the essence of the abstraction.
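For concreteness, here is one way these signatures might be implemented; this is a sketch of my own devising, not the implementation from our course. It assumes the Random structure from the SML/NJ library for random, and it realizes memoization with a reference cell holding either a delayed computation or the front it produced.

structure Process : PROCESS = struct
  type 'a process = unit -> 'a option
  (* standard input, one character per request; NONE at end of file *)
  val stdin : char process = fn () => TextIO.input1 TextIO.stdIn
  (* random reals; assumes Random from the SML/NJ library *)
  local val gen = Random.rand (17, 42)
  in val random : real process = fn () => SOME (Random.randReal gen) end
end

structure Stream :> STREAM = struct
  datatype 'a stream = S of 'a cell ref
  and 'a front = Nil | Cons of 'a * 'a stream
  and 'a cell = Delayed of unit -> 'a front   (* not yet computed *)
              | Forced of 'a front            (* memoized front *)

  (* expose computes the front at most once and records the result *)
  fun expose (S r) =
    case !r of
      Forced f => f
    | Delayed th => let val f = th () in r := Forced f; f end

  (* a persistent stream backed by an ephemeral process *)
  fun memo (p : 'a Process.process) : 'a stream =
    let fun next () =
          case p () of
            NONE => Nil
          | SOME x => Cons (x, memo p)
    in S (ref (Delayed next)) end

  (* recursive streams by backpatching a reference *)
  fun fix (f : 'a stream -> 'a stream) : 'a stream =
    let
      val r = ref (Delayed (fn () => raise Fail "fix: not productive"))
      val s = S r
    in
      r := Delayed (fn () => expose (f s));
      s
    end
end

Note that the reference cells are invisible through the signature: the state is used only to ensure that each front is computed at most once, which is exactly the benign effect discussed below.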
Using these signatures as a basis, it is extremely easy to put together a package of routines for scripting. The fundamental components are processes that generate the elements of a stream. Combinators on streams, such as composition, mapping, and reducing, are readily definable, and may be deployed to build up higher levels of abstraction. For example, Unix utilities, such as grep, are stream transducers that take streams as inputs and produce streams as outputs. These utilities do not perform input/output; they merely transform streams. Moreover, since streams are persistent, there is never any issue with “buffering” or “lookahead” or “backtracking”; you just manipulate the stream like any other (persistent) data structure, and everything works automagically. The classical Bell Labs style of intermixing I/O with processing is eliminated, leading not only to cleaner code, but also to greater flexibility and reuse. This is achieved not by the double-backflips required by the inheritance mechanisms of OOPLs, but rather by making a crisp semantic distinction between the processing of streams and the streaming of processes. True reuse operates at the level of abstractions, not at the level of the code that gives rise to them.
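As an illustration, here is a sketch (again my own, assuming the Process and Stream structures above) of a grep-like combinator: the filtering loop is an ephemeral process that walks the input stream, and memo turns the result back into a persistent stream.

(* keep the elements of a stream satisfying a predicate; the ref cur is the
   private state of an ephemeral process, and Stream.memo makes the result
   persistent, so it can be exposed repeatedly and shared freely *)
fun filter (keep : 'a -> bool) (s : 'a Stream.stream) : 'a Stream.stream =
  let
    val cur = ref s
    fun step () =
      case Stream.expose (!cur) of
        Stream.Nil => NONE
      | Stream.Cons (x, rest) =>
          (cur := rest; if keep x then SOME x else step ())
  in
    Stream.memo step
  end

(* a grep-like transducer: given a pattern, filter a stream of lines *)
fun grep pat = filter (String.isSubstring pat)

Notice that grep performs no input/output at all; applied to a stream of lines, it simply yields another stream, which may be further transformed, shared, or eventually consumed.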
Update: It seems worthwhile to point out that using memoization to create a persistent data structure from an ephemeral one is a premier example of a benign effect, the use of state to evince functional behavior. But benign effects are not programmable in Haskell, because of the segregation of effects into the IO monad.
Update: Lennart Augustsson gives his reasons for liking laziness.
Update: word smithing.