All About Recursion

I. Recursive definitions.
II. A Simple Recursive Procedure.
III. The Principle of Information Hiding.
IV. When Do You Use Recursion?
V. The Four Rules.
VI. Another Simple Example.

I. Recursive definitions.

A recursive definition is a definition in which the thing being defined occurs as part of its own definition.

Sounds circular, doesn't it? Circular definitions, in computer science as elsewhere, are valueless and should be avoided. The distinction between a recursive and a circular definition lies in the use of the work "part": any recursive definition has a noncircular part, or basis, which is like an ordinary definition, and directly specifies some objects that fit the definition; and also a recursive (circular) part, which allows you to use the objects that fit the definition in determining whether new objects also fit the definition.

Some examples should clarify these ideas.

Here is an ordinary definition:

Vowel: one of the letters "a," "e," "i," "o," "u," or "y".

Clearly, there are exactly six things that fit the above definition of a vowel. Now consider the following circular definition:

Yowl: any yowl followed by a vowel.

By this definition, any yowl followed by a vowel is also a yowl; thus, if we could find one thing which is a yowl, then any things formed by adding vowels to that yowl would themselves be yowls. The problem is in finding that first yowl. Since we have no place to start, the definition is circular, and valueless.

Now consider the following definition:

Howl:

the letter "h," or
any howl followed by a vowel.

This definition is recursive. The first part is an ordinary definition, and tells us that the letter "h" is a howl; this gives up a basis, or place to start. The second part is the recursive part; it tells us that, since "h" is a howl, so are "ha," "he," "hi," "ho," "hu," and "hy". But since these are howls too, so also must be "haa," "hae,"..., "hyy,".... "haaeeuooeea," etc.

Note that this is a "good" definition in the sense that some things are howls, some things are not howls, and it is easy to tell from the definition which are which. The "circularity" of the second part of the definition causes no harm, because the first part provides a basis.

Recursion abounds in computer science. The usual definition of "identifier" (or "variable name"), for example, is as follows.

Identifier:

a letter, or
an identifier followed by a letter or a digit.

Notice that the definitions of both "howl" and "identifier" allow arbitrarily long strings. It is possible to write recursive definitions which limit the size of the things being defined, but in general this is neither easy nor desirable. Instead, if there must be limits, they are set by some external agency. Some programming languages, for example, allow arbitrarily long identifiers, but require that two different identifiers must differ in the first k characters, where k is set by the implementation. In the same way, the maximum length of a howl might be determined by your lung capacity.

You have probably noticed that the above definitions don't have to be recursive; they could be made into ordinary definitions by using the phrase "any number of...". This is true--we don't need recursion. In fact, recursion is never absolutely necessary, merely useful.

The definitions we have considered so far have all been examples of direct recursion. For an example of indirect recursion, consider the following pair of definitions (adapted from Lisp):

S-expression: an identifier or a list.

List: a sequence consisting of

a left parenthesis,
zero or more S- expressions separated by blanks, and
a right parenthesis.

Thus, the following things are lists (and also S-expressions):

( )

(ZIP ZAP ZOT)

(ONE (TWO THREE) ((FOUR)) FIVE)

( ( ) IXOHOXI ( ) )

These definitions are mutually recursive: each is defined in terms of the other. The basis for the S-expression is an identifier, while the basis for the list is the sequence ( ).

To show that these definitions "work," consider the S-expression (NOT (HARD) ).

(NOT (HARD) ) is an S-expression because it is a list.
(NOT (HARD) ) is a list because it consists of a left parenthesis, the two S-expressions NOT and (HARD), and a right parenthesis.
NOT is an S-expression because it is an identifier.
(HARD) is an S-expression because it is a list.
(HARD) is a list because it consists of a left parenthesis, the S- expression HARD, and a right parenthesis.
HARD is an S-expression because it is an identifier.

Q.E.D.

Of course you don't have to go through this complete process every time you see an S-expression, but we were being very formal.

Now is a good time to put in a plug for the value of recursion. The definition of "list" given above may seem confusing at first (because you're not used to recursive definitions), but I challenge anyone to write a reasonable definition of "list" which is equivalent to the one given above, yet does not use any form of recursion.

II. A Simple Recursive Procedure.

A recursive procedure (or function, or subroutine) is a procedure which calls itself to to part of the work.

Again, the trick is in the use of the word "part". If it called itself to do all of the work, the result would be an infinite recursion (the recursive equivalent of an infinite loop). Just as a loop does some part of the work during each iteration of the loop, so must a recursive procedure do some part of the work at each level of recursion, until eventually all the work is done.

The usual first example of recursion is the factorial function. This is in many respects a poor example, but we will use it for the same reason everyone else does: it is simple.

The factorial of a positive integer n, written n!, is the product of all the positive integers from 1 up to and including n. Thus, we have

1! = 1

2! = 1 * 2 * 2

3! = 1 * 2 * 3 = 6

4! = 1 * 2 * 3 * 4 = 24

etc.

Note that, for example, 4! = 1 * 2 * 3 * 4 = (1 * 2 * 3) * 4 = 3! * 4, and in general n! = (n - 1)! * n. This leads to the following recursive definition:

The factorial of a positive integer n, written n!, is

(1) one, if n = 1 (basis), or

(2) (n - 1)! * n, if n > 1 (recursion).

This definition leads immediately to the following computer program (written in Pascal):

function factorial (n: integer): integer;
   begin
      if n = 1 
         then factorial := 1
         else factorial := factorial (n - 1) * n
   end

This function has a basis (n = 1) and a recursive part (factorial (n - 1)), and it does part of the work (multiplying by n) at each level. Thus it seems as though the function might possibly work. But if this is the first time you have seen recursion, you probably are not at all comfortable with it.

In fact, the program does work. The main thing wrong with it is that it's a stupid way to compute a factorial--a loop would be simpler and better. Have patience.

Many authors suggest that the best way to understand a recursive routine is to trace through it, keeping track of what happens at every level of the recursion. By all means work through such a trace, if doing so helps you believe recursion can actually work. But you should never think of tracing through a recursive procedure as a means of understanding, or debugging, such a procedure. "Tracing through" has probably kept hundreds of people from ever really understanding recursion. The purpose of this paper is to describe a better technique.

III. The Principle of Information Hiding.

Perhaps the single most important tool we have for controlling the complexity of programs is modularization: breaking a single complex program up into several simpler, logically independent modules (procedures, functions, etc.).

Whenever a program is broken into two parts, there comes into being an interface between them. This interface consists of the parameter list, any global or common variables they may share between, them, and often more subtle ways of information transmission. If modularization is to succeed in reducing complexity, this interface must be kept as narrow and as explicit as possible.

The best way to keep the interface narrow is to ensure that each procedure (module) does just one thing, and that that thing can be easily described in an English sentence with few or no "buts" and "excepts" in it. Keeping parameter lists short and avoiding global variables also helps.

The best way to make the interface explicit is to ensure, whenever possible, that all information transmission occurs via the parameter list; shun global variables. There are those who feel global variables should be avoided entirely, but I don't take such an extreme position. It is often more convenient to have global access to arrays and other large data structures; just take special care not to mess them up.

Side effects should also be avoided. The main effect of a routine is the thing it's supposed to do; side effects are the little extra things it does. Side effects are unhealthful even when they're intentional. For example, if a routine sort(A,n) sorts an array A of n integers, that's its main effect; but if it changes n in the process, that's a side effect. The bad thing about it is that it happens in the sort routine, but errors which result from this side effect will show up in the routines which call the sort routine. Since these routines may themselves be correct, the likely result is that the programmer spends hours trying to debug the wrong routines.

We are now in a position to state the Principle of Information Hiding: every routine should mind its own business. No routine should meddle in the affairs of another.

If some routine calls a sort routine, that routine should neither know nor care just how the sort routine operates. Nor should the sort routine know why its calling routine wants the sorting done.

Here are three simple examples of violations of the Principle of Information Hiding. In each case the function is supposed to return the maximum value in an array of integers. Can you spot the violations?

function max (var A: array [1 . . n]
   of integer): integer;
   begin 
      sort(A,n); max := A[1]
   end;
 
 
function max (A: array [1 . . n] of integer):
integer;
   begin
      max := A[1];
      for i := 2 to n do
         if A[1] > max then max := A[i]
   end;
 
 
function max (A: array [1 . . n] of
integer):integer;
   var i : integer;
   begin
      max := 0;
      for i := 1 to n do
         if A[i] > max then max := A[i]
   end;

Answers:

Sorting the array is a rather drastic side effect.
The function uses (and changes) the global variable "i".
This function should not know that A contains only positive numbers.

These examples are trivial, and in real life are not likely to cause major problems. Things get worse when the rest of the program gets more interesting.

The following "main program" controls a two-player game between the human ant the computer, in which the players alternate moves. The Principle of Information Hiding has been largely respected in this example; indeed, it is impossible to deduce from the program just which game is being played.

begin
   setup;
   choosefirst (player);
   repeat
      if player = "human" then
         begin
            repeat
               accept (move);
               checklegality (move,ok)
            until ok;
            make (player, move, gameover);
            player := "computer"
         end
      else  (* player = "computer" *)
         begin
            decide (move);
            make (player, move, gameover);
            player := "human"
         end
   until gameover
end.

It is possible to understand this program and verify its correctness, given very simple assumptions about what the called routines do. For example, we would expect "choosefirst (player)" to somehow decide who plays first, and to set its parameter to "human" or "computer" accordingly. All information relevant to the operation of the main program is fully specified in the parameter lists.

I have not shown any of the declarations. There is, presumably, a structured variable "board" which represents the playing board; since there is only one of it, and since it is used by all routines, it seems to me to make more sense to make "board" a global variable rather than to put it in the parameter list of every routine. (It may be that procedure "decide" uses a scratch board to do its thinking on--but that's its business.)

A more serious violation of the Principle of Information Hiding is variable "move," which must somehow be declared in the main program; to do this, we must know how a move can be specified. True, the variable is not really used in the main program, just passed along from one procedure call to the next. There do exist languages which recognize this and allow you to defer the declaration, but you won't be programming in one of them anytime soon. Some things we just learn to live with.

Now consider an alternate version of the same program.

begin
   setup;
   choosefirst (player);
   repeat
      if player = "human" then
         begin
            repeat
               accept (move);
               checklegality (move);
               k := k - 1
            until ok;
            make (move)
         end
		else (* player = "computer" *)
         begin
            decide (move);
            make (move);
            player := "computer"
         end
   until gameover
end

Due to the many violations of the Principle of Information Hiding, it is impossible to tell whether the program is correct or not, without extensive reference to the called routines.

What routine or routines update "player"? Do they do it in such a way that the main program works?
If the human's move is not ok, has "player" already been updated?
How does "make" know whose move to make?
Who sets "ok"?
When does the game end? Who's responsible for deciding this?
Is "k" being initialized properly? Computed properly? What is it anyway, and who uses it?

It may seem at this point that we have drifted away from the topic of recursion, and perhaps we have, since the Principle of Information Hiding applies to all programs. As we shall see in section V, however, it has a special importance for recursive programs.

IV. When Do You Use Recursion?

It is an easily proved fact that you never need recursion; any recursive program can be changed into a nonrecursive one, for example by the use of stacks. (In the same way you can prove than any program could be written in absolute binary.)

The best answer to the question of when to use recursion is simply, when you happen to find it useful. You never just set out to "use recursion," any more than you just set out to "use a loop". You program; sometimes you loop, sometimes you use an array, sometimes you do input/output, and sometimes you recur.

But that's not a very useful answer when you're first getting started. We'll try to be more specific.

One fairly good rule of thumb is to use recursion when you're processing recursively defined data structures. If you try to evaluate an arithmetic expression, for example, parentheses may be used to enclose a "subexpression" which must be evaluated first, and is an expression in its own right. The only reasonable way to to this is to write a recursive expression evaluation routine; loops alone don't suffice.

Another (equivalent) rule is to handle nested structures with recursion. As an example, in many languages any statement may be nested in an IF statement, even another IF statement. There is a clear use for recursion in any processor (compiler, preprocessor, interpreter, debugger, pretty printer) for such a language.

Finally, programs which use a single stack can often be written more clearly as recursive programs which don't use a stack. (Programs which use two or more stacks, however, often cannot be rewritten as recursive programs without any stacks; but the proof of this fact is beyond the scope of the current paper.)

V. The Four Rules.

Here we propose four rules which, if understood and followed, will result in working recursive programs. Once you become comfortable with recursion you will realize that these rules are not absolute, and can be violated for good cause, if care is taken; but stick with them for now.

1. Handle the base cases first.

You already know that a recursive routine must have some base cases, cases that it can compute directly without need for recursion. For example, in the factorial program the only base case is n - 1, and the result returned is 1. The very first thing you should do upon entry to a recursive routine is to test whether you have a base case,

and if you do, to whatever needs to be done for that case and return. Don't mess around, and don't recur, just to what needs to be done and get out.

2. Recur only with a simpler case.

This rule is crucial for preventing circularity and infinite recursion. If you ever, even once, recur with the same (or harder) problem, your program immediately disappears off into Cloud- Cuckoo Land.

What, however, does it mean to be "simpler"? That depends on the problem. "Simpler" can mean: with a smaller number, with a smaller array, with a more nearly sorted array, with a shorter expression, with a larger number, a more nearly random array, or just about anything else.

Once you have established your base cases, those are the "simplest" cases, and "simpler" is anything that is in some clear way "closer" to one or more of those base cases. Sometimes it is obvious when we are getting closer, as in the factorial function: for n > 1, clearly n - 1 is closer to 1 than n is. Similarly when we evaluate expressions, it may be obvious that a shorter expression is a simpler one; or perhaps it is an expression with fewer parentheses, or fewer arithmetic operators. When you write a recursive function, you must be clear in your mind just when a problem is "closer" to the base case (hence "simpler"), and you must stick to it.

Suppose we try to speed up the factorial function by doing two multiplications at each level of recursion, rather than just one. We might get

function factorial (n: integer): integer;
   begin
      if n - 1 then factorial := 1
      else factorial := n * (n - 1) * factorial (n - 2)
   end

This doesn't work. Before you read any further, try to figure our the error for yourself.

The problem is that n - 2 isn't necessarily "closer" to 1 than n is. In particular, if n is even, so is n - 2, and soon we will overshoot 1 and try to compute the factorials of 0, -2, -4, -6, and so on. It is part of the notion of "closer" that (in a finite system), if we get closer enough times, sooner or later we will get there.

So you to have to be careful that your notion of "closer" will sooner or later get you there.

If this all sounds vague and mysterious, don't panic. For any particular, concrete problem, it is 99% certain that it will be obvious what meaning to attach to the word "simpler". But do take twenty seconds, before you rush ahead with the program, to decide what the word means. Then each time you write a recursive call, make sure you recur with a simpler problem.

3. Don't interfere with the correct operation of the calling routine.

Consider the following sequence:

compute the value of x,
call a subprogram,
use x for something.

If the subprogram destroys the value of x, then clearly there is a problem. We return, finally, to the Principle of Information Hiding: unless it is the specific purpose of the subprogram to modify x for us, the subprogram has no business changing x, or even knowing that x exists. In short, the subprogram should not use x.

Now for the kicker. If the subprogram call is recursive--if the routine we are in calls itself--then the called routine must use the same variable names as the calling routine. After all, it's really the same routine both times.

You should now be horrified. If you aren't, you missed something; go back to the beginning of rule 3, and read more carefully this time.

The implication seems to be that you can't use variables in a recursive routine; or, more precisely, every recursive call destroys the value of all your variables. This is pretty intolerable. Fortunately, there is an easy solution: never change the value of a global variable.

If you have forgotten about scope rules, review them now. They have suddenly become important.

Briefly, you cannot avoid using the same names over again in a recursive routine, but if you declare them inside the routine they are then local (hence, different) variables, and cannot interfere with other variables of the same name but at a different level of recursion.

A recursive routine thus may access three types of variables: (1) local variables, which are declared inside the routine itself, and are totally safe; (2) parameters, which behave rather like local variables, and are safe so long as you are careful to avoid aide effects; and (3) global variables, which are extremely unsafe--you may look at their current values, but if you change those values you are playing with fire.

Moral: Make sure all your variables are either local or are parameters. If you can't, make sure you never change the value of a global variable. And if you must alter global variables, think long and hard about what you're doing.

Fortunately, it is easy to get into the habit of avoiding globals. You don't have to think through the reasons each time. Just avoid them.

4. Don't look down.

That is, when you reach a recursive call, don't go back to the top of your routine and start working through it all over again. Don't try to "trace through" recursive calls. That way lies madness.

Consider this: if you are working your way through a complex but nonrecursive routine, and it calls, say a sorting routine, do you drop everything and rush off to see if the sorting routine works? Of course not. You simply assume that the sorting routine works (and if you have doubts you save them for later), and keep going in the routine you're checking. Rule 4 simply says to treat all calls this way, even recursive ones.

It may seem odd at first to assume the called routine works, when that called routine happens to be the very one you're checking. However, you must do so: the human mind does have limitations. Pretend, if it helps, that you are not actually recurring. but rather calling an entirely different routine (which happens to have the same name) written by someone you trust.

What is involved here is perhaps best described as an "act of faith." You have to learn to trust recursion; to believe that, if you can get the routine correct at this level, then you don't have to worry about any deeper levels. Until you can bring yourself to make this commitment, you will be compelled to look down into the recursion, and you will never untangle what you see there.

Think about rules one to three, and try to convince yourself logically that a program written according to these rules ought to work. Then, even if you can't yet manage rule four, pretend you do when you write your program.

Now run your program. It will bomb, of course--programs always do--but steadfastly refuse to look down. Go over rules one, two, and three, and make sure you didn't violate one of these. Next, go over the program the same way you always would, looking for errors that have nothing to do with the recursion itself (the error doesn't have to be in the recursion, remember). If this too fails, get help.

Don't even think about looking down, for this one simple reason: it doesn't help.

If the notion of "faith" bothers you, think about the novice programmer who complains that his program can't be wrong, so the computer must have made a mistake. You may have once done this yourself. In time, all programmers learn to trust the computer; now you must learn to trust recursion.

VI. Another Simple Example.

This time we will devise a recursive Pascal program to find the maximum value in an array. Again, alas, it would be better to do this without recursion, but it is difficult to find simple examples that really need recursion, at least until we have learned about some recursively defined data structures.

We will use recursion to find the maximum value in an array. What is a simpler problem of the same sort? Obviously, finding the maximum of a smaller array. So we will plan to recur only with smaller arrays, and our base case will be the smallest array--say, an array of only one element.

Here is one approach: pluck a number off one end of the array; recursively find the maximum of the remaining array; compare the maximum to the one number plucked off, and the maximum is the larger of these. For a base case, the maximum of a single-element array is that element.

While this approach is workable, it isn't very interesting. Here's an approach I like better: divide the array (approximately) in half; recursively find the maximum of each half; return the larger of these two maxima as the grand maximum. Again, the base case is a one-element array, with that element as the maximum.

Here is a first cut at writing the program. Remember that the base case is put first.

function max(A):
   begin
      if the array is only one element long
         then return that element
         else
            begin
               find the midpoint of array A;
               leftmax := max(left half of A);
               rightmax := max(right half of A);
               if leftmax > rightmax
                  then max := leftmax
                  else max := rightmax
            end
   end

We have handled the base case first, and each recursive call is with a simpler case (the array size is closer to one). We now run up against a practical problem, having little or nothing to do with recursion: how do we pass part of an array as a parameter?

In Pascal, at least, the answer is that we don't. What we can do, instead, is to pass the entire array, and two variables, "low" and "high," to specify the lower bound and the higher bound, respectively, of the subarray that we will be using. Thus the original call to max will be of the form "max(A,l,n)" where n is the size of the array. Our next cut, using proper Pascal syntax for the parameters, is

function max(A: array [1 . . n] of integer;
             low, high: integer) : integer;
   begin
      if low = high then max := A[low]
      else
         begin
            mid := trunc((low + high) /2);
            leftmax := max(A, low, mid);
            rightnax := max(A, mid+l, high);
            if leftmax > rightmax
               then max := leftmax
               else max := rightmax
         end
   end

This program satisfies rules one and two. When we check rule three, we find some undeclared variables that must be made local to max; declaring them globally would be a disaster. Hence we put the following declaration immediately before the first begin:

var
   mid, leftmax, rightmax: integer;

and we're done.