Writing good pseudocode.

By a pseudocode interpreter.

It may be that CIS 313(Data Structures) is not the first time in your career that you've been asked to write copious amounts of pseudocode. In CIS 122 students are asked to write in a form of pseudocode , for example. But in CIS 313, a large number of problem set questions involve writing pseudocode--it is a key aspect of the course. However, we don't really provide an expectation of how a student should write it. This is perhaps a little unfair of us, but as a result, we are willing to accept a broad range of pseudocode grammars and styles. However, even the most flexible pseudocode interpreter will break down in some situations, and you as a student certainly don't want this--it's not funny to see a pseudocode interpreter melt down and bang his or her head against the wall. Here, then, is an attempt at defining an acceptable pseudocode convention.

1. Consistency matters.

As a pseudocode interpreter, I am familiar with quite a few programming languages. You shouldn't worry about whether I'll understand pseudocode which is reasonably close to a well-known language like Java, C, Perl, Visual Basic, English, or maybe even Lisp(but warn me about that last one). You should, however, worry about whether I will be confused by what you write.

Here is a simple example, a function which swaps two values(all examples are made up by me, but they are based on experience as a pseudocode interpreter):

 Algorithm Swap(takes integers a and b as input, and swaps them):
               Set temp = a
	       a = b
	       b = temp

Ignoring, for the moment, the syntax(which is vaguely like Visual Basic), consider the three assignments which are occurring. The coder has not defined the temp variable, but since it is interacting with integers, a standard pseudocode interpreter will assume(correctly) that it is also supposed to be an integer. But if temp is also an integer, why use the term Set before an assignment to it? This is an example of inconsistent pseudocode. A grader will probably not deduct points for it(unless they're cranky), but it does make the pseudocode harder to parse. In Visual Basic, there actually is a distinction between using Set before assignments: it means an object is being assigned another object(as opposed to an integer being assigned an integer value, etc.). If you write code which is consistent, it will tend to reduce the number of times you confuse the pseudocode interpreter.

2. Pseudocode offers unique documentation advantages.

In most programming languages newer than COBOL, variable names are allowed to be quite long, often at least 64 characters. In pseudocode, even those limitations don't apply(although using 100-character variable names might be rather awkward). You as a pseudocoder are free to use whatever names you want for your variables. If you can name them in a meaningful way(for instance, previous_pointer, current_pointer, next_pointer instead of, say, temp1, temp2, temp3), it will help the pseudocode interpreter to understand what your line of thinking is for a given problem. One guideline I might suggest for knowing when to name your variables usefully is if you have two or more variables whose names contain numbers which serve only to differentiate the variables. "Pointer1", "Pointer2", and "Pointer3" just aren't very descriptive, and could very well confuse you, the coder, as well as me, the interpreter.

Another thing which it is hard for a coder to do enough of is commenting pseudocode. Just as pseudocode interpreters accept most standard language grammars, we also accept comments in many different formats. If you think it might be hard for an interpreter to see, from a high-level perspective, what you are doing, put a comment in! The best advice I've heard for knowing when to write a comment is to put one in whenever the pseudocode you've written makes you feel uneasy, somehow. Such feelings of uneasiness tend to occur with complex fragments of code, and if they are strong enough, you should double-check that the code you have written makes sense. But if you think that it does, just put a comment in explaining what you intend, and we'll sort it out.

As with many things in this document, the previous two comments also apply to writing code in general, and not just pseudocode.

3. Strive for simplicity.

Everything should be made as simple as possible, but no simpler.--Einstein

The mastery of a problem should allow a coder to write good, clean pseudocode. When a coder has distilled the essence of a problem, he or she should be able to write code that uses no more space or computation than is necessary to solve it.

Let's look at another example, this time of finding the last node in a linked list:


// A function which takes a linked list as input, and returns the
// last(tail) node as output.
Node Last(LinkedList L) 
{
    Node a = L.first();
    L.set_current(L.first());
    while(temp != NULL)
        {
            temp = L.current().next().next();
	    L.set_current(L.current().next());
	}
    
    return L.current();
}

This kind of pseudocode makes a pseudocode interpreter cringe. It is overly complex for several reasons. But first, let's notice that the syntax is reasonably close to Java. That is perfectly acceptable--the CIS department teaches Java as a first language for undergrad majors, and we interpreters know that you as a student may feel most comfortable doing your pseudocoding in a Java-like language.

Also, notice that the coder is inherently assuming that the pseudocode interpreter will understand what the first(), next(), and current() functions return. That is perfectly acceptable, so long as those functions are reasonably named. If, instead of current(), the coder had written ct(), I might have had more issues with this function. But if the coder had said in a comment that ct() returned the current node of L, that would have been enough explanation for me.

Sometimes, though, a pseudocode interpreter will not know the meaning of a function whose definition is well-described by its name. Pseudocode interpreters only accept functions which are defined as inherently belonging to a data structure. For linked lists, the element() and next() functions can be assumed to belong to a node in the linked list. However, more advanced functions, such as findLast(), must be defined. If the coder above had written the following one-liner for the Last() function, a pseudocode interpreter wouldn't be entirely happy with it: return L.findLast(). The coder is essentially wrapping another function which does the same thing, without giving any clue as to how to actually find the last element in a linked list. So although you can use any function defined to belong inherently to a data structure(like L.head() for a linked list, or T.root() for a binary tree), any others should be defined in pseudocode.

Now, on to the criticism. First, notice that this function violates the consistency rule. In the first line of the function, a Node object with the name a is declared and set equal to the first item in the list. Fine. But it is used nowhere else in the function! Probably the coder meant to declare temp instead of a, since temp is used heavily in the rest of the function. This is a good way to cause a pseudocode parse error.

But really, the worst thing about this function is the while loop. If I, as an interpreter, assume that temp(instead of a) was initialized to be L.first(), I can execute the loop in a meaningful way. But the code to execute is even more confusing. The temp variable is being moved along two nodes of the list every time the loop iterates. Probably what the coder meant was to advance temp once, but started thinking about a special case where it perhaps should be advanced twice. As a pseudocode interpreter, it is hard to judge the intent of a coder when the code is ambiguous and not commented. Also, it is extremely important for you as a coder to understand exactly how to use pointers. If you don't understand the object you are manipulating, you can't understand how to write pseudocode to change it(see the next rule).

Meanwhile, in the second line of the while loop, L's current pointer is being advanced every time the loop iterates. That is as it should be. Given that the intended purpose of temp appears to be to watch out for the end of the list, as a pseudocode interpreter I can be forgiving and assume, to some extent, that the coder had the right idea. But this code is in no way crisp, mainly because two separate variables are being used to keep track of the current location in the list, when only one is really needed. Here is a much cleaner example of pseudocode which also returns the last node in a linked list:


// A function which takes a linked list as input, and returns the
// last(tail) node as output.  Assumes that L.first() != NULL.
Node Last(LinkedList L) 
{
    temp = L.first();
    while(temp.next() != NULL)
        {
	    temp = temp.next();
	}

    return temp;
}

This algorithm takes O(n) running time, where n is the length of the list, since it traverses the entire list in order to reach the end.

This pseudocode is much easier to read, in great part because it is simpler. The only main variable used is temp, and the only function used repeatedly is next(). The coder has taken the liberty of assuming that the function will not receive an empty list, and that's acceptable. The point of pseudocode is to demonstrate a good overall knowledge of the algorithm in question, and not to demonstrate that one has dealt with every possible case. Of course, there is a balance to these things, and sometimes you might be uncertain if you should deal with a particular situation that could arise. When in doubt, do deal with it. If it's unimportant, we pseudocode interpreters will know, and pass over it. If it is important, though, we will be glad that you covered it.

4. Understand the structures that you are using.

Pointers are a very hard thing to figure out, but they form the backbone of most important data structures like linked lists, queues, binary search trees, and hash tables. It is vitally important that you as a coder understand how they work. If you aren't sure of some use of them, come talk to the course professor or one of the GTFs. If you don't feel confident about your ability to use pointers in your code, how can you write good pseudocode? It will inevitably turn out a little confused, as if you aren't certain in which ways the data structure or temporary variables can be manipulated.

5. Analyze your code.

In many cases, it is important to understand how much time or space an algorithm in pseudocode takes in order to run. Whenever you are asked to provide a linear-time algorithm, or one that runs in O(log n) space, you should automatically provide analysis with that algorithm. Preferably that analysis would include a proof of correctness, but for CIS 313 we are mainly concerned with the running time of an algorithm, and in some cases the space used in execution. Sometimes it may not take much to justify running time("This loop takes O(n) time because it iterates from 1 to n," for example), but sometimes it will--quicksort is an example of an algorithm whose worst-case running time is O(n^2), but whose best-case and average-case running time is O(nlogn). A justification for the latter running time is by no means trivial.

6. Some more examples.

The Fibonacci sequence is defined intuitively as a sequence of numbers such that any number in the sequence is the sum of the previous two numbers. The first two Fibonacci numbers are 1 and 1, which completes the definition. From this, we know that the first ten Fibonacci numbers are:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55.

One common toy problem in computer science is to write a function which calculates an arbitrary Fibonacci number. (This may not seem to have any real-world application, but at the two companies at which I was employed full-time, some variation of this question was asked in most interviews. I was asked this question in one interview. This is probably useful code to understand.) Below are some algorithms, in wildly differing pseudocode, which compute the n-th Fibonacci number recursively.

int Fibonacci(int n)
    if(n <= 2)
        return 1
    else
        return Fibonacci(n - 1) + Fibonacci(n - 2)

And, for a taste of something different, here is pseudocode written almost entirely in English:

The Fibonacci function I will describe computes the n-th Fibonacci
number recursively.  It takes a positive integer, n, as input.  If n = 1
or n = 2, it returns 1.  If n is greater than 2, it calls itself
twice, once with the argument n-1, and once with the argument n-2.  It
adds the result of those two function calls, and returns that value.

The first piece of pseudocode should be pretty straightforward. The second piece, however, is about as close as pseudocode can probably get to English. It is also probably a good argument for writing pseudocode which is closer to a programming language than a natural language in its grammar--it's fairly long-winded. But still, it manages to capture the essence of a recursive Fibonacci function, and is perfectly acceptable for a pseudocode compiler.

I will now practice what I have preached, and analyze this pseudocode. Since each of these functions are recursive, their running time is not very good. For each call with n greater than 2, two calls are made recursively. Therefore, whenever n grows by 1, the number of recursive calls doubles(approximately). So the running time of these algorithms is O(2^n). That's not very fast. But most of these function calls compute the same number. Intuitively, it seems like it should be possible to write pseudocode which takes O(n) time, since there are O(n) calculations really needed to compute the n-th Fibonacci number. Here is some pseudocode which does so in linear time:

Function Iterated_Fibonacci:
Input(s): n
Output(s): n-th Fibonacci number

previous = 1
current = 1
next = 0

// The loop starts at 3 because if n is less than three, the loop
// won't execute and we will return current = 1.  That's fine.

for i = 3 to n :
    next <- previous + current // compute the new value 
    previous <- current        // move up the two previous numbers in the sequence, 
    current <- next            // so that if need be we can compute the next one.  


return current

Again, analyzing this code, the loop iterates O(n) times, and everything outside the loop takes O(1) time. So if we can figure out how long each iteration of the loop takes, we can get the running time of the program. But within the loop, three operations which each take constant time are executed. So the loop's total running time is O(n)*O(1), which is O(n).

In creating these pieces of pseudocode, I have intentionally tried to make each one different from the others. Within each function, the syntax is fairly consistent, but between functions it differs, often significantly. I made sure of this in order to demonstrate that good pseudocode can be written in pretty much any style you prefer. As a pseudocode interpreter, I don't really care what that style is, so long as I can make sense of it. I hope this guide has helped you, the pseudocoder, to understand how to write code that I can easily make sense of.

7. Feedback.

The present maintainer of this page is dhofer@cs. Please send him any comments you may have.