It may be that CIS 313(Data Structures) is not the first time in your career that you've been asked to write copious amounts of pseudocode. In CIS 122 students are asked to write in a form of pseudocode , for example. But in CIS 313, a large number of problem set questions involve writing pseudocode--it is a key aspect of the course. However, we don't really provide an expectation of how a student should write it. This is perhaps a little unfair of us, but as a result, we are willing to accept a broad range of pseudocode grammars and styles. However, even the most flexible pseudocode interpreter will break down in some situations, and you as a student certainly don't want this--it's not funny to see a pseudocode interpreter melt down and bang his or her head against the wall. Here, then, is an attempt at defining an acceptable pseudocode convention.
As a pseudocode interpreter, I am familiar with quite a few programming languages. You shouldn't worry about whether I'll understand pseudocode which is reasonably close to a well-known language like Java, C, Perl, Visual Basic, English, or maybe even Lisp(but warn me about that last one). You should, however, worry about whether I will be confused by what you write.
Here is a simple example, a function which swaps two values(all examples are made up by me, but they are based on experience as a pseudocode interpreter):
Algorithm Swap(takes integers a and b as input, and swaps them): Set temp = a a = b b = temp
Ignoring, for the moment, the syntax(which is vaguely like Visual
Basic), consider the three assignments which are occurring. The coder
has not defined the temp
variable, but since it is
interacting with integers, a standard pseudocode interpreter will
assume(correctly) that it is also supposed to be an integer. But if
temp is also an integer, why use the term Set
before an
assignment to it? This is an example of inconsistent pseudocode. A
grader will probably not deduct points for it(unless they're cranky),
but it does make the pseudocode harder to parse. In Visual Basic,
there actually is a distinction between using Set
before assignments: it means an object is being assigned
another object(as opposed to an integer being assigned an integer
value, etc.). If you write code which is consistent, it will tend to
reduce the number of times you confuse the pseudocode interpreter.
In most programming languages newer than COBOL, variable names are
allowed to be quite long, often at least 64 characters. In
pseudocode, even those limitations don't apply(although using
100-character variable names might be rather awkward). You as a
pseudocoder are free to use whatever names you want for your
variables. If you can name them in a meaningful way(for instance,
previous_pointer
, current_pointer
,
next_pointer
instead of, say, temp1
,
temp2
, temp3
), it will help the pseudocode
interpreter to understand what your line of thinking is for a given
problem. One guideline I might suggest for knowing when to name your
variables usefully is if you have two or more variables whose names
contain numbers which serve only to differentiate the variables.
"Pointer1", "Pointer2", and "Pointer3" just aren't very descriptive,
and could very well confuse you, the coder, as well as me, the
interpreter.
Another thing which it is hard for a coder to do enough of is commenting pseudocode. Just as pseudocode interpreters accept most standard language grammars, we also accept comments in many different formats. If you think it might be hard for an interpreter to see, from a high-level perspective, what you are doing, put a comment in! The best advice I've heard for knowing when to write a comment is to put one in whenever the pseudocode you've written makes you feel uneasy, somehow. Such feelings of uneasiness tend to occur with complex fragments of code, and if they are strong enough, you should double-check that the code you have written makes sense. But if you think that it does, just put a comment in explaining what you intend, and we'll sort it out.
As with many things in this document, the previous two comments also apply to writing code in general, and not just pseudocode.
Everything should be made as simple as possible, but no simpler.--Einstein
The mastery of a problem should allow a coder to write good, clean pseudocode. When a coder has distilled the essence of a problem, he or she should be able to write code that uses no more space or computation than is necessary to solve it.
Let's look at another example, this time of finding the last node in a linked list:
// A function which takes a linked list as input, and returns the // last(tail) node as output. Node Last(LinkedList L) { Node a = L.first(); L.set_current(L.first()); while(temp != NULL) { temp = L.current().next().next(); L.set_current(L.current().next()); } return L.current(); }
This kind of pseudocode makes a pseudocode interpreter cringe. It is overly complex for several reasons. But first, let's notice that the syntax is reasonably close to Java. That is perfectly acceptable--the CIS department teaches Java as a first language for undergrad majors, and we interpreters know that you as a student may feel most comfortable doing your pseudocoding in a Java-like language.
Also, notice that the coder is inherently assuming that the
pseudocode interpreter will understand what the first()
,
next()
, and current()
functions return.
That is perfectly acceptable, so long as those functions are
reasonably named. If, instead of current()
, the coder
had written ct()
, I might have had more issues with this
function. But if the coder had said in a comment that
ct()
returned the current node of L, that would have been
enough explanation for me.
Sometimes, though, a pseudocode interpreter will not know the
meaning of a function whose definition is well-described by its name.
Pseudocode interpreters only accept functions which are defined as
inherently belonging to a data structure. For linked lists, the
element()
and next()
functions can be
assumed to belong to a node in the linked list. However, more
advanced functions, such as findLast()
, must be defined.
If the coder above had written the following one-liner for the
Last()
function, a pseudocode interpreter wouldn't be
entirely happy with it: return L.findLast()
. The coder
is essentially wrapping another function which does the same thing,
without giving any clue as to how to actually find the last element in
a linked list. So although you can use any function defined to belong
inherently to a data structure(like L.head()
for a linked
list, or T.root()
for a binary tree), any others should
be defined in pseudocode.
Now, on to the criticism. First, notice that this function
violates the consistency rule. In the first line of the function, a
Node object with the name a
is declared and set equal
to the first item in the list. Fine. But it is used nowhere else in
the function! Probably the coder meant to declare temp
instead of a
, since temp
is used heavily
in the rest of the function. This is a good way to cause a pseudocode
parse error.
But really, the worst thing about this function is the while loop.
If I, as an interpreter, assume that temp
(instead of
a
) was initialized to be L.first()
, I can
execute the loop in a meaningful way. But the code to execute is even
more confusing. The temp
variable is being moved along
two nodes of the list every time the loop iterates. Probably what the
coder meant was to advance temp
once, but started
thinking about a special case where it perhaps should be advanced
twice. As a pseudocode interpreter, it is hard to judge the intent of
a coder when the code is ambiguous and not commented. Also, it is
extremely important for you as a coder to understand exactly how to
use pointers. If you don't understand the object you are
manipulating, you can't understand how to write pseudocode to change
it(see the next rule).
Meanwhile, in the second line of the while loop, L
's
current pointer is being advanced every time the loop iterates. That
is as it should be. Given that the intended purpose of
temp
appears to be to watch out for the end of the list,
as a pseudocode interpreter I can be forgiving and assume, to some
extent, that the coder had the right idea. But this code is in no way
crisp, mainly because two separate variables are being used to keep
track of the current location in the list, when only one is really
needed. Here is a much cleaner example of pseudocode which also
returns the last node in a linked list:
// A function which takes a linked list as input, and returns the // last(tail) node as output. Assumes that L.first() != NULL. Node Last(LinkedList L) { temp = L.first(); while(temp.next() != NULL) { temp = temp.next(); } return temp; }
This algorithm takes O(n) running time, where n is the length of the list, since it traverses the entire list in order to reach the end.
This pseudocode is much easier to read, in great part because it
is simpler. The only main variable used is temp
, and the
only function used repeatedly is next()
. The coder has
taken the liberty of assuming that the function will not receive an
empty list, and that's acceptable. The point of pseudocode is to
demonstrate a good overall knowledge of the algorithm in question, and
not to demonstrate that one has dealt with every possible case. Of
course, there is a balance to these things, and sometimes you might be
uncertain if you should deal with a particular situation that could
arise. When in doubt, do deal with it. If it's unimportant, we
pseudocode interpreters will know, and pass over it. If it is
important, though, we will be glad that you covered it.
Pointers are a very hard thing to figure out, but they form the backbone of most important data structures like linked lists, queues, binary search trees, and hash tables. It is vitally important that you as a coder understand how they work. If you aren't sure of some use of them, come talk to the course professor or one of the GTFs. If you don't feel confident about your ability to use pointers in your code, how can you write good pseudocode? It will inevitably turn out a little confused, as if you aren't certain in which ways the data structure or temporary variables can be manipulated.
In many cases, it is important to understand how much time or space an algorithm in pseudocode takes in order to run. Whenever you are asked to provide a linear-time algorithm, or one that runs in O(log n) space, you should automatically provide analysis with that algorithm. Preferably that analysis would include a proof of correctness, but for CIS 313 we are mainly concerned with the running time of an algorithm, and in some cases the space used in execution. Sometimes it may not take much to justify running time("This loop takes O(n) time because it iterates from 1 to n," for example), but sometimes it will--quicksort is an example of an algorithm whose worst-case running time is O(n^2), but whose best-case and average-case running time is O(nlogn). A justification for the latter running time is by no means trivial.
The Fibonacci sequence is defined intuitively as a sequence of numbers such that any number in the sequence is the sum of the previous two numbers. The first two Fibonacci numbers are 1 and 1, which completes the definition. From this, we know that the first ten Fibonacci numbers are:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55.
One common toy problem in computer science is to write a function which calculates an arbitrary Fibonacci number. (This may not seem to have any real-world application, but at the two companies at which I was employed full-time, some variation of this question was asked in most interviews. I was asked this question in one interview. This is probably useful code to understand.) Below are some algorithms, in wildly differing pseudocode, which compute the n-th Fibonacci number recursively.
int Fibonacci(int n) if(n <= 2) return 1 else return Fibonacci(n - 1) + Fibonacci(n - 2)
And, for a taste of something different, here is pseudocode written almost entirely in English:
The Fibonacci function I will describe computes the n-th Fibonacci number recursively. It takes a positive integer, n, as input. If n = 1 or n = 2, it returns 1. If n is greater than 2, it calls itself twice, once with the argument n-1, and once with the argument n-2. It adds the result of those two function calls, and returns that value.
The first piece of pseudocode should be pretty straightforward. The second piece, however, is about as close as pseudocode can probably get to English. It is also probably a good argument for writing pseudocode which is closer to a programming language than a natural language in its grammar--it's fairly long-winded. But still, it manages to capture the essence of a recursive Fibonacci function, and is perfectly acceptable for a pseudocode compiler.
I will now practice what I have preached, and analyze this pseudocode. Since each of these functions are recursive, their running time is not very good. For each call with n greater than 2, two calls are made recursively. Therefore, whenever n grows by 1, the number of recursive calls doubles(approximately). So the running time of these algorithms is O(2^n). That's not very fast. But most of these function calls compute the same number. Intuitively, it seems like it should be possible to write pseudocode which takes O(n) time, since there are O(n) calculations really needed to compute the n-th Fibonacci number. Here is some pseudocode which does so in linear time:
Function Iterated_Fibonacci: Input(s): n Output(s): n-th Fibonacci number previous = 1 current = 1 next = 0 // The loop starts at 3 because if n is less than three, the loop // won't execute and we will return current = 1. That's fine. for i = 3 to n : next <- previous + current // compute the new value previous <- current // move up the two previous numbers in the sequence, current <- next // so that if need be we can compute the next one. return current
Again, analyzing this code, the loop iterates O(n) times, and everything outside the loop takes O(1) time. So if we can figure out how long each iteration of the loop takes, we can get the running time of the program. But within the loop, three operations which each take constant time are executed. So the loop's total running time is O(n)*O(1), which is O(n).
In creating these pieces of pseudocode, I have intentionally tried to make each one different from the others. Within each function, the syntax is fairly consistent, but between functions it differs, often significantly. I made sure of this in order to demonstrate that good pseudocode can be written in pretty much any style you prefer. As a pseudocode interpreter, I don't really care what that style is, so long as I can make sense of it. I hope this guide has helped you, the pseudocoder, to understand how to write code that I can easily make sense of.
The present maintainer of this page is dhofer@cs. Please send him any comments you may have.