Due 5pm Wednesday July 6
Your assignment is to write your own WikiWikiWeb engine. I would like to see it up and working at some URL that you send to me. I would also like you to turn in all your code.
A WikiWikiWeb is a collaborative hypertext database that allows users to add content using a very simple markup (no HTML) and to link from page to page solely by using words that are in CamelCase. The original WikiWikiWeb was dedicated to Design Patterns and Software Methodology, but later wikis have been bold enough to try to encompass the sum total of human knowledge. Anyone can edit most wikis, although Wikipedia also supports logins. The content contained in their collective databases generally grows quite quickly because it is so simple to add stuff. To learn more about WikiWikiWebs in general and the first one in particular, check out NewUserPages on the granddaddy of wikis. I have used wikis a lot for various projects: we used one in the department to collect information before submitting a grant proposal, and there is one on my home computer that I made for random thoughts and projects.
Implementing a basic WikiWikiWeb should not be too hard. Remember to guard everything against malicious input. Setting maxlength in the HTML code is not enough - those limits are merely hints to the browser, and are not actually enforced by the protocol. The requirements for the WikiWikiWeb that you will submit to me are:

- Links between pages are WikiNames: CamelCase words matching the regular expression \b[A-Z][a-z]+([A-Z][a-z]+)+\b
- When displaying page text, replace <, >, and & with &lt;, &gt;, and &amp;, respectively (this requirement can be relaxed if you process the incoming text to avoid the dangers of cross-site scripting and malformed user input)

Extra credit can be garnered by implementing features beyond this required set.
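The two requirements above can be sketched in a few lines. This is a minimal illustration in Python (the function names are my own, not part of the assignment); any language with regular expressions works the same way.

```python
import re

# The required WikiName pattern: two or more capitalized "words" run together.
WIKINAME_RE = re.compile(r'\b[A-Z][a-z]+([A-Z][a-z]+)+\b')

def is_wikiname(s):
    """Return True if the whole string is a single WikiName."""
    return WIKINAME_RE.fullmatch(s) is not None

def escape_page_text(text):
    """Neutralize user input before printing it inside an HTML page."""
    # Escape & first, so the entities we insert are not themselves escaped.
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
```

Note the ordering in escape_page_text: if you escaped & last, the &lt; you had just produced would turn into &amp;lt;.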
Implementing a wiki becomes a lot easier when you note that WikiNames are also valid (and non-threatening) filenames. So if you store the text of each page in a file named for its WikiName, then you can make a directory containing a bunch of files with weird WikiNames as their filenames, and then you will know from the filenames what pages exist and what pages do not. To find BackLinks, you simply verify that the given string is a WikiName, execute the command grep '\bNameBeingSearched\b' * (possibly ggrep if you are on Solaris) in the repository directory, and print out a list of all the files that were found.
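The same BackLinks search can be done without shelling out to grep at all, which sidesteps shell-quoting dangers entirely. A possible sketch in Python (the function name and default directory are my own choices):

```python
import os
import re

WIKINAME_RE = re.compile(r'\b[A-Z][a-z]+([A-Z][a-z]+)+\b')

def backlinks(name, repo_dir='pages'):
    """Return the pages in repo_dir whose text mentions the given WikiName."""
    # Refuse anything that is not a WikiName, so shell metacharacters and
    # path tricks like '../' can never reach the filesystem or a search.
    if WIKINAME_RE.fullmatch(name) is None:
        return []
    pattern = re.compile(r'\b' + name + r'\b')
    found = []
    for fname in sorted(os.listdir(repo_dir)):
        with open(os.path.join(repo_dir, fname)) as f:
            if pattern.search(f.read()):
                found.append(fname)
    return found
```

The up-front WikiName check is the important part: it plays the same role as verifying the string before handing it to grep.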
There are a few ways of doing all this. You could write 3 cgi scripts: index.cgi for viewing content in edit mode or in view mode, edit.cgi for saving pages, and links.cgi for searching backlinks. These files will necessarily have a lot of code in common because they are accomplishing similar tasks, so creating a shared library that each file imports will probably save you a lot of typing.

Another approach is to put everything in one file, and have switches based on whether you are receiving a POST or a GET, and whether there is a particular type of operation you would like to engage in on the given page. Neither approach is innately superior, so do whichever one seems most intuitive to you. Comment your code well, and guard against malicious input.
I implemented my wiki using the "one big file approach", and it was a little under 200 lines - making this a pretty major project. You can see the results at http://www.cs.uoregon.edu/~peter/399/wiki.
Please ask me if there are any questions at all. Emailing peter@cs.uoregon.edu or simply commenting below will reach me immediately. I am also available on AIM as "jongleur peter". The best course of action, though, is coming in to office hours if you have any questions.
Turn your project in using the following form: