Assignment 3

Due Wednesday June 29th at 5pm (originally Tuesday June 8th)

Write a program that, given a corpus of text in a particular language, generates new random text that "looks like" that language, using conditional probabilities (Markov chains) and whatever other heuristics you can think of. Here is what my program produced when given the complete text of Portrait of the Artist as a Young Man by James Joyce:

Bury me now dancing along the wreckage of students.

Aubrey carried it is said the first phase of these words drove Adam ate his eyes were bare.

Leitrim's coachman, yes, said Lynch, imagined.

Aristotle's poetics and fire destroys at the image started forth its own equivocal position did neither: and, having in prayer to hear the chapel, some niche in the park trees and a conviction, that he doesn't interest and glories, bidding him life of his perishing joy and generous towards the sides reasonably. He crept about a nimble and he thought we likely to cry. Evening had turned to begin again.

Casey.

Mick Lacy that call, his under the white fattish hands or a most uncomfortable positions, suffered the flesh and repent indeed: and Stephen to stories about himself with rheum. The poison in the threshold of the grave, a side with what are wrong. To live, to speak again. A glow of habit of saint James Joyce Chapter 5 He repeated slowly and made me and out of spitting. --PHTH! says that his life and every itch and when his mother's face. --Warm weather for I couldn't say about the image brought into confidence. During all those sins.

A similar idea has been tried in Emacs via Dissociated Press, and by JWZ with his program Dadadodo. The Dadadodo page also lists lots of fun ideas that could push your program into extra-credit territory if you implement them.
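As a starting point, the conditional-probability idea from the first paragraph can be sketched as a word-level Markov chain: record which words follow each run of `order` words, then take a random walk over that table. This is a minimal sketch, assuming whitespace tokenization; `build_chain` and `generate` are hypothetical names, not something the assignment requires.

```python
import random
from collections import defaultdict


def build_chain(words, order=1):
    """Map each tuple of `order` consecutive words to the list of words seen
    right after it. Duplicates are kept on purpose, so picking uniformly from
    the list samples each successor in proportion to its corpus frequency."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, n_words, rng=random):
    """Random walk over `chain`, returning exactly `n_words` words.
    If the walk reaches a state with no recorded successor (a dead end),
    it restarts from a random state."""
    states = list(chain)
    state = rng.choice(states)
    order = len(state)
    output = list(state)
    while len(output) < n_words:
        successors = chain.get(state)  # .get() avoids adding empty entries
        if successors:
            output.append(rng.choice(successors))
            state = tuple(output[-order:])
        else:
            state = rng.choice(states)
            output.extend(state)
    return output[:n_words]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the cat".split()
    chain = build_chain(corpus, order=2)
    print(" ".join(generate(chain, 12, rng=random.Random(42))))
```

Higher orders copy longer stretches of the source verbatim; order 1 or 2 tends to give the kind of output shown above.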

By default, your program should output at least 200 words of gibberish and then stop at the next sentence break. That default should be adjustable with the command-line flags -w INT or --wordcount=INT, where INT is an integer less than the number of words in the source corpus. The corpus will arrive on standard input.
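One way to follow the "at least N words, then finish the sentence" rule is to pull words from the generator one at a time and stop at the first sentence-ending word past the minimum. This is a sketch under assumptions: the sentence-break test (a trailing ., !, or ?, optionally followed by a closing quote or bracket) and the hypothetical `max_extra` safety valve are not part of the specification.

```python
import re

# Assumption: a word closes a sentence if it ends in ., ! or ?,
# optionally followed by closing quotes or brackets (e.g. 'said.' or 'him?"').
SENTENCE_END = re.compile(r"[.!?][\"')\]]*$")


def ends_sentence(word):
    """Return True if `word` looks like the last word of a sentence."""
    return SENTENCE_END.search(word) is not None


def take_until_sentence_end(words, min_words=200, max_extra=1000):
    """Take at least `min_words` words from the iterable `words`, then keep
    going until a word ends a sentence. `max_extra` (hypothetical) caps the
    overrun so a walk that never hits a sentence break still terminates."""
    out = []
    for word in words:
        out.append(word)
        if len(out) >= min_words and ends_sentence(word):
            break
        if len(out) >= min_words + max_extra:
            break
    return out
```

In a full program, `words` would be the (possibly endless) output of the Markov walk, and the corpus itself could be read with `sys.stdin.read().split()`.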

Remember that the easiest way to parse command-line options is with the optparse module; if you use it, you get -h and --help for free.
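A minimal optparse setup for the -w/--wordcount flag might look like the following. The usage string, the default of 200, and the positivity check are assumptions drawn from the text above, and `parse_args` is a hypothetical helper name.

```python
from optparse import OptionParser


def parse_args(argv=None):
    """Parse -w/--wordcount; optparse adds -h/--help automatically.
    If argv is None, optparse reads sys.argv[1:]."""
    parser = OptionParser(usage="usage: %prog [-w INT] < corpus.txt")
    parser.add_option("-w", "--wordcount", type="int", dest="wordcount",
                      default=200, metavar="INT",
                      help="minimum number of words to generate "
                           "[default: %default]")
    options, _args = parser.parse_args(argv)
    if options.wordcount <= 0:
        # parser.error() prints the message and exits with status 2.
        parser.error("--wordcount must be a positive integer")
    return options
```

The "less than the number of words in the corpus" bound can only be checked after the corpus has been read from standard input, so it is left out here. optparse still ships with Python 3, though argparse is the newer alternative and also generates -h/--help automatically.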

Boilerplate

Ask me if you have any questions, and remember that elegance counts! Emailing peter@cs.uoregon.edu or simply commenting below will both reach me immediately. Coming in to office hours is also recommended if you have any questions.

Turn It In

Turn your project in using the following form:

What is your student #?
What is your name?
What is your email?
What file would you like to submit?
If you have more than one file in your project, please use an archiving tool to put them all into one archive file. Acceptable archive types are .zip, .jar, .tar, and .tgz. But really, you are just giving me one file that contains three functions, so just put it all in one file.

Please make sure that you provide any README files in plain text. MS Word .doc files are not acceptable; .html, .txt, .ps, .dvi, and .pdf files are. Really, you should just be turning in ASCII text (.txt) files and source code, and for most assignments, just well-documented source code.

Comments and Clarifications


Questions? Answers!