Due Wednesday June 29th at 5pm (originally Tuesday June 8th)
Write a program that, given a corpus of text in a particular language, generates a new random text that "looks like" the source text. Use conditional probabilities (Markov chains) and whatever other heuristics you can think of. The following is what my program output when given the complete text of A Portrait of the Artist as a Young Man by James Joyce:
Bury me now dancing along the wreckage of students.
Aubrey carried it is said the first phase of these words drove Adam ate his eyes were bare.
Leitrim's coachman, yes, said Lynch, imagined.
Aristotle's poetics and fire destroys at the image started forth its own equivocal position did neither: and, having in prayer to hear the chapel, some niche in the park trees and a conviction, that he doesn't interest and glories, bidding him life of his perishing joy and generous towards the sides reasonably. He crept about a nimble and he thought we likely to cry. Evening had turned to begin again.
Casey.
Mick Lacy that call, his under the white fattish hands or a most uncomfortable positions, suffered the flesh and repent indeed: and Stephen to stories about himself with rheum. The poison in the threshold of the grave, a side with what are wrong. To live, to speak again. A glow of habit of saint James Joyce Chapter 5 He repeated slowly and made me and out of spitting. --PHTH! says that his life and every itch and when his mother's face. --Warm weather for I couldn't say about the image brought into confidence. During all those sins.
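The core of such a generator is a table that maps each run of words to the words observed right after it. A random walk over that table produces output like the sample above. Below is a minimal sketch, assuming whitespace tokenization and a fixed chain order. Both are my own choices, not requirements of the assignment, and `build_chain` and `generate` are hypothetical names:

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the list of words that follow it.

    Repeated followers are kept, so frequent continuations are proportionally
    more likely to be picked: this list encodes P(next word | previous words).
    """
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length, rng=None):
    """Random walk over `chain`, returning exactly `length` words."""
    rng = rng or random.Random()
    state = rng.choice(list(chain))          # random starting state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:                    # dead end (e.g. the corpus's last words):
            state = rng.choice(list(chain))  # jump to a random state and continue
            followers = chain[state]
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)     # slide the window forward one word
    return output[:length]
```

Splitting on whitespace keeps punctuation attached to words, so tokens such as "bare." survive intact and sentence breaks fall out of the model for free. For example, `" ".join(generate(build_chain(sys.stdin.read().split()), 200))` turns a corpus on standard input into 200 words of gibberish. That call stops at exactly 200 words, not at a sentence break. Higher orders read more fluently but copy longer stretches of the source verbatim.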
A similar idea has been tried in Emacs via Dissociated Press, and by JWZ with his program Dadadodo. The Dadadodo page also contains lots of fun ideas that could totally push your program into extra credit territory if implemented.
By default, your program should output at least 200 words of gibberish and then halt at the next sentence break. That default should be adjustable with the command line flags -w INT or --wordcount=INT, where INT is an integer less than the number of words in the source corpus. The corpus will be read from standard input.
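"At least N words, then stop at the next sentence break" can be expressed as a small generator that wraps any stream of words. This is a sketch under two assumptions. First, a sentence break is a word ending in ".", "!" or "?", optionally followed by closing quotes or brackets. Second, `take_words` and `ends_sentence` are hypothetical names.

```python
SENTENCE_ENDINGS = (".", "!", "?")


def ends_sentence(word):
    """Heuristic: strip trailing quotes/brackets, then look for terminal punctuation."""
    return word.rstrip("\"')]").endswith(SENTENCE_ENDINGS)


def take_words(words, min_words=200):
    """Yield at least `min_words` words, then continue until a sentence break.

    If the input runs out first, every available word is yielded.
    """
    for count, word in enumerate(words, start=1):
        yield word
        if count >= min_words and ends_sentence(word):
            return
```

Because `take_words` accepts any iterable, it can wrap an endless word source such as a Markov random walk. It still terminates as long as a sentence-ending word eventually appears.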
Remember that the easiest way of parsing command line options is with the optparse module, and that if you do that, you can get --help and -h for free.
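A sketch of the -w/--wordcount handling with optparse follows. The helper names `make_parser` and `parse_wordcount`, the usage string, and passing the corpus size in explicitly are illustrative assumptions. Rejecting non-positive counts is also my own addition. The upper bound depends on the corpus, so this check can only run after standard input has been read.

```python
from optparse import OptionParser


def make_parser():
    """Build the parser; optparse adds -h/--help automatically."""
    parser = OptionParser(usage="usage: %prog [options] < corpus.txt")
    parser.add_option("-w", "--wordcount", dest="wordcount", type="int",
                      default=200, metavar="INT",
                      help="minimum number of words to generate [default: %default]")
    return parser


def parse_wordcount(argv, corpus_size):
    """Return the requested word count, or exit with status 2 if it is out of range.

    `argv` excludes the program name; pass None to use sys.argv[1:].
    """
    parser = make_parser()
    options, _args = parser.parse_args(argv)
    if not 0 < options.wordcount < corpus_size:
        # parser.error prints the usage line to stderr and raises SystemExit(2)
        parser.error("--wordcount must be between 1 and %d" % (corpus_size - 1))
    return options.wordcount
```

Both `-w 50` and `--wordcount=50` parse to the same value, a non-integer argument is rejected by optparse itself, and running the script with `-h` prints the help text with the default filled in.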
Ask me if you have any questions, and remember that elegance counts! Emailing peter@cs.uoregon.edu or commenting below will both reach me immediately. Coming in to office hours is also recommended.
Turn your project in using the following form: