Problem 5.2

CIS 210, last updated 2007/10/11 20:57:55

Letter Frequency Analysis

Simple substitution ciphers like the Caesar cipher (which shifts each letter by a fixed amount) can be cracked using frequency analysis. That is, we count the number of occurrences of each encrypted letter, and if the message is large enough, we can make good guesses about which letters were substituted for which. The first step in frequency analysis is to determine the relative frequencies of the characters in a body of text. LetterFrequency is a Java class for performing frequency analysis of alphabetic characters. The driver class FrequencyGraph uses this class to count letter frequencies in an input file and prints a histogram of the frequencies.
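To illustrate how letter counts can crack a Caesar cipher, here is a minimal sketch. It is not part of the assignment code: the class CaesarGuess and its methods are hypothetical, and it assumes that the most frequent plaintext letter is 'e' and that only the letters a to z are shifted. With short messages that assumption often fails, which is why the message needs to be large enough.

```java
/**
 * Hypothetical sketch (not part of LetterFrequency): guess the shift of a
 * Caesar cipher by assuming the most frequent ciphertext letter stands for 'e'.
 */
class CaesarGuess {
    // Count occurrences of each letter a-z, ignoring case and non-letters.
    static int[] countLetters(String text) {
        int[] counts = new int[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // Assumption: the most frequent plaintext letter is 'e'.
    static int guessShift(String ciphertext) {
        int[] counts = countLetters(ciphertext);
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        // The shift that would move 'e' (index 4) onto the most frequent letter.
        return Math.floorMod(best - ('e' - 'a'), 26);
    }

    // Undo a Caesar shift on letters (output is lowercase); other characters pass through.
    static String decrypt(String ciphertext, int shift) {
        StringBuilder sb = new StringBuilder();
        for (char c : ciphertext.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + Math.floorMod(c - 'a' - shift, 26)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, "meet me here" shifted by 3 becomes "phhw ph khuh"; 'h' is the most frequent ciphertext letter, so guessShift returns 3 and decrypt recovers the original text.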

Much of the code for LetterFrequency has already been provided to you in LetterFrequency.java. To complete it, you will need to fill in a few pieces of code, primarily loops that run through the arrays. To test your code, you will also need to download and compile FrequencyGraph.java, but you will not need to make any changes to this driver. (If you are interested in seeing how to read from an input file, you can find that in the driver code.)
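The kind of array loops involved, plus line-by-line file reading, might look like the sketch below. The class LineTally and all of its members are hypothetical and are not taken from LetterFrequency.java or FrequencyGraph.java, whose actual structure may differ; the sketch assumes letters are counted case-insensitively into an int array indexed by `c - 'a'`.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch: read text line by line and tally letters a-z in an
 * int array. The real FrequencyGraph file-reading code may differ.
 */
class LineTally {
    final int[] counts = new int[26];   // counts[0] is 'a', counts[25] is 'z'

    // Tally every letter in one line of text, ignoring case.
    void addLine(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = Character.toLowerCase(line.charAt(i));
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
    }

    // Read every line from the reader; I/O errors are rethrown unchecked.
    void addAll(BufferedReader in) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                addLine(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loop over the array to get the total number of letters seen.
    int total() {
        int sum = 0;
        for (int n : counts) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        LineTally tally = new LineTally();
        // try-with-resources closes the file even if reading fails.
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            tally.addAll(in);
        }
        System.out.println("letters counted: " + tally.total());
    }
}
```

Taking a BufferedReader rather than a file name keeps the counting logic testable with a StringReader.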

Testing your code

When you have successfully completed LetterFrequency, you should be able to run the driver and obtain output similar to this:
$ java FrequencyGraph Somefile.txt
Most frequent character 'e' occurs 113 times
[a] (0080=08%) ********
[b] (0031=03%) ***
[c] (0029=02%) **
[d] (0018=01%) *
[e] (0113=11%) ***********
[f] (0011=01%) *
[g] (0028=02%) **
[h] (0037=03%) ***
[i] (0067=06%) ******
[j] (0010=01%) *
[k] (0014=01%) *
[l] (0055=05%) *****
[m] (0032=03%) ***
[n] (0061=06%) ******
[o] (0032=03%) ***
[p] (0040=04%) ****
[r] (0083=08%) ********
[s] (0062=06%) ******
[t] (0088=08%) ********
[u] (0048=04%) ****
[w] (0018=01%) *
[y] (0020=02%) **

Notice that the frequency counts are not printed for characters that appear in less than one percent of the text; in the sample above, q, v, x, and z are omitted.
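One way to produce rows in the format of the sample output is sketched below. The class HistogramRow is hypothetical, and it assumes the percentage is computed as `count * 100 / total` with integer (truncating) division, which is consistent with the sample rows but not confirmed by the driver. Under that assumption, a letter below one percent truncates to 0 and is skipped.

```java
/**
 * Hypothetical sketch of one histogram row in the style of the sample output,
 * e.g. "[a] (0080=08%) ********". Assumes percent = count * 100 / total with
 * integer division; the real FrequencyGraph may compute it differently.
 */
class HistogramRow {
    // Returns the formatted row, or null when the letter is under 1% of the total.
    static String row(char letter, int count, int total) {
        int percent = count * 100 / total;   // integer division truncates
        if (percent < 1) {
            return null;                     // below one percent: not printed
        }
        StringBuilder stars = new StringBuilder();
        for (int i = 0; i < percent; i++) {
            stars.append('*');               // one star per whole percent
        }
        return String.format("[%c] (%04d=%02d%%) %s", letter, count, percent, stars);
    }

    public static void main(String[] args) {
        // 990 is an assumed total; the page does not state the real one.
        System.out.println(row('a', 80, 990));   // [a] (0080=08%) ********
        System.out.println(row('q', 5, 990));    // null
    }
}
```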

If you run your completed LetterFrequency code with the text file PrimitiveStorage.txt, you should obtain the output in counts.txt.

Notes

Note that there are two methods named countLetter in LetterFrequency: a private one that takes a single character, and a public one that takes a String. When you implement the String version, have it call the private single-character version for each character of the String.
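The overloading pattern described in this note can be sketched as follows. Only the method name countLetter comes from the assignment; the class name CountLetterSketch, the getCount accessor, the void return types, the case folding, and the 'a' to 'z' range are assumptions made for illustration.

```java
/**
 * Hypothetical sketch of the countLetter overloads. The class, fields,
 * return types and the 'a'..'z' range are assumptions, not the real skeleton.
 */
class CountLetterSketch {
    private static final char FIRST = 'a';   // assumed start of counted range
    private static final char LAST = 'z';    // assumed end of counted range
    private final int[] counts = new int[LAST - FIRST + 1];

    // Private helper: count a single character if it falls in the range.
    private void countLetter(char c) {
        char lower = Character.toLowerCase(c);
        if (lower >= FIRST && lower <= LAST) {
            counts[lower - FIRST]++;
        }
    }

    // Public version: hand each character of the String to the helper.
    public void countLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            countLetter(s.charAt(i));   // charAt returns char, so the char overload is chosen
        }
    }

    // Hypothetical accessor: number of times c has been counted so far.
    public int getCount(char c) {
        char lower = Character.toLowerCase(c);
        return (lower >= FIRST && lower <= LAST) ? counts[lower - FIRST] : 0;
    }
}
```

Keeping the range check in one private method means the String version never duplicates it.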

When finding the most frequently occurring letter, more than one letter may be tied for most frequent. In that case, treat the tied letter that comes first in the counted character range as the most frequent.
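A minimal sketch of this tie rule, assuming the counts are stored in an array whose index 0 corresponds to the first character of the counted range. The class MostFrequent and its parameters are hypothetical. Scanning from the start and replacing the current best only on a strictly larger count keeps the earliest letter when counts are tied.

```java
/**
 * Hypothetical sketch of the tie rule: scan from the start of the range and
 * replace the current best only on a strictly larger count.
 */
class MostFrequent {
    // counts[0] corresponds to 'first'; returns the most frequent character.
    static char mostFrequent(int[] counts, char first) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {   // '>' rather than '>=': a tie keeps the earlier letter
                best = i;
            }
        }
        return (char) (first + best);
    }
}
```

If every count is zero, this sketch returns the first character of the range; the real class may handle that case differently.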

Turn in

Turn in LetterFrequency.java.
(Do not turn in FrequencyGraph.java since you will not have made any changes to it.)