**Project 2: ciphers**
**Reed CSCI 121 Spring 2025**
*submit by 3/21 at 11:59pm*
Since ancient times humans have felt the need to hide their
communications from others. Written messages had to somehow be
encrypted so that only their intended recipient could read them. (This
was most important for military communications, but could also be used
to protect business, religious, or personal secrets.) These encryption
schemes, or ciphers, were usually based on replacing letters of the
message in some particular way (or, less often, on reordering the
letters). In this project we will write code that can encrypt or
decrypt a message with three different schemes.
All the ciphers we'll be looking at in this project are part of
classical cryptography. While we call it "classical," it covers almost
all of human history and culminates in the very elaborate encryption
machines used during World War II. Cryptography holds a special place
in the history of computing as the need to break Nazi codes motivated
the creation by the British of the first roughly-modern
computers. Computers changed cryptography forever. What was impossible
to break without them became trivial to break with them. But while
computers rendered all previous cryptography obsolete, the theorists
studying the fundamental math behind computers also came up with new,
much more secure cryptography. Modern cryptography now gives us ways
not just to send secret messages but also to carry out a variety of
other computational tasks even when an adversary is trying to get in
the way, and that is what you would study if you take a cryptography
course today.
For this project we will stick with classical cryptography, and as
such we will not just write code to encrypt and decrypt messages as
intended, but also code that an adversary could use to break the
privacy of the encryption. The project has six parts, and each will
have you write several functions to either create or break a
particular encryption scheme. You will write a lot of functions in
this project, but many of them will be very simple. We break up the
code this way because it is good coding practice and makes writing the
code easier.
(##) Terminology
Any encryption scheme consists of two algorithms. The first is the
encryption algorithm. It takes two inputs, the *plaintext*, which is the
readable text we want to encrypt, and the key, which is some piece of
secret information that we need to do the encryption. The output of
the encryption algorithm is called the *ciphertext*, and it is the
encrypted message, supposedly unreadable by an adversary. The second
algorithm is the decryption algorithm. It takes as input a ciphertext
and a key. If the key is the same key that was used to create the
ciphertext, then the decryption algorithm should output the original
plaintext.
We will also write algorithms for *breaking* a given
encryption system. A break (or *attack*) is an algorithm that can
recover the plaintext from a ciphertext *without* knowledge of the
secret key. (In practice, attacks that can learn something about the
plaintext, even if they can’t fully recover it, are worrying, but our
attacks in this project will all fully recover the plaintext.)
(##) Starting code
You can download the code and supporting files for this project
using this linki:
* [starting code folder](project2.zip)
You will write almost all the code in this project yourself, but we've
given you a couple basic things to start with. First, we've defined a
variable `alpha` at the top of the code, equal to a string that has the
English alphabet in order in all capital letters. This will be
convenient when you want to know the letter at a given position of the
alphabet, the position of a given letter, or whether some character is
a letter.
We have also provided two basic functions for reading and writing
files. Reading and writing files is best done with more elaborate
techniques than these, and we will talk about it later in the
course. For now, though, these functions will suffice. We will think
of our messages (either plaintext or ciphertext) as being single
strings. Encryption only really makes sense if you can take a message
and send it to another person, and storing our encryptions in a text
file will make that possible. (It will also let you use encryptions we
have already prepared and included as part of the project.) The function
`file_to_string` returns a string equal to the contents of a given file.
The function `string_to_file` writes a given string to a specified file. (Files
will always be in the same directory as the program you are
running. If you write to a file that already exists, it will erase
that file and replace it with a new one.)
Finally, we have imported the `random` module, since later on you will
need to use a function or two from that.
(##) Working with strings.
You're code will operate on strings of various forms, the plaintexts that are
being encrypted, the ciphertexts that are being deccrypted. We'll also be
analyzing texts, looking at the letters used with a text, making dictionaries
that store information about letters and sequences of letters, and so forth.
To do all this you'll want to know some of the operations we haven't yet
carefully discussed for Python strings. We now know how to work with lists
in Python, used to represent sequences of data. Strings too are a kind of
sequence in Python, and it turns out that many of the operations we just learned
about lists can be applied to strings too:
~~~
>>> s = "HAPPY BIRTHDAY!"
>>> s[1]
'A'
>>> s[1:7]
'APPY B'
>>> 'A' in 'APPLE'
True
>>> 'Z' in 'APPLE'
False
>>> for x in "HAPPY":
... print(x)
...
H
A
P
P
Y
>>> len(s)
15
~~~
Letters are represented as strings of length 1, often called *characters*.
You can compare characters. For example, this expression tells me whether a
character `x` is an upper case letter of the English alphabet:
~~~
('A' <= x) and (x <= 'Z')
~~~
It will be `True` if `x` is one of 'A' through 'Z', and `False` for other characters that might occur in a string.
Some of the characters that can occur in Python strings have a number associated
with them, and this is indeed true for the letters of the English alphabet.
The function `ord` computes that number. And the function `chr` does the
opposite, giving the character associated with that number.
~~~
>>> ord('A')
65
>>> chr(64+26)
'Z'
~~~
Check the Python documentation, or with us, if there is some string operation
you feel you might need to use to complete the code for this project.
(#) A Caesar cipher
Perhaps the most basic cipher is the Caesar cipher, also called a
shift cipher. It is known to have been used by Julius Caesar to
encrypt military communications. In a Caesar cipher, encryption works
by shifting each letter forward in the alphabet a certain amount,
wrapping around to the beginning if that shift would pass the end of
the alphabet. The amount of the shift is the key, and we will
represent it with a letter. We'll zero-order the letters, so `A`
represents a shift of 0, `B` a shift of 1, `C` a shift of 2, and so
on. For example, if one were to encrypt the message `"HAPPYBIRTHDAY"`
with a key of `D` (meaning a shift of 3), we would get the following:
`H A P P Y B I R T H D A Y` $\ \ \ \ \ \ \Leftarrow$ plaintext
`K D S S B E L U W K G D B` $\ \ \ \ \ \ \Leftarrow$ ciphertext
Note that the `Y` shifts 3 but wraps around, going to `Z`, then `A`, then finally `B`.
For simplicity, our messages will always consist of only capital letters.
(Generally, if you remove punctuation and capitalize all letters,
it's typically easy for people to recognize a message reasonably easily.)
(##) write `simplify_string`
Because our ciphers will all use these simplified strings of only
capital letters, we can't really work with anything until we can first
simplify general text. The function `simplify_string` should take as input a string and
should output a new string that contains only the letters in the old
string (removing all punctuation, spaces, and newline symbols). Those
letters should also all be capitalized. The `upper` operation on strings
will be useful to use here.
~~~
>>> "Happy birthday!".upper()
'HAPPY BIRTHDAY!'
>>> "a".upper()
'A'
>>> simplify_string("Happy birthday!")
'HAPPYBIRTHDAY'
~~~
(##) write `num_to_let`
This is a simple (but very useful) function. It should take as input a
single number and it should return the associated (capital)
letter. Remember that we start numbering the alphabet with `A` numbered
0.
~~~
>>> num_to_let(0)
'A'
>>> num_to_let(2)
'C'
>>> num_to_let(24)
'Y'
~~~
You can use `alpha` to write this code.
Or else you can use `chr` mentioned in the section on strings.
(##) write `let_to_num`
This reverses the function above.
It takes a capital letter and returns the corresponding number.
~~~
>>> let_to_num('A')
0
>>> let_to_num('C')
2
>>> let_to_num('Y')
24
~~~
You can use `alpha` to write this code.
Or else you can use `ord` mentioned in the section on strings.
(##) write `shift_char`
This function takes as input two letters. The first is a starting
letter and the second is a letter indicating how much to shift by (so
a `D` indicates a shift by 3). The output is the result of shifting the
first letter by the amount specified by the second.
~~~
>>> shift_char('A', 'D')
'D'
>>> shift_char('C', 'F')
'H'
>>> shift_char('Y', 'D')
'B'
~~~
(##) write `caesar_enc`
This is the final encryption function.
It takes as input a plaintext (a simplified string) and a key (a single capital letter).
It returns the ciphertext, generated by shifting each letter in the plaintext forward
by the amount specified by the key.
~~~
>>> caesar_enc('HAPPYBIRTHDAY', 'D')
'KDSSBELUWKGDB'
~~~
(##) write `caesar_dec`
This reverses the encryption above. It takes a ciphertext and a key,
and shifts all letters backwards by the amount specified by the key,
so that if the key generated the ciphertext, the output is the correct
plaintext.
~~~
>>> caesar_dec('KDSSBELUWKGDB', 'D')
'HAPPYBIRTHDAY'
~~~
We have included the file `hitchhiker's.txt`,
which includes an excerpt from The Hitchhiker's Guide to the Galaxy.
Try encrypting it with your `caesar_enc` function.
Then decrypt it using `caesar_dec`.
It should match the original text.
(#) Breaking the Caesar cipher
Now that the Caesar cipher is working, it's time to break it. The
Caesar cipher is not very secure at all. You could break it by hand
easily. You could just try all 26 possible keys and see which one
gives you something that looks like proper English. That would work
with a computer too, but we'd need it to be able to recognize "proper
English," which can be done but is somewhat involved. Instead we'll do
something else.
We will use frequency analysis. English text doesn't contain all
letters equally often. The most common letter is clearly `E`, with `A`, `I`,
`O`, and `T` all very common as well. The simplest frequency-based attack
would be to simply find the most common letter and assume that it is
meant to be `E`, and try whatever key would decrypt that letter to
`E`. That will work pretty often.
Frequency-based attacks seem obvious to us now, but they actually took
a long time to develop. In fact, the Caesar cipher probably worked
well when Julius Caesar used it. (Many of his adversaries were not
even literate, and the ciphertexts might have been assumed to be an
unknown language.) The discovery of frequency analysis would have to
wait until al-Kindi in the 9th century. By the time the Russian army
used a Caesar cipher in 1915 (because their troops had difficulty with
more complicated methods), the German and Austrian cryptanalysts were
easily able to decrypt their messages. Even today, there will
occasionally be reports of organized crime or terrorist organizations
using a Caesar cipher, with minimal effectiveness.
(##) write `letter_counts`
The first thing we need to do in order to make our frequency analysis
work is the ability to count how often letters appear in a given
string. `letter_counts` takes as input a string and returns counts of
how many time each letter occurs, stored in a dictionary. The
dictionary should always have 26 keys, one for each letter. The keys
should be single-character strings of a capital letter, and the value
stored should be the number of times that letter occurred in the input
string. It's fine to assume that the input string is simplified (i.e.,
has only capital letters).
(##) write `normalize`
What we *really* want to do is compute fraction of the string
represented by each letter, *not* the raw counts. The `normalize`
function takes as input a dictionary of counts and should modify the
dictionary (not return a new one) by dividing each count by the total
of all the counts in the dictionary. (We'll use this again later with
a dictionary with different keys, so do not assume that the keys are
always the 26 letters. This function should work with any dictionary
where all the values are positive integers.)
Once this is working, you can uncomment the two lines of code below
it. We have included a file `twocities_full.txt` that contains the
entire text of *A Tale of Two Cities*. We will use this as a large
corpus of English text from which to calculate frequencies. (To truly
do this correctly, you'd want to use a corpus consisting of various
works of different kinds and by different authors, but this will be
good enough for our needs.) The variable `english_freqs` will now give
the rough frequency of each letter in true English, and we will use it
for comparison. (By computing it only once and saving it we'll make
our code faster, since we'll need this data many times and we don't
want our program to be constantly re-computing it.)
Whenever we refer to the "frequencies" of letters in a given string,
we always mean the normalized fractions, not the raw counts.
(##) write `distance`
Now, for given frequencies of letters in some text, we want to be able
to measure how "far" that distribution is from the distribution of
letters in standard English. We can compute that using the following
formula:
$$\mbox{distance} = \sum_x \frac{(\mbox{obs}_x - \mbox{exp}_x)^2}{\mbox{exp}_x}$$
Here $x$ represents a letter, and we're summing a calculation over all
possible letters. One piece of information we use in the sum is
$\mbox{obs}_x$. This is the frequency of $x$ that we observed in the
string of text. The other piece of information, $\mbox{exp}_x$, is
the frequency of $x$ that we expect in standard English.
The `distance` function computes the above distance formula.
It takes two inputs.
The first is a set of frequencies observed in some particular piece of text.
The the second is the known frequencies for standard English.
It should output a positive floating point number.
(##) write `break_caesar`
This function should attack a string encrypted with a Caesar cipher
and return the original plaintext. It takes two inputs, the
ciphertext and a dictionary of frequencies in standard English. Its
output should be a list of two elements. The first is a
single-character string equal to the key used to encrypt and decrypt
the ciphertext. The second is the recovered plaintext.
To do this, you should simply try each possible key. For each one,
decrypt the ciphertext to get a possible plaintext. Then measure the
distance between letter frequencies in that plaintext and letter
frequencies in standard English. Return whatever key gives you the
smallest distance value and the plaintext that goes with it.
~~~
>>> break_caesar("KDSSBELUWKGDB", english_freqs)
["D", "HAPPYBIRTHDAY"]
~~~
(This test won't work because the message is too short for frequency
analysis to be reliable, but it should give you an indication of the
specifications.)
You can return to the text from `hitchhiker's.txt`, which you
encrypted before. Now try breaking your ciphertext without the key,
using `break_caesar`. You should be able to get the same result. We have
also included a file, `mystery1.txt`, which is already encrypted with a
Caesar cipher. You should now be able to break that encryption.
(#) The Vigenere cipher
Caesar ciphers can be generalized to a class called substitution ciphers. We will get back to those in Part 5.
But for hundreds of years after al-Kindi first described frequency analysis, all ciphers in use were vulnerable to it.
The effort to frustrate frequency analysis eventually led to the Vigenère cipher.
The Vigenere was reinvented many times, the first time by Giovan Battista Bellaso in 1553.
(It was later misattributed to Blaise de Vigenere, who invented a different but similar cipher around the same time.)
It uses a word as its key, rather than a single letter.
Each letter of the plaintext is shifted, just as in a Caesar cipher, but the same shift is not used
every time.
Instead, each letter of the key specifies a shift, and they are used in order and cycled.
This can be visualized by writing the key, repeating, underneath the plaintext.
For example, let's imagine again that we encrypt `HAPPYBIRTHDAY`, this time with the key `BAD`.
(In practice, a longer word would probably be used.)
This results in the following computation:
`H A P P Y B I R T H D A Y` $\ \ \ \ \ \ \Leftarrow$ plaintext
`B A D B A D B A D B A D B` $\ \ \ \ \ \ \Leftarrow$ key
`I A S Q Y E J R W I D D Z` $\ \ \ \ \ \ \Leftarrow$ ciphertext
In our previous example of a Caesar cipher, we used the key of `D`. That
would send `E` to `H`, making `H` likely to be the most common letter in a
long ciphertext. But here an `E` is sometime sent to `H`, but also
sometimes to `E` and sometimes to `F`. This balances out the frequency of
each letter in the resulting message, making straightforward frequency
analysis not work. (It's still breakable, but before we get to that we
need to get code to actually encrypt and decrypt using a Vigenere
cipher.)
(##) write `vigenere_enc`
Write a function that computers the encryption algorithm for the
Vigenere cipher. It should take two inputs, the first being a
(simplified) plaintext, and the second being they key, represented as
a string of capital letters. The output should be the correct
ciphertext encrypting that plaintext under that key.
~~~
>>> vigenere_enc('HAPPYBIRTHDAY', 'BAD')
'IASQYEJRWIDDZ'
~~~
(##) Write `vigenere_dec`
This is the decryption function that reverses the encryption above. It
takes two inputs, a ciphertext and a key, and it returns the original
plaintext.
~~~
>>> vigenere_dec("IASQYEJRWIDDZ", "BAD")
"HAPPYBIRTHDAY"
~~~
(#) Breaking the Vigenere cipher
While cryptanalysts occasionally deciphered particular messages, no
general attack on the Vigenere cipher was published for hundreds of
years. The first published attack was by Kasiski in 1863, though
Charles Babbage is known to have also developed an attack as early as
1854. In the American Civil War, the Confederacy regularly used
Vigenere ciphers to send messages, but the Union regularly broke their
encryption. This didn't stop Scientific American from declaring the
cipher "impossible of translation" in 1917, and in general the
Vigenere cipher has a reputation that greatly exaggerates its
security.
(##) write `split_string`
This is a simple utility function. The Vigenere cipher essentially
applies several different Caesar ciphers to a plaintext, but applies
each one to only one subset of the letters. The function `split_string` divides the
plaintext into those relevant parts. The function takes two inputs, a
string and an integer. The integer can be thought of as the length of
the cipher's key, and it specifies the number of parts that the string
should be divided into. The output should be a list of that many
strings, each equal to the letters that would be encrypted using a
particular letter of the key.
~~~
>>> split_string("HAPPYBIRTHDAY", 3)
['HPIHY', 'AYRD', 'PBTA']
~~~
Note that the first string in the output contains all the letters that
would be encrypted using the first letter of the key (assuming the key
is 3 letters long). The second element of the list is a string of the
letters that would be encrypted under the second letter of the key,
and so forth.
(##) write `vigenere_break_for_length`
This function takes three inputs, the ciphertext, the length of the
key, and a dictionary of frequencies in standard English. It should
break the cipher and return a list of two elements, the key and the
plaintext. It is not a general break because it assumes we have
somehow learned the length of the key, but it is a step in the right
direction.
To break the ciphertext, you should simply use the same frequency
analysis we used to break a Caesar cipher. After all, the parts that
`split_string` will return are each a set of letters from standard
English encrypted under a Caesar cipher. Carry out that attack
separately on each part and then combine the results.
~~~
>>> vigenere_break_for_length("KDSSBELUWKGDB", 3, english_freqs)
['BAD', 'HAPPYBIRTHDAY']
~~~
(This test won't work because the message is too short for frequency
analysis to be reliable, but it should give you an indication of the
specifications.)
(##) write `vigenere_break`
This function takes three inputs, the ciphertext, a maximum length of
the key, and a dictionary of frequencies in standard English. It
should break the encryption and return the key and plaintext, as long
as the real key is of length less than or equal to the maximum length
given. To do this, simply use the attack above assuming each possible
length of key. Then for each resulting plaintext, compare the letter
frequencies to those of standard English. Whichever is closest you can
assume is the correct key length. If multiple key lengths result in
the same distance to standard English frequencies, return the shorter
key.
~~~
>>> vigenere_break(“KDSSBELUWKGDB”, 10, english_freqs)
['BAD', 'HAPPYBIRTHDAY']
~~~
(This test won't work because the message is too short for frequency
analysis to be reliable, but it should give you an indication of the
specifications.)
Again, you can use the text in `hitchhiker's.txt` to test your
encryption and decryption functions, and you can break the encrypted
version to test your breaking functions. We have also included
`mystery2.txt`, which contains a ciphertext encrypted under a Vigenere
cipher. You should now be able to break it.
(#) the substitution cipher
We're going to take a step backward historically and move to
substitution ciphers, a generalization of Caesar ciphers. (We're going
out of order because substitution ciphers are going to require some
different techniques.) A substitution cipher, like a Caesar cipher,
replaces each letter in the plaintext with a particular letter in the
ciphertext. However, the choices of which letter is replaced by what
are independent, rather than all letters being shifted by a certain
amount. A key specifies a particular replacement for each letter. For
example, say we have the following key (written under the alphabet for
convenience):
`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z`
`J Z M S K I R L P W V N U E X D T C G Y F Q B H O A`$\ \ \ \ \ \ \Leftarrow$ key
According to this key, every instance of `A` is replaced by a `J`, every
instance of `B` replaced by a `Z`, and so forth. So if we again encrypt
`HAPPYBIRTHDAY`, we get the following result:
`H A P P Y B I R T H D A Y` $\ \ \ \ \ \ \Leftarrow$ plaintext
`L J D D O Z P C Y L S J O` $\ \ \ \ \ \ \Leftarrow$ ciphertext
This means there are a lot more possible keys ($26!$, or about $2^88$), and it is infeasible to try them all.
(##) write `sub_gen_key`
This is a function that generates a random key for your cipher,
something like the key above. It takes no inputs and its output should
be a string of the 26 capital letters in the alphabet, arranged in a
random order. All orders should be equally likely.
(##) write `sub_enc`
This is the encryption function. It takes two inputs, a simplified
string and a key, and it outputs a valid ciphertext. They key should
be interpreted as above, meaning that the first letter is the
ciphertext's replacement for `A`, the second is the replacement for `B`,
and so forth.
~~~
>>> sub_enc("HAPPYBIRTHDAY", "JZMSKIRLPWVNUEXDTCGYFQBHOA")
'LJDDOZPCYLSJO'
~~~
(##) write `sub_dec`
This is the decryption function that reverses the encryption above. It takes two inputs, a ciphertext and a key, and it returns the original plaintext.
>>> sub_enc("LJDDOZPCYLSJO", "JZMSKIRLPWVNUEXDTCGYFQBHOA")
'HAPPYBIRTHDAY'
(#) Breaking the substitution cipher
Finally, we want to break the substitution cipher. This isn't that
hard-- people do these by hand all the time as "cryptogram" puzzles in
newspapers. The problem is that doing it well requires some knowledge
of English. If all you know is letter frequencies, you can get a rough
guess. The most common letter in a large piece of text is probably
`E`. The next most common are probably `A`, `I`, `O`, and `T`. But `A`, `I`, `O`, and
`T` have very similar frequencies, so you can't rely on frequency alone
to distinguish them (unless your text is extremely long). Humans look
at things like which letters appear twice in a row-- lots of words
have `OO` or `TT` in them, but very few have `AA` or `II`. They also recognize
particular words or other common patterns like `ING` or `ED`
endings. We're going to need our program to do something like this.
(##) write `count_trigrams`
It turns out that we don't need to consider full words, but just small
patterns that appear. We'll look at trigrams, series of three
letters. Some trigrams like `THE` and `ING` are very common in English,
whereas others like `QAZ` are very rare. We will use these frequencies
to determine what keys are reasonable candidates.
This function should take as input a string and output a dictionary of
trigrams. Each key should be a trigram, something like `"THE"`, and the
corresponding value should be the number of times it appears in the
string. It is okay if trigrams that don't appear in the string at all
also don't appear in the resulting dictionary.
Once this is done, you can uncomment the lines of code that compute
`english_trigrams`. This uses the above function to count the number of
times each trigram appears in a big piece of text. It then normalizes
the dictionary so that for each trigram we have a rough estimate of
what fraction of trigrams in English have that value. (This is why
`normalize` had to work for dictionaries other than just those with the
26 letters as keys.)
Given a string, we want to compute a score for that string equal to
the "probability" that the trigrams in that string would be the
trigrams in a random string. (This isn't quite right mathematically
because it ignores that trigrams overlap, but it'll be good enough.)
If the trigrams in a string have frequencies $t_1, t_2, \ldots, t_n$,
then the score would be
$$\mbox{score} = t_1 \times t_2 \times \cdots \times t_n.$$
The problem is that this is the product of lots of very low numbers,
so we will have lots of floating point errors, and those errors will
be big enough to matter. So instead, we take the logarithm (base $e$) of
both sides. This changes the multiplications of tiny numbers to
additions of reasonable numbers.
$$\ln (\mbox{score}) = \ln(t_1) + \ln(t_2) + \cdots + \ln(t_n).$$
We can then compare the log-scores instead of the original
scores. (They aren't the same, but the comparison between the scores
of two strings will still result in the same decision on which string
has the better score.)
In order to do this computation, we're going to just always use the
logs of frequencies instead of the frequencies, so we need to write
`map_log`, which takes as input a dictionary of frequencies and replaces
every value with the natural log of that value.
You can then uncomment the line that applies this function to our
dictionary of trigram frequencies.
If everything is working correctly, you'll find that
`english_trigrams["THE"]` has a value of roughly `-3.8913`, while
`english_triagrams["ARH"]` has a value of roughly `-9.9034`. This shows
that the trigram `THE` is much more common in English than the trigram
`ARH`.
(##) write `trigram_score`
Given a string, we want to compute the above (log-)score for that
string. This is simply the sum of the log-frequency (in standard
English) of each trigram in the string. (See the formula above.) Low
scores indicate that a string has more unusual trigrams in it. This
function should take two inputs, a string and a dictionary of trigram
log-frequencies, and it should return the score of that string. If a
trigram isn't present in the dictionary (meaning that it never
appeared in our large sample of English text), then count its
log-frequency as `-15`. (A frequency of 0 would result in a
log-frequency of negative infinity, but it's not impossible that the
decryption contains a trigram not in our English text. It's just very
unlikely.)
You should find that `trigram_score("HAPPYBIRTHDAY", english_trigrams)`
returns roughly `-100.829`.
(##) write `sub_break`
This is the function that finally breaks a substitution cipher, taking
as input a ciphertext and a dictionary of trigram log-frequencies. It
works a little differently from what we did before. The basic
algorithm is as follows:
* Pick a random key as a starting guess. Decrypt the ciphertext with that key and compute the trigram score of that decryption.
* Pick two random letters in the key and swap them. (All choices of two letters should be equally likely.) This gives a new candidate key.
* Decrypt the ciphertext with the candidate key. If that decryption has a higher score than the decryption under the current key guess, replace the current key guess with the candidate key.
* Keep trying new swaps. These will often not be improvements, but sometimes they will be and your guessed key will get closer to the real key.
* If at some point you try 1000 random swaps without finding one that improves the score of the decryption, then assume you are done. Return your current key guess and the corresponding decryption.
You should implement the algorithm above.
As always, you can use an encryption of `hitchhiker's.txt`
to check that your encryption, decryption, and breaking functions work.
When you are done you should be able to decrypt `mystery3.txt`,
which has been encrypted using a substitution cipher.