From Atbash to Enigma

2500 years of cryptography in a few lines of Haskell, from ancient ciphers to the infamous Enigma machine:

Ringstellung: Grundstellung:

Input:

Output:

Our code can be run by downloading the source of this page:

$ wget https://crypto.stanford.edu/~blynn/haskell/enigma.lhs
$ ghci enigma.lhs

Preliminary administrivia:

module Enigma where
import Data.Bool
import Data.Char
import Data.List
import Data.Maybe

Writing Wrong

The substitution cipher is one of the oldest encyption schemes. The idea is so simple that I inadvertently invented it myself long ago: my handwriting is so bad that readers often confuse some of my letters for others! Substitution ciphers just do this deliberately.

We work with the uppercase Latin alphabet. Our code leaves other characters untouched.

abc = ['A'..'Z']

A substitution cipher is defined by a permutation \(\pi\) of the alphabet. The encryption of a letter \(x\) is \(\pi(x)\). We represent permutations with words, that is, we write the permutation \(\pi\) as \(p_A p_B …​ p_Z\) where \(p_x = \pi(x)\).

Given a permutation p, we can write functions to encrypt and decrypt with the corresponding substitution cipher as follows:

sub   p x = fromMaybe x $ lookup x $ zip abc p
unsub p x = fromMaybe x $ lookup x $ zip p abc

The keyword cipher is a folk method for constructing a permutation. We take a keyword and append the alphabet, removing any duplicate letters. To guard against bad inputs we filter so that only uppercase letters are kept:

instaperm k = nub $ filter isUpper k ++ abc

prop_permExample = instaperm "SWORDFISH" == "SWORDFIHABCEGJKLMNPQTUVXYZ"
prop_keywordExample =
  (sub (instaperm "SWORDFISH") <$> "SCOLD HIRE") == "POKER HAND"

prop_unsubUndoesSub k =
  (unsub (instaperm k) . sub (instaperm k) <$> abc) == abc

The Atbash cipher for the Latin alphabet is the substitution cipher using the permutation whose word representation is the reverse of the alphabet:

atbash = sub $ reverse abc

prop_atbashExample = map atbash "SLIM GIRL" == "HORN TRIO"

We chose our examples carefully. Few English words encrypt to other English words. [Exercise: write code to find them all from a given dictionary.]

A Caesar cipher or Caesar shift is a substitution cipher where each letter is replaced by the letter that is a fixed number of positions ahead, wrapping around the alphabet if necessary. For example, if A maps to D, then B maps to E, C to F, and so on.

We represent a Caesar shift by the letter to which A is mapped. This letter is the secret key.

shift   k = sub   $ dropWhile (/= k) $ cycle abc
unshift k = unsub $ dropWhile (/= k) $ cycle abc

prop_rot13Example = (shift 'N' <$> "ABJURER") == "NOWHERE"

Shifting by one letter comes in handy, so we give this special case a short name:

bump = shift 'B'

prop_unshiftAtbash = and
  [unshift k c == bump (shift (atbash k) c) | k <- abc, c <- abc]
prop_2001 = map bump "HAL" == "IBM"

Polyalphabetic Substitution

The Vigenère cipher is a repeated sequence of Caesar shifts. We repeat a given key \(k_1…​k_n\) until it is as long as the message, and shift each plaintext letter by the corresponding letter in the extended key:

vigenere   = zipWith shift   . cycle
unvigenere = zipWith unshift . cycle

prop_vigenereExample = vigenere "LEMON" "ATTACKATDAWN" == "LXFOPVEFRNHR"

Our code is inconsistent. Earlier, our functions worked on one character at a time, but now they expect entire strings. This might seem unavoidable because of polyalphabetic substitution, that is, because the permutation used depends on the position of the character in our plaintext.

However, we can still write an encryption function that operates on a single character at a time as long as we hang on to some extra state, namely, which letter of the keyword to use next. And Data.List.mapAccumL is tailor-made for calling such a function in order to encrypt a string:

vigenereChar (k:ks) x = (ks, shift k x)
vigenere' ks = snd . mapAccumL vigenereChar (cycle ks)

According to Wikipedia, this cipher is poorly named. Apparently, Giovan Battista Bellaso originally described this cipher in 1553, and Blaise de Vigenère in fact described the autokey cipher in 1586:

autokey   ks xs = zipWith shift xs $ ks ++ xs
unautokey ks xs = m where m = zipWith unshift (ks ++ m) xs

prop_autokeyExample   = autokey   "QUEENLY" "ATTACKATDAWN" == "QNXEPVYTWTWP"
prop_unautokeyExample = unautokey "QUEENLY" "QNXEPVYTWTWP" == "ATTACKATDAWN"

One-rotor Enigma

Enigma machines contain rotors, or wheels. Each rotor is a hard-wired substitution cipher. Literally: there are wires fixed in position which represent a particular permutation of the alphabet.

Let’s take a well-known rotor:

pI = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"

On the first keypress, the rotor rotates by one letter, so that the wire from A to E now goes from Z to D, the wire from B to K now goes from A to J, and so on. Current flows from the letter that was typed to the encryption of that letter through the wire that connects them.

This repeats for subsequent letters: the rotor rotates by one letter, then electric current flows from the letter that was struck to determine its encryption. Thus encryption for one wheel can be described as follows:

oneRotorEnigma = unvigenere abc . map (sub pI) . vigenere abc

prop_oneRotorEnigmaExample = oneRotorEnigma "AAAAA" == "EJKCH"

We use letters to denote how far a wheel has rotated, from A for 0 rotations to Z for 25 rotations.

Above, we really should have used bump abc instead of abc because the rotor turns before the first letter is enciphered. On a real machine, if we started with the wheel in the A position (no rotations), we would get "JKCHB"; we would get "EJKCH" by starting in the Z position. However, this is a trivial off-by-one issue that we’ll fix later.

Assuming the adversary obtains a copy of the wheels, the secret key is the choice of wheel and its initial position.

Early Enigma

The earliest Enigma machines consisted of three rotors connected in series. The first rotor rotated by one letter for each keypress. The second rotor rotated when the first rotor completed a full revolution, that is, every 26 keypresses. Similarly, the third rotor rotated when the second rotor completed a full revolution, that is, every 262 keypresses:

pII  = "AJDKSIRUXBLHWTMCQGZNPYFVOE"
pIII = "BDFHJLCPRTXVZNYEIWGAKMUSQO"

abcDup n = concatMap (replicate n) abc
rotor1 f = unvigenere abc             . f . vigenere abc
rotor2 f = unvigenere (abcDup   26)   . f . vigenere (abcDup   26)
rotor3 f = unvigenere (abcDup $ 26^2) . f . vigenere (abcDup $ 26^2)

almostEarlyEnigma = rotor3 f3 . rotor2 f2 . rotor1 f1
  where [f1, f2, f3] = map . sub <$> [pI, pII, pIII]

Actually, the above is slightly incorrect for a couple of reasons we shall explore later (excluding the off-by-one bug described above), but suffices for a first approximation.

Assuming the adversary obtains a copy of the wheels, the secret key is the choices of the wheels, their ordering, and their initial positions.

Self-inverse Engima

From a user’s perspective, a drawback of the original Enigma machine is that decryption is different to encryption. If the encryption function were its own inverse (an involution), then the machine would be simpler to use: whether encrypting or decrypting, just set up the rotors and type away.

Thus a reflector was introduced. This is a hard-wired substitution cipher that is its own inverse. For example, if A maps to Y, then Y must map to A. For reasons we will explain below, the reflector must map each letter to a distinct letter. For example, despite being self-inverse, the identity permutation is an invalid reflector.

In mathematical terms, the permutation is a product of thirteen 2-cycles.

reflectorB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"

prop_reflectorNoFixed = and $ zipWith (/=) abc $ map (sub reflectorB) abc

prop_reflectorSelfInverse = iterate (map (sub reflectorB)) abc!!2 == abc

Then on each keypress, after the rotors have turned, the electrical current is sent through the wheels in one direction to the reflector and then sent back through the wheels in the opposite direction. This why the reflector permutation must have no fixed points: current can only travel in one direction on a wire.

The reflector makes the Enigma cipher its own inverse: the letter on one end of the path taken by the current swaps with the letter on the other end. The encryption of a letter must be a different letter, a weakness gleefully exploited by cryptanalysts.

In mathematical terms, two permutations have the same cycle structure if and only if they are conjugates, though here we only need one direction of this fact.

Because of the physical layout of the machine, if we use the "default" setting and place the wheels I, II, and III from left to right, then the current goes through wheel III, then II, then I, then the reflector, then back through I, then II, then III:

almostEnigma = rotor1 b1 . rotor2 b2 . rotor3 b3 .
  reflect . rotor3 f3 . rotor2 f2 . rotor1 f1
  where
    [f1, f2, f3] = map . sub   <$> [pIII, pII, pI]
    [b1, b2, b3] = map . unsub <$> [pIII, pII, pI]
    reflect      = map $ sub reflectorB

prop_almostEnigmaAAAAAA = almostEnigma "AAAAAA" == "UBDZGO"

prop_almostEnigmaSelfInverse s = (almostEnigma . almostEnigma) s == s

Our code almost simulates the Enigma I with its wheels set to AAZ (not AAA due to the off-by-one bug).

Dubstep

We’ve almost recreated an Enigma machine. Some differences stem from mechanical engineering.

In a computer program, turning the second rotor for every 26 turns of the first rotors might be implemented with a counter. With physical rotors, an elegant solution is to carve a notch in the wheel that causes the next wheel to turn. This notch is always in the same place, so we do not always reach it on the 26th keystroke. Instead, the first time we reach the notch depends on the starting position of the rotor (and afterwards we reach the notch again every 26 turns).

A related problem caused by the notches is the double stepping anomaly. We ignore the mechanical details, and just state its effects. If we reach the notch on the middle wheel, then the middle wheel turns when the right wheel turns. Hence the name "double stepping": after a keystroke causes the middle wheel to turn to its notch, the next keystroke causes it to turn again, past its notch (which in turn will cause the left wheel to turn).

It’s as if we had a malfunctioning 3-digit counter:

...
0 8 7
0 8 8
0 8 9
0 9 0
1 0 1
1 0 2
...

Some Enigma variants feature rotors with multiple notches.

Round and Round

One more subtlety. We have been using letters to indicate how far a wheel has rotated. These are called the indicator settings or the Grundstellung. On Enigma machines, each rotor is labeled with the letters of the alphabet on an index ring and the indicator settings appear in a row of little windows.

It turns out we can also rotate the wiring relative to the index ring, and we also denote the extent of such a rotation with a letter. These are called the ring settings or the Ringstellung. Grundstellung and Ringstellung rotations are measured in opposite directions.

We can account for the ring settings by modifying our shift offset. However, the notch is a feature of the index ring, and not the wiring. The propagation of rotation to the next wheel always occurs for the same letter in the indicator window, but the wiring inside the current wheel may be in a different position. Our code must handle this correctly.

Enigma Variations

To make life harder for codebreakers, the German military augmented commercial Enigma with an "aftermarket" part known as the Steckerbrett or the plugboard. This allowed users to plug in cables to swap up to 13 pairs of letters just before a signal enters the rotors and just after it leaves.

In other words, it is a self-inverse permutation applied before and after the standard Enigma cipher. Unlike the reflector, the Steckerbrett may have fixed points, that is, it can be any self-inverse permutation.

All these extra details encourage us to introduce a data structure to hold the state of an Enigma machine:

data Enigma = Enigma
  { rotors        :: [(String, String)]
  , reflector     :: String
  , grundstellung :: String
  , ringstellung  :: String
  , steckerbrett  :: String
  } deriving (Eq, Show)

The rotors list holds descriptions of the rotors from left to right on a physical machine; the electrical signal from a keystroke enters the rightmost wheel first. Each rotor is a pair of strings: the first string is a permutation of the alphabet as a word, and the second describes all notches on the rotor.

wI   = ("EKMFLGDQVZNTOWYHXUSPAIBRCJ", "Q")
wII  = ("AJDKSIRUXBLHWTMCQGZNPYFVOE", "E")
wIII = ("BDFHJLCPRTXVZNYEIWGAKMUSQO", "V")
wIV  = ("ESOVPZJAYQUIRHXLNFTGKDCMWB", "J")
wV   = ("VZBRGITYUPSDNHLXAWMJQOFECK", "Z")

ukwA = "EJMZALYXVBWFCRQUONTSPIKHGD"
ukwB = "YRUHQSLDPXNGOKMIEBFZCWVJAT"
ukwC = "FVPJIAOYEDRZXWGCTKUQSBNMHL"

defaultEnigma = Enigma
  { rotors = [wI, wII, wIII]
  , reflector = ukwB
  , grundstellung = "AAA"
  , ringstellung = "AAA"
  , steckerbrett = abc
  }

UKW stands for Umkehrwalze, the German term for the reflector. The secret key consists of the ring settings, indicator settings, reflector choice, rotor choices, rotor order, and plugboard cables.

One iteration of the turning of the wheels can be described as follows:

turn m = m { grundstellung =
  [ bool g1 (bump g1) $ g2 `elem` n2
  , bool g2 (bump g2) $ g2 `elem` n2 || g3 `elem` n3
  ,          bump g3
  ]} where
    [g1, g2, g3] = grundstellung m
    [n1, n2, n3] = snd <$> rotors m

The zap function follows the current through the wires when a key is struck on an Enigma machine m to find its encryption:

conjugateSub p k = unshift k . sub p . shift k

rotorSubs m = zipWith conjugateSub (fst <$> rotors m) $
  zipWith unshift (ringstellung m) $ grundstellung m

zap m = st . unsub p . sub (reflector m) . sub p . st where
  p  = foldr1 (.) (rotorSubs m) <$> abc
  st = sub $ steckerbrett m

It remains to write wrappers. For an uppercase letter, the enigmaChar function advances the machine one iteration then finds the encryption of the letter. Otherwise we just leave the machine alone and return the input character unchanged. The enigma function passes this function to mapAccumL to encrypt strings.

enigmaChar m k = bool (m, k) (m', zap m' k) $ isUpper k where m' = turn m

enigma m = snd . mapAccumL enigmaChar m

prop_enigmaExample =
  enigma (defaultEnigma { ringstellung = "BBB" }) "AAAAA" == "EWTYX"

The top of this webpage features a simulation of an Enigma machine with rotors I, II, and III from left to right, the B reflector, and no plugboard. The ring and indicator settings are initially both AAA, but these can be adjusted by dragging the black rings or boxes on the wheels, or by typing in the text areas.

I built it on top of the code throughout this article, also with Haskell, and compiled it to JavaScript with Haste. Hopefully, my simulation agrees with other online Enigma simulations:


Ben Lynn blynn@cs.stanford.edu 💡