## Tuesday, December 27, 2011

### 32: How Polish Math Saved Europe

You may have heard the term "The Enigma", the German encoding
machine that was in use from the 1920's through World War II.  The
Germans considered this form of encoding unbreakable, and used it to
send supposedly secure messages throughout the war.  But what they
didn't know was that Allied mathematicians had identified numerous
weaknesses in their coding scheme, and in fact, the Allies could
decode a significant proportion of their messages.  It has been
commonly estimated that this ability accelerated the end of the
European war by as much as two years.
In the 1930's, most foreign nations considered the Enigma encoding
impregnable.  Let's look at why this was the case.  The machine was
based on the concept of substitution ciphers, where you would
transform a message by substituting each letter with a different one.
For example, a common substitution cipher might replace each letter
with the next one, so 'C-A-T' becomes 'D-B-U'.  Such simple ciphers
are easily breakable, by technigques such as looking for common small
words or analyzing letter frequency data.  But a trick that makes
the cipher much harder to break is if you change the encoding after
each letter:  so the first letter of your message might be replaced by
the next one, the second letter replaced by the one before, etcetera.
Then the same letter might stand for different other letters in
different parts of the message, and it would be really hard to decode
the substitution.  In the days before commonplace availability of
digital computers, this was very painful to implement.
The Enigma machine, a complex pre-computer encryption machine,
took this idea to an extreme while enabling quick encoding and
decoding of messages.  The machine had a typewriter attached to three
'scramblers', each which implemented a different substitution cipher.
So you can think of each scrambler as a list of the 26 letters, in a
scrambled order, and for each letter on its input it would generate a
different letter on its output.  When a letter was typed, it would be
sent in sequence to all the scramblers, to be transformed each time.
Then some of the scramblers would automatically rotate, shifting the
code by one place, so the next letter would get a different
encryption.  For example, let's assume we have a simplified three
letter alphabet, and a scrambler reads 'CBA', with C in the first
position.  That means an A would become a C, a B would be kept as
a B, and a C would transform to an A.  Then the scrambler would
rotate, the 'A' at the end moving to the beginning, so the 'CBA' would
become 'ACB'.  The next letter typed would be transformed with this
new cipher.  So a message consisting of AA would become CA after
encoding, the first A being coded by a different cipher than the
second one.
So how complex was an Enigma cipher to break?  With three
26-letter scramblers, which could each be initially in any position,
this would mean 26x26x26 possible ciphers, or 17,576.  The three
scramblers could also be removed and replaced in a different order,
multiplying by a further factor of 6.  A plugboard was then added
which could further swap any subset of letter pairs in addition to the
cipher implemented by the rest of the machine, multiplying the number
of possible ciphers by another hundred billion or so.  There are also
some other details which I'm not including here.  Thus it's not
surprising that most foreign nations considered the codes unbreakable,
and didn't even try.  But the Polish knew they did not have a choice--
if war broke out, they would be one of Germany's likely targets.
A statistician named Marian Rejewski made the first major
fatal mistake:  due to worries about special messages being lost in
static, they had mandated that an initial code word be repeated twice
at the beginning of each message.  The two repetitions would be
enciphered with different sets of letters, due to the nature of the
machine.  But by knowing the length of the initial word, usually three
letters, one could find sets of letters that represented the same
scrambler positions three letters down.  So suppose the daily code
word was 'CAT'.  The initial message string would be 'CATCAT', which
might be encoded as something like 'DOGPIG'.  We can't tell from the
encoded string 'DOGPIG' what the original word was, but we can tell
that the 'D' in scrambler position 1 encodes the same letter as 'P' in
scrambler position 4.  This may not seem like much information to go
on, but the Polish were able to use this kind of information to build
up a 'signature' of the scrambler positions.  In other words, by
analyzing large sets of these pairs and comparing to known patterns
that would come from each of the 105,456 possible scrambler settings,
they could figure out the actual settings for the day!
This wasn't a full solution, of course, and in the days before
true computers, it took an immense amount of effort to build machines
that could make use of this information.  Essentially they had to
rapidly compare the signature they found to the hundred thousand
possiblities, to find the correct one.  But they did it, and were
able to listen to many German communications as a result.  The Germans
eventually negated this success by adding more possible scramblers to
the machine.  But this was just the beginning of the cracking of the
Enigma-- when Poland fell, Britain resumed their work to identify
additional weaknesses, with major successes that I will probably
describe in a later podcast.  If it wasn't for this initial
breakthrough by the Polish, though, it's highly questionable whether
any of the Allies would have imagined that the Enigma could be
broken.  And the war in Europe might very well have extended for
several more years.