SPOILER WARNING - Try the cryptopals crypto challenges first!

We recently started working on the cryptopals crypto challenges in Python 3, coming in with zero cryptography knowledge. Here are some notes on our thought process while solving Set 1, along with our solutions. Don’t read this if you want to try them out too.

Challenge 1: Convert hex to base64 ¶

As simple as it gets, once we became aware of binascii‘s existence.

In [1]:

from binascii import unhexlify, b2a_base64, a2b_base64, hexlify
import string

In [2]:

def hex_to_base64(hex_string: str) -> bytes:
    return b2a_base64(unhexlify(hex_string))

Challenge 2: Fixed XOR ¶

A function that takes two equal length buffers and returns their XORed result. At this point we reread Challenge 1’s instructions and realized how much nicer it is to always work on bytes.

In [3]:

def fixed_xor(buffer1: bytes, buffer2: bytes) -> bytes:
    return bytes([(b1 ^ b2) for b1, b2 in zip(buffer1, buffer2)])

I’m not sure why they’re called buffers here, don’t think the terminology is used in the other challenges. Anyway, here’s a helper function to convert the printable bytes in a list to a string. Also, unhexlify is your friend.

In [4]:

def bytes_to_str(byte_list: bytes) -> str:
    return "".join(filter(lambda x: x in string.printable, "".join(map(chr, byte_list))))

In [5]:

bytes_to_str(fixed_xor(unhexlify("1c0111001f010100061a024b53535009181c"), unhexlify("686974207468652062756c6c277320657965")))

Out[5]:

"the kid don't play"

Challenge 3: Single-byte XOR Cipher ¶

Decrypt a string that’s been encrypted by XORing with a single byte.

I think this challenge is what got us hooked. Hardly any instructions, just a nudge in the right direction.

I started out by googling for an engligh character frequency table. Found the Wiki page of course, but that didn’t have an easily copy-able one and I’m lazy. Further googling led to a frequency analysis Python example, which had the frequency table in a nicely accessible dict, and also looked like a pretty neat way to learn bits of Python.

The idea was to XOR the string with a single byte (from 0 to 256) and score each resulting string based on how well its letters agreed with English. But, turns out there’s something in English sentences usually a bit more important than just letters - spaces.

It took me a while to figure this out. This challenge, as well the further ones which use the same code, can work with the examples on the site just by counting the number of spaces and picking the string with the most. But I was too proud of my frequency table to give it up, so I kept it and just added space to it with a bogus frequency value that’s greater than all the other letters.

In [6]:

def single_byte_xor_cipher(byte_string: bytes) -> tuple:
    english_character_frequency = {'E': 12.70, 'T': 9.06, 'A': 8.17, 'O': 7.51, 'I': 6.97, 'N': 6.75, 'S': 6.33,
                                   'H': 6.09,
                                   'R': 5.99, 'D': 4.25, 'L': 4.03, 'C': 2.78, 'U': 2.76, 'M': 2.41, 'W': 2.36,
                                   'F': 2.23,
                                   'G': 2.02, 'Y': 1.97, 'P': 1.93, 'B': 1.29, 'V': 0.98, 'K': 0.77, 'J': 0.15,
                                   'X': 0.15,
                                   'Q': 0.10, 'Z': 0.07, ' ': 35}
    strings = [fixed_xor(byte_string, bytearray(bytes([num]) * len(byte_string))) for num in range(256)]
    scores = [
        sum([bytes_to_str(s).upper().count(c) * english_character_frequency[c] for c in english_character_frequency])
        for s in strings]
    index = max(range(len(scores)), key=scores.__getitem__)
    return strings[index], max(scores), index

In [7]:

bytes_to_str(single_byte_xor_cipher(unhexlify("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"))[0])

Out[7]:

"Cooking MC's like a pound of bacon"

A cryptic message.

Challenge 4: Detect single-character XOR ¶

Given a file with hex-encoded lines, find which one’s been single-byte XORed.

Pretty similar to the previous one, but we had some issues with non-ascii characters. They can be ruled out anyway.

In [8]:

def detect_single_byte_xor_cipher(byte_strings: list) -> bytes:
    scored_strings = []
    for line in byte_strings:
        try:
            line.decode('ascii')
            scored_strings.append(single_byte_xor_cipher(line))
        except UnicodeDecodeError:
            pass
    return sorted(scored_strings, key=lambda x: x[1], reverse=True)[0][0]

In [9]:

byte_strings = [unhexlify(line.strip()) for line in open("data/crypto/4.txt").readlines()]
bytes_to_str(detect_single_byte_xor_cipher(byte_strings).strip())

Out[9]:

'Now that the party is jumping'

Challenge 5: Implement repeating-key XOR ¶

Like single-byte XOR ‘cept it’s not a single-byte.

This one made us miss Haskell’s cycle function.

In [10]:

def repeating_key_xor(byte_string: bytes, byte_key: bytes) -> bytes:
    return fixed_xor(byte_string, (byte_key * len(byte_string))[:len(byte_string)])

Challenge 6: Break repeating-key XOR ¶

Decrypt a repeating-key-XORed base64ed file.

The challenge starts off with a scary box warning us of hard times ahead. But the instructions were very clear so it felt much easier than the previous ones.

First off, bit-wise hamming distance - made easier with the knowledge that XOR is 1 if the two bits are different.

In [11]:

def hamming_distance(buffer1: bytes, buffer2: bytes) -> int:
    distance = sum(bin(i).count("1") for i in fixed_xor(buffer1, buffer2))
    return distance

I think the idea here is that a block of 3 bytes XORed with ‘ICE’ is going to be similar to another block of 3 bytes XORed with ‘ICE’ since they’re both using ‘ICE’. But a block of 4 bytes XORed with ‘ICEI’ won’t be so close to another 4-byte block XORed with ‘CEIC’. So we find the key size for which the hamming distance between adjacent blocks (of that key size) is lowest, average of 4 to make sure.

In [12]:

byte_string = b''.join([a2b_base64(line.strip()) for line in open("data/crypto/6.txt").readlines()])
keysize_distances = []
for keysize in range(2, 40):
    blocks = [byte_string[i * keysize: (i + 1) * keysize] for i in range(4)]
    distances = [hamming_distance(blocks[i], blocks[j]) for i in range(len(blocks)-1) for j in range(1, len(blocks))]
    distance = sum(distances) / len(distances)
    distance /= keysize
    keysize_distances.append((keysize, distance))
keysize = sorted(keysize_distances, key=lambda x: x[1])[0][0]

Now that we have the key size we know which bytes in the text are XORed with the same key byte. After splitting up the text into blocks of key size, the first byte of each block is XOred with the first byte of the key, seconds with the second, and so on. Each of these combinations is just a single-byte cipher.

In [13]:

byte_blocks = [byte_string[i * keysize: (i + 1) * keysize] for i in range(int(len(byte_string) / keysize))]
transposed_blocks = [bytearray([b[i] for b in byte_blocks]) for i in range(keysize)]
keys = []
for block in transposed_blocks:
    _, _, index = single_byte_xor_cipher(block)
    keys.append(index)
key = bytearray(keys)

In [14]:

bytes_to_str(key)

Out[14]:

'Terminator X: Bring the noise'

Challenge 7: AES in ECB mode ¶

Another new library - pycrypto (pip installed)

In [15]:

from Crypto.Cipher import AES

In [16]:

def aes_in_ecb_mode(byte_string: bytes, key: bytes, encrypt: bool = False) -> bytes:
    cipher = AES.new(key, AES.MODE_ECB)
    if encrypt:
        return cipher.encrypt(byte_string)
    else:
        return cipher.decrypt(byte_string)

In [17]:

byte_string = b''.join([a2b_base64(line.strip()) for line in open("data/crypto/7.txt").readlines()])
for line in bytes_to_str(aes_in_ecb_mode(byte_string, b'YELLOW SUBMARINE', encrypt=False)).split("\n")[:10]:
    print(line)

I'm back and I'm ringin' the bell 
A rockin' on the mike while the fly girls yell 
In ecstasy in the back of me 
Well that's my DJ Deshay cuttin' all them Z's 
Hittin' hard and the girlies goin' crazy 
Vanilla's on the mike, man I'm not lazy. 

I'm lettin' my drug kick in 
It controls my mouth and I begin 
To just let it flow, let my concepts go

It’s a song! Why didn’t I think of that.

Challenge 8: Detect AES in ECB mode ¶

Find the line encrypted with ECB mode AES.

I think we’re assuming there’s repeated words in the sentence, (if not I’ve done this exercise completely wrong). If there’s a 16 byte repeat anywhere in the sentence then we’ve got the line cornered, since ECB encrypts identical key-size blocks identically.

In [18]:

def detect_aes_in_ecb_mode(byte_string: bytes,
                           block_length: int) -> bool:
    byte_blocks = [byte_string[i*block_length: (i+1)*block_length]
                   for i in range(int(len(byte_string) / block_length))]
    unique_blocks = set(byte_blocks)
    return len(unique_blocks)/len(byte_blocks) < 1

In [19]:

byte_strings = [unhexlify(line.strip()) for line in open("data/crypto/8.txt").readlines()]
BLOCK_SIZE = 16
[i for i, byte_string in enumerate(byte_strings) if detect_aes_in_ecb_mode(byte_string, BLOCK_SIZE)]

Out[19]:

[132]

That’s it for Set 1.

Here’s Set 2

Cryptopals Set 1

Basics