set 1 challenges

1. Convert hex to base64

we can simply use the built in base64 python module. the b64decode method takes a bytes-like object so let's turn the hex string representation into a bytes object first.

import base64

def convert_hex_to_b64(hex_string):
  return base64.b64encode(bytes.fromhex(hex_string)).decode()

2. Fixed XOR

quick reminder on XOR - output is true only when inputs differ (i.e. 1 and 0).

here's a small toy i built to help illustrate:

input 1

input 2

output 2

in most high level languages you can XOR two numbers with the ^ operator.

we can enumerate over the two input buffers and xor each byte

def byte_xor(x, y):
    return bytes([a ^ b for a, b in zip(x, y)])

def fixed_xor(x, y):
    return byte_xor(bytes.fromhex(x), bytes.fromhex(y)).hex()

3. Single-byte XOR cipher

we know that each byte of the ciphertext has been XOR'd by some unknown byte, the key, which we want to determine. The key is just one byte, i.e. a number between 0 to 255, (or 0x00 to 0xff [or 0b00000000 to 0b11111111])

we can pretty much just 'brute force' this. for each candidate key, we'll need to feed the ciphertext and a byte array of the same length (which will just be the candidate key at each byte position).

def single_byte_xor(x, xor_char):
    x_ = bytes.fromhex(x)

    return byte_xor(x_, bytes([xor_char] * len(x_))).hex()

for byte_candidate in range(256):
    xorred_bytes = bytes.fromhex(single_byte_xor('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736', byte_candidate))
    print(xorred_bytes)

now we get the 255 resulting XOR outputs in our stdout, and while at this scale it's easy enough to manually comb through the results and see with our eyes which byte was the correct key, let's come up with some metric to automatically evaluate our results, using character frequency as the challenge suggests.

since we know the plaintext is english, outputs such as b'\t%%!#$-j\x07\tm9j&#!/j+j:%?$.j%,j(+)%$' or b'\x9b\xb7\xb7\xb3\xb1\xb6\xbf\xf8\x95\x9b\xff\xab\xf8\xb4\xb1\xb3\xbd\xf8\xb9\xf8\xa8\xb7\xad\xb6\xbc\xf8\xb7\xbe\xf8\xba\xb9\xbb\xb7\xb6' are clearly wrong. we can make a fair assumption that the correct output plaintext will consist mostly of latin characters. perhaps we could sum the count of all bytes in the output which are latin or whitespace characters in the ascii set.

ords = [ord(x) for x in string.ascii_letters + ' ']

def predict_key(x):
    # will map candidate byte to the count of ascii letters in output
    d = {}

    for byte_candidate in range(256):
        xorred_bytes = bytes.fromhex(single_byte_xor(x, byte_candidate))

        d[byte_candidate] = 0

        for c in xorred_bytes:
            if c in ords:
                d[byte_candidate] += 1

    key = max(d, key=d.get)

    return (key, bytes.fromhex(single_byte_xor(x, key)))

calling predict_key with the hex 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736 from the challenge returns us a tuple containing the key whose output had the highest count of ascii letters and the output itself.

(88, b"Cooking MC's like a pound of bacon")

we can see that using the key 88 (in decimal, corresponding to the letter 'X' in ascii), we get a clear plaintext english phrase as a result of XORing the ciphertext.

4. Detect single-character XOR

we're faced now with a list of strings, ONE of which has been single-byte XOR'd.

we could loop through each string and run it through our predict_key function.

with open('./data/c4.txt') as file:
    for line in file:
        print(predict_key(line))

then manually comb through the stdout to look for the correct output. but we can do better. building off our previous assumption, let's assume that english plaintext would be at least 90% ascii letters and spaces.

def is_english_plaintext(x):
    return (sum([c in ords for c in x]) / len(x)) > 0.9

def detect_single_byte_xor():
    with open('./data/c4.txt') as file:
        for line in file:
            (k, candidate) = predict_key(line)

            if is_english_plaintext(candidate):
                print((k, candidate))

this time only one result comes to stdout and we find the key was 53 (the character '5')

(53, b'Now that the party is jumping\n')

5. Implement repeating-key XOR

in this next challenge, we implement repeating key XOR. recall that we want the key buffer to be the same length as the plaintext in order to XOR each bit position. for single byte XOR this was trivial as we just need to fill a byte array of input length. here we may need to 'cut' the key when the buffer length is not a multiple of the key length. some list slicing and we're done

def repeating_key_xor(x, key):
    quotient = len(x) // len(key)
    modulus = len(x) % len(key)

    padded_key = key * quotient + key[:modulus]

    return byte_xor(x, padded_key).hex()

let's try the inputs from the challenge

x = repeating_key_xor(b"""Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal""", b"ICE")
print(x)

and we get returned the expected hex string of 0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

6. Break repeating-key XOR

our first challenge where we're tasked with breaking 'encryption'. exciting.

we need to find the repeating-key used to XOR encrypt this file (which has also been base64'd)

let's first write the function to calculate hamming distance between two strings. note that this should calculate BIT LEVEL hamming distance - not hamming distance of the characters

def hamming_distance(a, b):
    xor = int(a.hex(), 16) ^ int(b.hex(), 16)
    return bin(xor).count('1')

i tried at first following the challenge suggestion of only using 2 or 4 sample keysized blocks (step 4) to calculate the normalised hamming distance but found that the correct keysize would correspond to the ~5th smallest normalised hamming distance. this is why i ended up sampling across the entire range of ciphertext.

def get_blocks(lst, n):
    blocks = list(chunk(lst, n))
    blocks[-1] = blocks[-1].ljust(n, b'')
    return blocks

def chunk(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

d = {}

with open('./data/c6.txt') as file:
    raw_data = base64.b64decode(file.read())
    
    for key_size in range(2, 41):
        blocks = get_blocks(raw_data, key_size)

        samples = len(blocks)
        # if length is odd, minus one from no. of samples
        # to avoid out of index when we try to access index i+1 below
        if samples % 2:
            samples -= 1

        score = 0

        for i in range(0, samples, 2):
            score += hamming_distance(blocks[i], blocks[i+1])

        score /= key_size
        score /= samples

        d[key_size] = score

    candidate_key_size = min(d, key=d.get)
    blocks = get_blocks(raw_data, candidate_key_size)

    transposed = [[b[i] for b in blocks] for i in range(candidate_key_size)]

    candidate_key = []
    for t in transposed:
        (key_byte, _) = predict_key(bytes(t).hex())
        candidate_key.append(key_byte)

    print(bytes.fromhex(repeating_key_xor(raw_data, bytes(candidate_key))))

7. AES in ECB mode

i'm not familiar at all with block ciphers and their modes of operations, so had to do some reading and research at this point. turns out ECB (electronic code book) mode is not great to use as encryption operations don't use salts or initialisation vectors in this mode. this means any plaintext blocks which are the same will produce the exact same cipherblocks.

AES operates with blocks of 16 bytes length. keys can be 128, 192, or 256 bits (16, 24, or 32 bytes) long.

we'll use the pycryptodome python library for this

from Crypto.Cipher import AES

def aes_ecb_decrypt(ciphertext, key):
    cipher = AES.new(key, AES.MODE_ECB)
    return cipher.decrypt(ciphertext)

with open('./data/c7.txt') as file:
    s = base64.b64decode(file.read())
    print(aes_ecb_decrypt(s, b"YELLOW SUBMARINE"))

run the above and vanilla ice fills our stdout.

8. Detect AES in ECB mode

this challenge wants us to go through the hex values in this file and determine which one has been AES encrypted with ECB mode

we know from researching ECB for the previous step that plaintext blocks which are the same will produce cipherblocks which are the same. this challenge also states as much :D

we can simply split the ciphertext into 16 byte blocks (reusing a function from challenge 6) and check if any blocks are the same

with open('./data/c8.txt') as file:
    for line in file:
        s = bytes.fromhex(line)
        blocks = list(chunk(s, 16))

        if (len(blocks) != len(set(blocks))):
            print(line)
            print(blocks)

run the above and we indeed detect the line which was AES'd in ECB mode

d880619740a8a19b7840a8a31c810a3d08649af70dc06f4fd5d2d69c744cd283e2dd052f6b641dbf9d11b0348542bb5708649af70dc06f4fd5d2d69c744cd2839475c9dfdbc1d46597949d9c7e82bf5a08649af70dc06f4fd5d2d69c744cd28397a93eab8d6aecd566489154789a6b0308649af70dc06f4fd5d2d69c744cd283d403180c98c8f6db1f2a3f9c4040deb0ab51b29933f2c123c58386b06fba186a