< back# set 1 challenges

## 1. Convert hex to base64

## 2. Fixed XOR

## 3. Single-byte XOR cipher

## 4. Detect single-character XOR

## 5. Implement repeating-key XOR

## 6. Break repeating-key XOR

## 7. AES in ECB mode

## 8. Detect AES in ECB mode

we can simply use the built in base64 python module. the b64decode method takes a bytes-like object so let's turn the hex string representation into a bytes object first.

```
import base64
def convert_hex_to_b64(hex_string):
return base64.b64encode(bytes.fromhex(hex_string)).decode()
```

quick reminder on XOR - output is true only when inputs differ (i.e. 1 and 0).

here's a small toy i built to help illustrate:

input 1

0

input 2

0

output 2

0

in most high level languages you can XOR two numbers with the `^`

operator.

we can enumerate over the two input buffers and xor each byte

```
def byte_xor(x, y):
return bytes([a ^ b for a, b in zip(x, y)])
def fixed_xor(x, y):
return byte_xor(bytes.fromhex(x), bytes.fromhex(y)).hex()
```

we know that each byte of the ciphertext has been XOR'd by some unknown byte, the key, which we want to determine. The key is just one byte, i.e. a number between `0`

to `255`

, (or `0x00`

to `0xff`

[or `0b00000000`

to `0b11111111`

])

we can pretty much just 'brute force' this. for each candidate key, we'll need to feed the ciphertext and a byte array of the same length (which will just be the candidate key at each byte position).

```
def single_byte_xor(x, xor_char):
x_ = bytes.fromhex(x)
return byte_xor(x_, bytes([xor_char] * len(x_))).hex()
for byte_candidate in range(256):
xorred_bytes = bytes.fromhex(single_byte_xor('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736', byte_candidate))
print(xorred_bytes)
```

now we get the 255 resulting XOR outputs in our stdout, and while at this scale it's easy enough to manually comb through the results and see with our eyes which byte was the correct key, let's come up with some metric to automatically evaluate our results, using character frequency as the challenge suggests.

since we know the plaintext is english,
outputs such as `b'\t%%!#$-j\x07\tm9j&#!/j+j:%?$.j%,j(+)%$'`

or `b'\x9b\xb7\xb7\xb3\xb1\xb6\xbf\xf8\x95\x9b\xff\xab\xf8\xb4\xb1\xb3\xbd\xf8\xb9\xf8\xa8\xb7\xad\xb6\xbc\xf8\xb7\xbe\xf8\xba\xb9\xbb\xb7\xb6'`

are clearly wrong. we can make a fair assumption that the correct output plaintext will consist mostly of latin characters. perhaps we could sum the count of all bytes in the output which are latin or whitespace characters in the ascii set.

```
ords = [ord(x) for x in string.ascii_letters + ' ']
def predict_key(x):
# will map candidate byte to the count of ascii letters in output
d = {}
for byte_candidate in range(256):
xorred_bytes = bytes.fromhex(single_byte_xor(x, byte_candidate))
d[byte_candidate] = 0
for c in xorred_bytes:
if c in ords:
d[byte_candidate] += 1
key = max(d, key=d.get)
return (key, bytes.fromhex(single_byte_xor(x, key)))
```

calling predict_key with the hex `1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736`

from the challenge returns us a tuple containing the key whose output had the highest count of ascii letters and the output itself.

`(88, b"Cooking MC's like a pound of bacon")`

we can see that using the key 88 (in decimal, corresponding to the letter 'X' in ascii), we get a clear plaintext english phrase as a result of XORing the ciphertext.

we're faced now with a list of strings, ONE of which has been single-byte XOR'd.

we could loop through each string and run it through our `predict_key`

function.

```
with open('./data/c4.txt') as file:
for line in file:
print(predict_key(line))
```

then manually comb through the stdout to look for the correct output. but we can do better. building off our previous assumption, let's assume that english plaintext would be at least 90% ascii letters and spaces.

```
def is_english_plaintext(x):
return (sum([c in ords for c in x]) / len(x)) > 0.9
def detect_single_byte_xor():
with open('./data/c4.txt') as file:
for line in file:
(k, candidate) = predict_key(line)
if is_english_plaintext(candidate):
print((k, candidate))
```

this time only one result comes to stdout and we find the key was 53 (the character '5')

`(53, b'Now that the party is jumping\n')`

in this next challenge, we implement repeating key XOR. recall that we want the key buffer to be the same length as the plaintext in order to XOR each bit position. for single byte XOR this was trivial as we just need to fill a byte array of input length. here we may need to 'cut' the key when the buffer length is not a multiple of the key length. some list slicing and we're done

```
def repeating_key_xor(x, key):
quotient = len(x) // len(key)
modulus = len(x) % len(key)
padded_key = key * quotient + key[:modulus]
return byte_xor(x, padded_key).hex()
```

let's try the inputs from the challenge

```
x = repeating_key_xor(b"""Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal""", b"ICE")
print(x)
```

and we get returned the expected hex string of `0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f`

our first challenge where we're tasked with breaking 'encryption'. exciting.

we need to find the repeating-key used to XOR encrypt this file (which has also been base64'd)

let's first write the function to calculate hamming distance between two strings. note that this should calculate BIT LEVEL hamming distance - not hamming distance of the characters

```
def hamming_distance(a, b):
xor = int(a.hex(), 16) ^ int(b.hex(), 16)
return bin(xor).count('1')
```

i tried at first following the challenge suggestion of only using 2 or 4 sample keysized blocks (step 4) to calculate the normalised hamming distance but found that the correct keysize would correspond to the ~5th smallest normalised hamming distance. this is why i ended up sampling across the entire range of ciphertext.

```
def get_blocks(lst, n):
blocks = list(chunk(lst, n))
blocks[-1] = blocks[-1].ljust(n, b'')
return blocks
def chunk(lst, n):
for i in range(0, len(lst), n):
yield lst[i:i + n]
d = {}
with open('./data/c6.txt') as file:
raw_data = base64.b64decode(file.read())
for key_size in range(2, 41):
blocks = get_blocks(raw_data, key_size)
samples = len(blocks)
# if length is odd, minus one from no. of samples
# to avoid out of index when we try to access index i+1 below
if samples % 2:
samples -= 1
score = 0
for i in range(0, samples, 2):
score += hamming_distance(blocks[i], blocks[i+1])
score /= key_size
score /= samples
d[key_size] = score
candidate_key_size = min(d, key=d.get)
blocks = get_blocks(raw_data, candidate_key_size)
transposed = [[b[i] for b in blocks] for i in range(candidate_key_size)]
candidate_key = []
for t in transposed:
(key_byte, _) = predict_key(bytes(t).hex())
candidate_key.append(key_byte)
print(bytes.fromhex(repeating_key_xor(raw_data, bytes(candidate_key))))
```

i'm not familiar at all with block ciphers and their modes of operations, so had to do some reading and research at this point. turns out ECB (electronic code book) mode is not great to use as encryption operations don't use salts or initialisation vectors in this mode. this means any plaintext blocks which are the same will produce the exact same cipherblocks.

AES operates with blocks of 16 bytes length. keys can be 128, 192, or 256 bits (16, 24, or 32 bytes) long.

we'll use the pycryptodome python library for this

```
from Crypto.Cipher import AES
def aes_ecb_decrypt(ciphertext, key):
cipher = AES.new(key, AES.MODE_ECB)
return cipher.decrypt(ciphertext)
with open('./data/c7.txt') as file:
s = base64.b64decode(file.read())
print(aes_ecb_decrypt(s, b"YELLOW SUBMARINE"))
```

run the above and vanilla ice fills our stdout.

this challenge wants us to go through the hex values in this file and determine which one has been AES encrypted with ECB mode

we know from researching ECB for the previous step that plaintext blocks which are the same will produce cipherblocks which are the same. this challenge also states as much :D

we can simply split the ciphertext into 16 byte blocks (reusing a function from challenge 6) and check if any blocks are the same

```
with open('./data/c8.txt') as file:
for line in file:
s = bytes.fromhex(line)
blocks = list(chunk(s, 16))
if (len(blocks) != len(set(blocks))):
print(line)
print(blocks)
```

run the above and we indeed detect the line which was AES'd in ECB mode

```
d880619740a8a19b7840a8a31c810a3d08649af70dc06f4fd5d2d69c744cd283e2dd052f6b641dbf9d11b0348542bb5708649af70dc06f4fd5d2d69c744cd2839475c9dfdbc1d46597949d9c7e82bf5a08649af70dc06f4fd5d2d69c744cd28397a93eab8d6aecd566489154789a6b0308649af70dc06f4fd5d2d69c744cd283d403180c98c8f6db1f2a3f9c4040deb0ab51b29933f2c123c58386b06fba186a
```