we can simply use the built in base64 python module. the b64decode method takes a bytes-like object so let's turn the hex string representation into a bytes object first.
import base64
def convert_hex_to_b64(hex_string):
return base64.b64encode(bytes.fromhex(hex_string)).decode()
quick reminder on XOR - output is true only when inputs differ (i.e. 1 and 0).
here's a small toy i built to help illustrate:
input 1
0
input 2
0
output 2
0
in most high level languages you can XOR two numbers with the ^
operator.
we can enumerate over the two input buffers and xor each byte
def byte_xor(x, y):
return bytes([a ^ b for a, b in zip(x, y)])
def fixed_xor(x, y):
return byte_xor(bytes.fromhex(x), bytes.fromhex(y)).hex()
we know that each byte of the ciphertext has been XOR'd by some unknown byte, the key, which we want to determine. The key is just one byte, i.e. a number between 0
to 255
, (or 0x00
to 0xff
[or 0b00000000
to 0b11111111
])
we can pretty much just 'brute force' this. for each candidate key, we'll need to feed the ciphertext and a byte array of the same length (which will just be the candidate key at each byte position).
def single_byte_xor(x, xor_char):
x_ = bytes.fromhex(x)
return byte_xor(x_, bytes([xor_char] * len(x_))).hex()
for byte_candidate in range(256):
xorred_bytes = bytes.fromhex(single_byte_xor('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736', byte_candidate))
print(xorred_bytes)
now we get the 255 resulting XOR outputs in our stdout, and while at this scale it's easy enough to manually comb through the results and see with our eyes which byte was the correct key, let's come up with some metric to automatically evaluate our results, using character frequency as the challenge suggests.
since we know the plaintext is english,
outputs such as b'\t%%!#$-j\x07\tm9j&#!/j+j:%?$.j%,j(+)%$'
or b'\x9b\xb7\xb7\xb3\xb1\xb6\xbf\xf8\x95\x9b\xff\xab\xf8\xb4\xb1\xb3\xbd\xf8\xb9\xf8\xa8\xb7\xad\xb6\xbc\xf8\xb7\xbe\xf8\xba\xb9\xbb\xb7\xb6'
are clearly wrong. we can make a fair assumption that the correct output plaintext will consist mostly of latin characters. perhaps we could sum the count of all bytes in the output which are latin or whitespace characters in the ascii set.
ords = [ord(x) for x in string.ascii_letters + ' ']
def predict_key(x):
# will map candidate byte to the count of ascii letters in output
d = {}
for byte_candidate in range(256):
xorred_bytes = bytes.fromhex(single_byte_xor(x, byte_candidate))
d[byte_candidate] = 0
for c in xorred_bytes:
if c in ords:
d[byte_candidate] += 1
key = max(d, key=d.get)
return (key, bytes.fromhex(single_byte_xor(x, key)))
calling predict_key with the hex 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
from the challenge returns us a tuple containing the key whose output had the highest count of ascii letters and the output itself.
(88, b"Cooking MC's like a pound of bacon")
we can see that using the key 88 (in decimal, corresponding to the letter 'X' in ascii), we get a clear plaintext english phrase as a result of XORing the ciphertext.
we're faced now with a list of strings, ONE of which has been single-byte XOR'd.
we could loop through each string and run it through our predict_key
function.
with open('./data/c4.txt') as file:
for line in file:
print(predict_key(line))
then manually comb through the stdout to look for the correct output. but we can do better. building off our previous assumption, let's assume that english plaintext would be at least 90% ascii letters and spaces.
def is_english_plaintext(x):
return (sum([c in ords for c in x]) / len(x)) > 0.9
def detect_single_byte_xor():
with open('./data/c4.txt') as file:
for line in file:
(k, candidate) = predict_key(line)
if is_english_plaintext(candidate):
print((k, candidate))
this time only one result comes to stdout and we find the key was 53 (the character '5')
(53, b'Now that the party is jumping\n')
in this next challenge, we implement repeating key XOR. recall that we want the key buffer to be the same length as the plaintext in order to XOR each bit position. for single byte XOR this was trivial as we just need to fill a byte array of input length. here we may need to 'cut' the key when the buffer length is not a multiple of the key length. some list slicing and we're done
def repeating_key_xor(x, key):
quotient = len(x) // len(key)
modulus = len(x) % len(key)
padded_key = key * quotient + key[:modulus]
return byte_xor(x, padded_key).hex()
let's try the inputs from the challenge
x = repeating_key_xor(b"""Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal""", b"ICE")
print(x)
and we get returned the expected hex string of 0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
our first challenge where we're tasked with breaking 'encryption'. exciting.
we need to find the repeating-key used to XOR encrypt this file (which has also been base64'd)
let's first write the function to calculate hamming distance between two strings. note that this should calculate BIT LEVEL hamming distance - not hamming distance of the characters
def hamming_distance(a, b):
xor = int(a.hex(), 16) ^ int(b.hex(), 16)
return bin(xor).count('1')
i tried at first following the challenge suggestion of only using 2 or 4 sample keysized blocks (step 4) to calculate the normalised hamming distance but found that the correct keysize would correspond to the ~5th smallest normalised hamming distance. this is why i ended up sampling across the entire range of ciphertext.
def get_blocks(lst, n):
blocks = list(chunk(lst, n))
blocks[-1] = blocks[-1].ljust(n, b'