Welcome to another chapter of Crypto Series. Last time I demonstrated some basic mechanisms in Langdon Framework, today we will look at some practical use. We will learn something about XOR operation, entropy and will apply the knowledge to solve first 6 Cryptopals challenges. These are available at https://cryptopals.com/sets/1 for free!
1.1 – Convert hex to base64
In the first task you are expected to convert the string:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
into Base64. It should produce:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
Such conversion is luckily and purposefully Langdon’s basic functionality.
*** a = 0x49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d [.] a = 0x49276d206b696c6c696e67...6f7573206d757368726f6f6d *** a~Base64 Base64: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
NEXT!
1.2 Fixed XOR
Finally something new, an exclusive OR (XOR) operation. Langdon supports it and takes it very seriously:
- different data lengths are supported (smaller pattern is repeated),
- you can XOR in two ways – using a command as simple bitwise operation or as an Algorithm (used later in task 1.6),
- you can invoke help page with
help xorcommand. Later I am planning to write a post or two about XOR.
Now the situation is simple, xoring 0x1c0111001f010100061a024b53535009181c and 0x686974207468652062756c6c277320657965 should give us 0x746865206b696420646f6e277420706c6179.
*** a = 1c0111001f010100061a024b53535009181c [.] a = 0x1c0111001f010100061a024b53535009181c *** b = 686974207468652062756c6c277320657965 [.] b = 0x686974207468652062756c6c277320657965 *** c = xor a b *** c~Hex Hex: 0x746865206b696420646f6e277420706c6179
Note: Other common bitwise operations are also supported, namely and, or and not. They are not that useful in practical cryptography, but it is handy to have them available.
1.3 – Single-byte XOR cipher
We have 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736 value XORed with single unknown character. If the character was known, you could use same approach as in previous task and it would work. But we need a bruteforce here.
*** x = 0x1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736 [.] x = 0x1b37373331363f78151b7f...2d363c78373e783a393b3736 *** brute-single-xor x
First a variable is created and then brute-single-xor is executed with it. The brute-single-xor command takes the variable and xores it with all 256 possible byte values, one at a time. Results are stored as <variable>_<byte>. Like this:
*** vars
x 0x1b37373331363f78151b7f...2d363c78373e783a393b3736
x_00 \x1b77316?x\x15\x1b\x7f+x413=x9x(7-6<x7>x:9;76
x_01 \x1a66207>y\x14\x1a~*y502<y8y)6,7=y6?y;8:67
x_02 \x1955134=z\x17\x19})z631?z;z*5/4>z5<z8;954
x_03 \x1844025<{\x16\x18|({720>{:{+4.5?{4={9:845
x_04 \x1f33752;|\x11\x1f{/|0579|=|,3)28|3:|>=?32
...
Now we can subject variables to simple wordlist analysis:
*** english = file:../../wordlists/english.txt [.] english = b'a\naa\naaa\naah\naahed...witterion\nzwitterionic' *** wordlist english [.] Variable Score [.] Running thread (samples 1 through 32). [.] Running thread (samples 33 through 64). [.] Running thread (samples 65 through 96). [.] Running thread (samples 97 through 128). [.] Running thread (samples 129 through 160). [.] Running thread (samples 161 through 192). [.] Running thread (samples 193 through 224). [.] Running thread (samples 225 through 256). [.] Running thread (samples 257 through 257). [+] x_58: 1.000 Cooking MC's like a pound of bacon [+] x_78: 1.000 cOOKING\x00mc\x07S\x00LI...\x00POUND\x00OF\x00BACON
A wordlist of english words is loaded and then the wordlist command is used. This will test all words in each variable and see how many of them are present in the wordlist. That pinpoints 2 possible answers.
1.4 – Detect Single-byte XOR
Same task as previous, but you must also find the correct ciphertext. Download the data:
wget -O /tmp/ciphers https://cryptopals.com/static/challenge-data/4.txt
In a few cases the cryptopals challenges are way too specific and there is no reasonable benefit of full exact implementation. This is one of these situations. In this case Langdon will still deal with bruteforce, entropy and that stuff, but we will prepare commands and deal with the output in Bash.
There are 2 options available. You can do brute-single-xor for each variable and then use wordlist to find correct variable and XOR key. It is tedious, but here is a Bash script to do it for you (if you have time):
# prepare commands sed = /tmp/ciphers | sed 'N;s/\n/ = 0x/' > /tmp/variables cut -d ' ' -f 1 /tmp/variables | sed 's/^/brute-single-xor /' > /tmp/xor_commands # with wordlist cat /tmp/variables /tmp/xor_commands - << EOF | ./langdon english = file:../../wordlists/english.txt wordlist english EOF
Or you can use entropy for this. I am planning to write something about the topic too, but basics will be enough now. Entropy essentially means how unpredictable your data is. A stream of NULL bytes (and bits, to be precise) will have an entropy of 0. Encrypted data should have very high entropy – close to 1.
Recall that I am using entropy as a percentage, that is <0; 1>. You may come across <0; 8> interval more often and that uses the term Bits of Entropy.
Common plaintext should have entropy of roughly 0.5. So should our mysterious ciphertext, because single-byte XOR does not change entropy! Other strings have slightly bigger entropy (they are more random). Feed proper commands into Langdon and use the power of Bash to get entries with the lowest entropy:
echo -e "multiline /tmp/ciphers a\nentropy" | \
./langdon | grep '^ ' | gawk '{print $(NF-1),$NF}' \
| sort -t' ' -k2 | head -n 3
The multiline command is designed for the exact purpose of loading multiple variables from one file that are separated by newline. Names will be deduced from the line index (starting at zero) and prefixed with given value (a). The result shows up immediately:
a_170 0.49832
a_102 0.53999
a_101 0.54040
Mission is complete, of course you can run single-byte-xor and wordlist on the most suspicious ciphertext.
1.5 – Implement repeating-key XOR
Repeating-key XOR is similar to single-byte XOR, but we do not repeat only one byte, but a phrase. That is something Langdon already knows. Just a quick validation:
*** m = 'Burning 'em, if you ain't quick and nimble I go crazy when I hear a cymbal' [.] m = b"Burning 'em, if you ai...zy when I hear a cymbal" *** k = ICE [.] k = ICE *** c = xor m k *** c~Hex Hex: 0x0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20690a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
1.6 – Break repeating-key XOR
First download the data and unbase it for convenience:
wget -qO- https://cryptopals.com/static/challenge-data/6.txt | base64 -d > /tmp/a
Now how to crack repeating-key XOR? First you should realize that XOR with repeating key can be dealt with as n single-byte XORs. For example, knowing that pjzhiktrcarsociomtvhlcexwksqkl ciphertext has been encrypted with key of length 5, you can split the ciphertext into groups of 5:
pjzhi ktrca rsoci omtvh lcexw ksqkl
and we know each column has been encrypted with single-byte XOR! But how to find out the key length?
Finding the key length with entropy
Entropy can help here, too. If you compute the entropy of one of the columns, on reasonably long ciphertext the result should reflect the language. If the result is higher, the column is too unpredictable -> the number of columns might be wrong. In Langdon there is transpose-entropy command that can be used to compute average entropy for data aligned into n “columns”:
*** a = file:/tmp/a *** transpose-entropy a 2 0.753474 3 0.756876 4 0.749756 5 0.749106 ... 28 0.677642 29 0.585834 30 0.677900 ...
There should be some dropping tendency, but a few big downward spikes (in the task it happens for key lengths 29 and 58). Focus on those, as the entropy essentially says: “If you align the data into 29 columns, they suddenly look less random.” It also happens for 58, because it is a multiple. If you have too many columns, each column has less bytes to analyze and is therefore less statistically significant. This is one of the reasons why later on the average entropy goes “fuzzy” and even lower than the correct solution.
Note: There is also chunk-entropy command if you want to get average entropies of “rows”.
Finding the key length with Hamming distance
Another way of getting the key length (and recommended by the authors of the task) is the Hamming distance. It is number of different bits in two chunks of same length. Hamming distance of 2 blocks of correct key length should be significantly smaller than of 2 blocks of incorrect length. This is because when correct length is used, you are computing Hamming distance of 2 plaintexts affected (encrypted) with the same key (=in the same way) and thus the resulting distance is not affected with the key. On the other hand, if the length is incorrect, you are essentially computing 2 plaintexts encrypted with different keys and because of them the distance is affected. If you want more thorough explanation and even the proof, check this. Langdon has hamming command that does the computing.
Solving each single-byte XOR
If you know the key length, you can solve each “column” of the ciphertext independently. Unfortunately, we cannot use dictionary attack anymore, because we are testing every n‘th letter. It is time for frequency analysis, which is something I am not a fan of. It uses the fact that in a language some letters are much more common than others. For this you need proper frequency table. Langdon has one for english language. It is worth mentioning that usually not only frequency tables of single characters are used, but also tables of digraphs and trigraphs. Anyway, this would be useless in this case.
The solution
Langdon has break-xor command designed to use Hamming distance to guess the key length and then applying frequency analysis to attempt to reveal the key. Use it as follows:
*** x = break-xor a english
...
*** x
Key: TerMinator X: Bring the noiSe
Plaintext: I'm\x00back and I'm ring...ay that funky Music\x00\n
Ciphertext: \x1dB\x1fM
\x0f\x02\x1fO...x05\x16I\x1e\x10'
\x11Mc
X will not hold just the resulting key now! Because the key might be somehow mutated due to inaccurate frequencies used and bad luck, the break-xor command returns an XOR object. The object has the advantage to recognize own elements (that is: plaintext, ciphertext, key) and can work with them. Another advantage is that in Langdon all such Algorithm objects have (or will have) their own documentation containing basic theory and thorough description of security flaws. You can get the help with help <algorithm> command, in this case help xor.
But wait! There’s more! I thought it would be cool if you have the opportunity to manually edit some element of the object and the rest would be automatically recomputed afterwards. Because of this I can let break-xor get the approximate key and then use my human language recognition to work out the rest. See the asciinema below to see how it works.
You can create algorithm objects manually like this:
*** c = file:/tmp/a [.] c = b'\x1dB\x1fM\x0b\x0f\x02...16I\x1e\x10\'\x0c\x11Mc' *** k = 'Terminator X: Bring the noise' [.] k = Terminator X: Bring the noise *** x = XOR key=k ciphertext=c *** p = decrypt x *** p~Raw Raw: b'I\'m back and I\'m ringin\' the bell \nA rockin\' on thb mike while the fly girls yell \nIn ecstasy in the back of me \nWell that\'s my DJ Desha~ cuttin\' all them Z\'s \nHittin\' harc and the girlies goin\' crazy \nVanilla\'s on the mike, man I\'m not lazy. ...
Remember the decrypt command, it tells the Algorithm to use ciphertext, key and possible other vital information and derive the plaintext. Naturally, encrypt command does the opposite.
Conclusion
Even though cryptography presented in this chapter is very low level, it is a good start, because XOR is present in currently used symmetric algorithms, such as AES. And that will be the topic for the next chapter. Today we learned that:
- there are
xor,and,or, andnotcommands for common bitwise operations, - Langdon can load multiple variables at once with the
multilinecommand, - XOR is very useful operation in cryptography, but you should not use it with too short key,
- entropy can tell us important things about weird data and Langdon can do simple
entropyof the whole data, averagechunk-entropyof consequential chunks or averagetranspose-entropyof every n’th byte chunks, - there is something called the Hamming distance,
- algorithms in Langdon will be implemented as objects with their own properties and we use
encryptanddecryptcommands on the whole objects to do the crypto.XORis one of them.