Welcome to another chapter of Crypto Series. Last time I demonstrated some basic mechanisms in Langdon Framework, today we will look at some practical use. We will learn something about XOR operation, entropy and will apply the knowledge to solve first 6 Cryptopals challenges. These are available at https://cryptopals.com/sets/1 for free!

1.1 – Convert hex to base64

In the first task you are expected to convert the string:

49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

into Base64. It should produce:

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

Such conversion is luckily and purposefully Langdon’s basic functionality.

*** a = 0x49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
[.] a = 0x49276d206b696c6c696e67...6f7573206d757368726f6f6d
*** a~Base64
Base64: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

NEXT!

1.2 Fixed XOR

Finally something new, an exclusive OR (XOR) operation. Langdon supports it and takes it very seriously:

  • different data lengths are supported (smaller pattern is repeated),
  • you can XOR in two ways – using a command as simple bitwise operation or as an Algorithm (used later in task 1.6),
  • you can invoke help page with help xor command. Later I am planning to write a post or two about XOR.

Now the situation is simple, xoring 0x1c0111001f010100061a024b53535009181c and 0x686974207468652062756c6c277320657965 should give us 0x746865206b696420646f6e277420706c6179.

*** a = 1c0111001f010100061a024b53535009181c
[.] a = 0x1c0111001f010100061a024b53535009181c
*** b = 686974207468652062756c6c277320657965
[.] b = 0x686974207468652062756c6c277320657965
*** c = xor a b
*** c~Hex
Hex:    0x746865206b696420646f6e277420706c6179

Note: Other common bitwise operations are also supported, namely and, or and not. They are not that useful in practical cryptography, but it is handy to have them available.

1.3 – Single-byte XOR cipher

We have 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736 value XORed with single unknown character. If the character was known, you could use same approach as in previous task and it would work. But we need a bruteforce here.

*** x = 0x1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
[.] x = 0x1b37373331363f78151b7f...2d363c78373e783a393b3736
*** brute-single-xor x

First a variable is created and then brute-single-xor is executed with it. The brute-single-xor command takes the variable and xores it with all 256 possible byte values, one at a time. Results are stored as <variable>_<byte>. Like this:

*** vars
   x  0x1b37373331363f78151b7f...2d363c78373e783a393b3736
x_00  \x1b77316?x\x15\x1b\x7f+x413=x9x(7-6<x7>x:9;76
x_01  \x1a66207>y\x14\x1a~*y502<y8y)6,7=y6?y;8:67
x_02  \x1955134=z\x17\x19})z631?z;z*5/4>z5<z8;954
x_03  \x1844025<{\x16\x18|({720>{:{+4.5?{4={9:845
x_04  \x1f33752;|\x11\x1f{/|0579|=|,3)28|3:|>=?32
...

Now we can subject variables to simple wordlist analysis:

*** english = file:../../wordlists/english.txt
[.] english = b'a\naa\naaa\naah\naahed...witterion\nzwitterionic'
*** wordlist english
[.]   Variable  Score
[.] Running thread (samples 1 through 32).
[.] Running thread (samples 33 through 64).
[.] Running thread (samples 65 through 96).
[.] Running thread (samples 97 through 128).
[.] Running thread (samples 129 through 160).
[.] Running thread (samples 161 through 192).
[.] Running thread (samples 193 through 224).
[.] Running thread (samples 225 through 256).
[.] Running thread (samples 257 through 257).
[+]   x_58:  1.000  Cooking MC's like a pound of bacon
[+]   x_78:  1.000  cOOKING\x00mc\x07S\x00LI...\x00POUND\x00OF\x00BACON

A wordlist of english words is loaded and then the wordlist command is used. This will test all words in each variable and see how many of them are present in the wordlist. That pinpoints 2 possible answers.

1.4 – Detect Single-byte XOR

Same task as previous, but you must also find the correct ciphertext. Download the data:

wget -O /tmp/ciphers https://cryptopals.com/static/challenge-data/4.txt

In a few cases the cryptopals challenges are way too specific and there is no reasonable benefit of full exact implementation. This is one of these situations. In this case Langdon will still deal with bruteforce, entropy and that stuff, but we will prepare commands and deal with the output in Bash.

There are 2 options available. You can do brute-single-xor for each variable and then use wordlist to find correct variable and XOR key. It is tedious, but here is a Bash script to do it for you (if you have time):

# prepare commands
sed = /tmp/ciphers | sed 'N;s/\n/ = 0x/' > /tmp/variables
cut -d ' ' -f 1 /tmp/variables | sed 's/^/brute-single-xor /' > /tmp/xor_commands

# with wordlist
cat /tmp/variables /tmp/xor_commands - << EOF | ./langdon
english = file:../../wordlists/english.txt
wordlist english
EOF

Or you can use entropy for this. I am planning to write something about the topic too, but basics will be enough now. Entropy essentially means how unpredictable your data is. A stream of NULL bytes (and bits, to be precise) will have an entropy of 0. Encrypted data should have very high entropy – close to 1.

Recall that I am using entropy as a percentage, that is <0; 1>. You may come across <0; 8> interval more often and that uses the term Bits of Entropy.

Common plaintext should have entropy of roughly 0.5. So should our mysterious ciphertext, because single-byte XOR does not change entropy! Other strings have slightly bigger entropy (they are more random). Feed proper commands into Langdon and use the power of Bash to get entries with the lowest entropy:

echo -e "multiline /tmp/ciphers a\nentropy" | \
./langdon | grep '^  ' | gawk '{print $(NF-1),$NF}' \
| sort -t' ' -k2 | head -n 3

The multiline command is designed for the exact purpose of loading multiple variables from one file that are separated by newline. Names will be deduced from the line index (starting at zero) and prefixed with given value (a). The result shows up immediately:

a_170 0.49832
a_102 0.53999
a_101 0.54040

Mission is complete, of course you can run single-byte-xor and wordlist on the most suspicious ciphertext.

1.5 – Implement repeating-key XOR

Repeating-key XOR is similar to single-byte XOR, but we do not repeat only one byte, but a phrase. That is something Langdon already knows. Just a quick validation:

*** m = 'Burning 'em, if you ain't quick and nimble I go crazy when I hear a cymbal'
[.] m = b"Burning 'em, if you ai...zy when I hear a cymbal"
*** k = ICE
[.] k = ICE
*** c = xor m k
*** c~Hex
Hex:    0x0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20690a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f

1.6 – Break repeating-key XOR

First download the data and unbase it for convenience:

wget -qO- https://cryptopals.com/static/challenge-data/6.txt | base64 -d > /tmp/a

Now how to crack repeating-key XOR? First you should realize that XOR with repeating key can be dealt with as n single-byte XORs. For example, knowing that pjzhiktrcarsociomtvhlcexwksqkl ciphertext has been encrypted with key of length 5, you can split the ciphertext into groups of 5:

pjzhi
ktrca
rsoci
omtvh
lcexw
ksqkl

and we know each column has been encrypted with single-byte XOR! But how to find out the key length?

Finding the key length with entropy

Entropy can help here, too. If you compute the entropy of one of the columns, on reasonably long ciphertext the result should reflect the language. If the result is higher, the column is too unpredictable -> the number of columns might be wrong. In Langdon there is transpose-entropy command that can be used to compute average entropy for data aligned into n “columns”:

*** a = file:/tmp/a
*** transpose-entropy a
2     0.753474
3     0.756876
4     0.749756
5     0.749106
...
28    0.677642
29    0.585834
30    0.677900
...

There should be some dropping tendency, but a few big downward spikes (in the task it happens for key lengths 29 and 58). Focus on those, as the entropy essentially says: “If you align the data into 29 columns, they suddenly look less random.” It also happens for 58, because it is a multiple. If you have too many columns, each column has less bytes to analyze and is therefore less statistically significant. This is one of the reasons why later on the average entropy goes “fuzzy” and even lower than the correct solution.

Note: There is also chunk-entropy command if you want to get average entropies of “rows”.

Finding the key length with Hamming distance

Another way of getting the key length (and recommended by the authors of the task) is the Hamming distance. It is number of different bits in two chunks of same length. Hamming distance of 2 blocks of correct key length should be significantly smaller than of 2 blocks of incorrect length. This is because when correct length is used, you are computing Hamming distance of 2 plaintexts affected (encrypted) with the same key (=in the same way) and thus the resulting distance is not affected with the key. On the other hand, if the length is incorrect, you are essentially computing 2 plaintexts encrypted with different keys and because of them the distance is affected. If you want more thorough explanation and even the proof, check this. Langdon has hamming command that does the computing.

Solving each single-byte XOR

If you know the key length, you can solve each “column” of the ciphertext independently. Unfortunately, we cannot use dictionary attack anymore, because we are testing every n‘th letter. It is time for frequency analysis, which is something I am not a fan of. It uses the fact that in a language some letters are much more common than others. For this you need proper frequency table. Langdon has one for english language. It is worth mentioning that usually not only frequency tables of single characters are used, but also tables of digraphs and trigraphs. Anyway, this would be useless in this case.

The solution

Langdon has break-xor command designed to use Hamming distance to guess the key length and then applying frequency analysis to attempt to reveal the key. Use it as follows:

*** x = break-xor a english
...
*** x
Key: TerMinator X: Bring the noiSe
Plaintext: I'm\x00back and I'm ring...ay that funky Music\x00\n
Ciphertext: \x1dB\x1fM
                      \x0f\x02\x1fO...x05\x16I\x1e\x10'
                                                       \x11Mc

X will not hold just the resulting key now! Because the key might be somehow mutated due to inaccurate frequencies used and bad luck, the break-xor command returns an XOR object. The object has the advantage to recognize own elements (that is: plaintext, ciphertext, key) and can work with them. Another advantage is that in Langdon all such Algorithm objects have (or will have) their own documentation containing basic theory and thorough description of security flaws. You can get the help with help <algorithm> command, in this case help xor.

But wait! There’s more! I thought it would be cool if you have the opportunity to manually edit some element of the object and the rest would be automatically recomputed afterwards. Because of this I can let break-xor get the approximate key and then use my human language recognition to work out the rest. See the asciinema below to see how it works.

You can create algorithm objects manually like this:

*** c = file:/tmp/a
[.] c = b'\x1dB\x1fM\x0b\x0f\x02...16I\x1e\x10\'\x0c\x11Mc'
*** k = 'Terminator X: Bring the noise'
[.] k = Terminator X: Bring the noise
*** x = XOR key=k ciphertext=c
*** p = decrypt x
*** p~Raw
Raw:    b'I\'m back and I\'m ringin\' the bell \nA rockin\' on thb mike while the fly girls yell \nIn ecstasy in the back of me \nWell that\'s my DJ Desha~ cuttin\' all them Z\'s \nHittin\' harc and the girlies goin\' crazy \nVanilla\'s on the mike, man I\'m not lazy.
...

Remember the decrypt command, it tells the Algorithm to use ciphertext, key and possible other vital information and derive the plaintext. Naturally, encrypt command does the opposite.

Conclusion

Even though cryptography presented in this chapter is very low level, it is a good start, because XOR is present in currently used symmetric algorithms, such as AES. And that will be the topic for the next chapter. Today we learned that:

  • there are xor, and, or, and not commands for common bitwise operations,
  • Langdon can load multiple variables at once with the multiline command,
  • XOR is very useful operation in cryptography, but you should not use it with too short key,
  • entropy can tell us important things about weird data and Langdon can do simple entropy of the whole data, average chunk-entropy of consequential chunks or average transpose-entropy of every n’th byte chunks,
  • there is something called the Hamming distance,
  • algorithms in Langdon will be implemented as objects with their own properties and we use encrypt and decrypt commands on the whole objects to do the crypto. XOR is one of them.

Leave a Reply

Your email address will not be published. Required fields are marked *