Information Theory, Entropy, and Key Strength

CS 463 Lecture, Dr. Lawlor

"Entropy" has a fairly technical definition in crypto work: it's a measure of the unexpectedness of a message. The concept and equations come from information theory.

Here's the conceptual model:

Sentence start	Low entropy ending	Higher entropy ending
Hey, what's	going on?	on fire?!
Want to	get lunch?	TiVo Le Voyage dans la Lune?
How are	you doing today?	my face spiders?

(I find entropy funny and refreshing, at least for a while. The random sentence generator in the homework, or SCIgen, are really amusing for the first few tries, though it does get old eventually.)

Mathematically, entropy can be quantified as:
entropy = sum over messages of (probability of message * - log(probability of message) );

The base used for the logarithm is a matter of taste. Computer folks often use a base-2 log, which gives you an entropy in bits.

Probability runs from 0 to 1, while -log runs from +infinity to 0 over that range. This means probability-1 events or probability-0 events both have no entropy--there's no uncertainty if it's guaranteed to happen, or can't happen.

Probability	-log₂	bits of entropy	Why?
1	0	0	It always happens, so there's no uncertainty.
0.5	1	0.5	Happens half the time.
0.1	3.3	0.33	-log is bigger, but probability shrinks faster.
0.01	6.6	0.066	as above, but worse.
0	inf	0	Never happens, so doesn't contribute any entropy.

Probability is a fairly slippery concept (what's the probability for a byte of 0x61?), but there are several ways to measure entropy directly. For example, if we run a data compression algorithm like gzip or bzip2 on a dataset, it will remove some of the redundancies from a low-entropy dataset, but won't be able to shrink a high-entropy dataset. Here are the data sizes, in bytes, that I measured for ten types of data:

	Raw	Gzip -9	Bzip2	7zip	Comments
Binary zeros	1048576	1057	45	327	All zeros has the lowest possible entropy: 0 bits.
Wikipedia XML	1048576	359184	287769	294871	Good compression rate, typical for text. Inferred entropy is about 2.5 bits of entropy per byte of the file (30%).
H.P. Lovecraft Text	1048576	402926	309576	328106	Similar.
gdb ELF64	1048576	406290	386042	332042	Nominally a binary file, but has ASCII strings, and x64 machine code and tables with lots of zeros.
Random ASCII hex	1048576	607628	544118	553549	Nominally ASCII, but very unpredictable. Expected value is 4 bits of entropy per byte of the file (50%).
Base64 random data	1048576	796856	795415	806685	Expected value is 6 bits of entropy per byte of the file (75%).
Tarball	1048576	1039974	1050758	1046726	Already gzip'd, so most entropy already gone.
Zipfile	1048576	1041544	1050592	1047977	Similar.
JPEG	1048576	1047418	1044382	1055571	After the DCT phase, JPEG uses a huffman encoding already.
Random binary data	1048576	1048761	1053340	1063068	Purely random binary data is almost perfectly incompressible--very nearly 1 bit of entropy per bit of the file (100% entropy).

Generally, any kind of human-readable text has quite low entropy, which means there are redundancies that can be found even by a general-purpose compression program.

Entropy for crypto

For an attacker, entropy is the enemy. Low entropy anywhere, especially via repeating keys or text, will produce measurable statistical correlations, which are the basis of much of cryptanalysis. By contrast, more entropy means a bigger and less predictable key search space, with fewer and more difficult to detect redundancies and correlations. The central issue is this:

An n-bit key can represent 2ⁿ possible values.

The entropy in a cipher's key (its "key strength, in bits") is thus a measure of how difficult it is to break the cipher via brute force.

Cipher	Key Entropy	Cryptanalysis?
Rot-13	0 bits (no key)	Trivial
Rot-K	< 5 bits (k)	Easy--brute force, known plaintext, letter frequency statistics, ...
DES	<= 56 bits	Medium--brute force is currently feasible
AES-128	<= 128 bits	Very Hard
OTP	1 bit per message bit	Not possible (all plaintexts equally likely)

The operation of the cipher typically sets an upper limit on the number of bits of key entropy that can be used, although this number is somewhat flexible in both directions. You can increase the effective key entropy somewhat by running the cipher repeatedly with different keys; for example, two passes of AES with two different keys approximately doubles the key length. If the attacker doesn't know which cipher you used, this gains you a few more bits (although there aren't very many likely possibilities, so you don't gain many bits this way).

Sadly, it's very easy to decrease the key entropy. For example, a 10-byte password has only 10*8=80 bits of entropy, so even if it's 100% binary garbage you're still well below AES-128's potential. If you restrict yourself to printable ASCII, you're down to only 10*6=60 bits of entropy, which is approaching feasible to brute force. Letter frequency and coincidence analysis (more 'a' than 'z'; more 'qu' than 'qk') can cut this to only a few bits of entropy per byte, putting an attacker well into brute force territory. If you use a random word from the dictionary, you're at only 17 bits of entropy (assuming 2¹⁷=128K words in the dictionary). If you use the word "password", you're at nearly 0 bits of entropy.

Generally, humans are poor entropy sources, and poor storage devices, yet passwords rely on both of these. Longer passphrases have been recommended, although I've found many sites have limits as low as 16 characters.

Known Plaintext/Ciphertext Attacks

Whatever entropy is in the key, poor ciphers tend to leak out this entropy into the ciphertext. This is especially hard to prevent wherever the plaintext has low entropy, such as a leading "ACHTUNG!", or a trailing "END TRANSMISSION". A well-designed cipher keeps the key secret even against a known-plaintext, known-ciphertext attack, typically by scrambling it with a well-designed substitution box (coming soon!).

For a trivial cipher, if we know the plaintext and ciphertext, we can reconstruct the key:

Name	Encryption	Decryption	To crack?
Rot-13	out[i]='A'+((in[i]-'A'+13)%26);	(same)	Run decryption.
Rot-K	out[i]='A'+((in[i]-'A'+K)%26);	K = 26-K;	K=(out[i]-in[i]+26)%26; // for any i
XOR-fixed	out[i]=in[i]^K;	(same)	K=out[i]^in[i]; // for any i
XOR-cycle	out[i]=in[i]^K[i%K.size()];	(same)	K[i]=out[i]^in[i]; // for each i
Bits-1	out[i]=ror(in[i],K);	rol	Try the 8 possible K's.
Bits-2	x=ror(in[i],K); out[i]=x^L;	x=out[i]^L; in[i]=rol(x,K);	Try the 8 possible K's. L=ror(in[i],K)^out[i];
Bits-3	x=ror(in[i],K); y=x^L; out[i]=ror(y,M);	y=rol(out[i],M); x=y^L; in[i]=rol(x,K);	Try the 8*8 possible M's and K's. L=ror(in[i],K)^rol(out[i],M);

"ror" is a circular bit rotate to the right: uint ror(uint v,uint bits) { return (v>>bits)|(v<<(32-bits)); }
"rol" rotates the other direction.
"^" is the bitwise XOR operation.

Bit rotation does a good job of mixing up values, resulting in binary ciphertext that *looks* random (and hence high entropy). But even a complex series of bit rotations and arithmetic can normally be unwound from either end, allowing an analyst to solve for the operations in the middle, especially if they can (1) reverse-engineer, guess, or bruteforce the algorithm; and (2) guess at least a portion of the plaintext.