What is maximum entropy

Entropy: IT

Back to Overview

entropy can be seen as a measure of the chance that is contained in a system or a sequence of information.

For every random event with several possible consequences, one has to distinguish between three terms:

  • the likelihood of each possibility
  • the number of possibilities (event space of a random experiment)
  • the information-theoretical entropy of the event

The entropy is more related to the term Number of possibilities, because it is derived from the number of possibilities and only made smaller and more manageable by taking the logarithm.

For the ideal one-time coin toss, for example:

  • Probability of each possibility 0.5
  • Number of possibilities: 2
  • Entropy = log2(2) = 1 bit

If you repeat the coin toss twice, then the following applies:

  • Probability of each possibility 0.25
  • Number of possibilities: 4
  • Entropy = log2(4) = 2 bits

The term in the Information theory is named in analogy to entropy in thermodynamics and statistical mechanics. Both terms have something in common, but cannot simply be equated with one another.

The information-theoretical understanding of the term entropy goes back to Claude Elwood Shannon and has existed since about 1948. That year Shannon published his fundamental work A Mathematical Theory of Communication and thus founded modern information theory.

Definition [edit]

Shannon defined entropy H given information I. over an alphabet Z by


in which pj is the probability that that j-th symbol zj of the alphabet Z in the information text I. occurs.

Don't get confused by this formula anytime soon. The best way to get along with the formula is to do a few arithmetic problems related to it. See entropy # arithmetic problems

Any computer with a programming environment offers another method of understanding this formula. You write a small computer program and let the variables vary in a loop. The matter becomes clearer, according to the motto: If I have not programmed it, then I have not yet understood it.

In the following, the formula is broken down into its individual parts to make it easier to understand.

Sum sign [edit]

The symbol Σ, which you may not be familiar with, only stands for a sum with several parts.


Example 2:

Or in general:

See also http://de.wikipedia.org/wiki/Summe#Notation_mit_dem_Summenzeichen

log2[To edit]

If you don't understand the sign logarithm, just take a look at the following page:

The Shannon definition uses the binary logarithm. Unfortunately, this is not found on the usual small computers in the PC.

On the Internet, for example, you can calculate it online here:

as a Linux user you can bc use: Linux-Praxisbuch: _bc Or use the following programs:

On normal pocket calculators you can calculate the binary logarithm with the natural (ln) logarithm: log2x: = ln (x) / ln (2)

Probability p [edit]

The probability p goes into the formula. If you haven't dealt with probability values ​​yet, just take a look at the following page:

p can have values ​​between 0 and 1.

  • p = 0
  • p = 0.5
    • Probability of tossing an ideal coin number or coat of arms.

Simple basic program for calculating H for p values ​​with equal rights [edit]

load "entropy.bas

5 REM ENTROPY FOR URN WITH X distinguishable BALLS 10 FOR X = 1 TO 16 20 Y = LOG (X): 24 P = 1 / X 27 ZZ = 1.44265 * LOG (X): rem binary logarithm 28 H = - P * 1.44265 * LOG (P) * X 29 H = INT (H * 100 + 0.5) / 100: rem round to 2 places after the decimal point 30 PRINT "Number of balls"; X 32 PRINT "Probability"; P 33 PRINT "Entropy" ; H 40 NEXT

The program outputs the following result:

Number of balls 1 probability 1 entropy 0 number of balls 2 probability .5 entropy 1 number of balls 3 probability .333333333 entropy 1.58 number of balls 4 probability .25 entropy 2 number of balls 5 probability .2 entropy 2.32 number of balls 6 probability. 166666667 entropy 2.58 number of balls 7 probability .142857143 entropy 2.81 number of balls 8 probability .125 entropy 3 number of balls 9 probability .111111111 entropy 3.17 number of balls 10 probability .1 entropy 3.32 number of balls 11 probability .0909090909 entropy 3.46 number of Balls 12 probability .0833333333 entropy 3.58 number of balls 13 probability .0769230769 entropy 3.7 number of balls 14 probability .0714285714 entropy 3.81 number of balls 15 probability .0666666667 entropy 3.91 number of balls 16 prob consistency .0625 entropy 4

The program is identical for different cubes with x equal possibilities.

Number of possibilities (event space of a random experiment) [edit]

The meaning of the term entropy is best understood when one realizes how many possibilities there are in a random selection. Then you still have to distinguish whether all possibilities are equal or whether the probability of the individual possibilities is different.

Example: number of possibilities in a throw

Coin: Number of possibilities 2 equals Tack: Number of possibilities 2 different probabilities 6-sided die: Number of possibilities 6 equals

Unit bit [edit]

Actually, the calculated entropy is dimensionless. Nevertheless, it receives a unit, namely that bit. The Shannon unit is unusual and has not caught on.

H multiplied by the number of characters in the information text results in the minimum number of bits required to represent the information.

In principle, the calculation of entropy can also be transferred to other number systems (e.g. octal system, hexadecimal system) than the binary system (base 2). In this case, the logarithm is not formed for base 2, but for the corresponding information-theoretical basis of the number system.

The generalization of the entropy for a multivariate random variable is called block entropy or composite entropy.

Interpretation [edit]

entropy Roughly speaking, it is a measure of the chance that is contained in a system or a sequence of information. The unit is the random information 1 bit defined as the Amount of information contained in a random decision on an ideal coin toss. An ideal coin toss has only two options - coat of arms or number - both of which occur with the same probability p = 0.5.

Shannon's original intention to use entropy as the measure of the required bandwidth of a transmission channel was quickly generalized. Entropy was generally regarded as a measure of the information content. For example, if the entropy in a binary decision has a value of 1 then the information is considered to be random. If the entropy is small, the information text contains redundancies or statistical regularities. The number intuitively indicates the average information contained in a symbol of the source.

The purely statistical calculation of the information-theoretic entropy according to the above formula is at the same time its limitation. For example, the probability is a 0 or 1 "1010101010 ..." can be found in an ordered character string, the same size as in a character string that has arisen from statistically independent events (e.g. repeated coin toss). Therefore, Shannon's entropy is identical for both strings, although intuitively the first string must be described as less random. A more appropriate definition of the entropy of a character string is provided by the conditional entropy and source entropy, both of which are based on compound probabilities.

Transinformation, which indicates the strength of the statistical relationship between two random variables, is also closely related to conditional entropy.

Example: tossing a coin [edit]

With a coin toss, heads or tails are ideally equally likely. If entropy is taken as a measure of uncertainty, it will have a maximum value in this simple dual system. It is completely uncertain whether “heads” or “tails” will be thrown on the next throw. The entropy is here called 1 bit Are defined.

This is different with a marked coin, for example a coin that shows “heads” on average in 60% of cases and “tails” in only 40% of cases. The uncertainty is less here than with the normal coin, as one has a certain preference for “heads”. Measured as entropy, the uncertainty is only around 0.971.

In the case of the coin, the probabilities of both possibilities add up to the sum 1, since there are only two possibilities.

In this case, the entropy can be calculated using Shannon's formula:

If she lied to the term2 do not understand (binary logarithm), then take a look at the following pages:

Replace one by the expression , we get the formula

If p = 0.5, then one can calculate H:

H = - (0.5 * (-1) + 0.5 * (- 1))
H = - (- 0.5 - 0.5)
H = 1

If p can assume different values ​​between 0 and 1, the relationship between p and H can be represented graphically as follows:

Relationship between probability and entropy

For each you can read off the entropy directly from it. The function is symmetrical to the straight line . She falls in very steeply to an entropy value of 0. Even with values ​​that are related to the safe event of approach, the entropy drops to 0.

This relationship applies to a random event. In the case of several random events, the individual entropies have to be added up and entropies over 1 can easily be achieved. The probability on the other hand, even with repetitions, it always remains between 0 and 1 by definition!

If you repeat the coin toss twice, the number of possibilities increases to four. The probability of each individual possibility is 0.25. The entropy of tossing a coin twice is then 2 bits. If you repeat an ideal coin toss several times, the entropy simply adds up. The entropy of a series of 20 ideal coin tosses is simply calculated: . This is shown in the following picture.

Entropy as a function of the number of coins tossed

You can't just go out one Calculate the value of the probability of the entropy. The entropy affects the entire random process. Every partial probability of a possible result is included in the calculation of the entropy of the overall process. Specifying a partial entropy for every possible result makes little sense. In Shannon's entropy formula, the sum of the probabilities should result in 1, otherwise there may be an ambiguous result.

If you save a sequence of coin flips as a bit sequence, then it makes sense to always represent “head” by 0 and “tails” by 1 (or vice versa). More compact codings are possible with marked coins. This can be obtained, for example, from the Huffman code.

Example: ideal cube [edit]

When throwing an ideal dice with six possibilities, the entropy is greater than 1. In general, it is greater than 1 for a random event from a random experiment with more than two equal possibilities in the result space. If there are equally probable possibilities in the result space, their value is calculated as follows:


The ideal cube has six possibilities in the result space. The entropy for throw once follows from this:

Entropy versus number of equal opportunities

The entropy of a roll of an ideal figure-of-eight cube is easy to calculate: It has eight equal possibilities.

The entropy of a toss with the ideal figure of eight corresponds to the entropy of three tosses with the ideal coin.

The adjacent figure shows the relationship between the entropy and the number of equal possibilities of a random experiment.

Arithmetic problems [edit]

Thumbtack [edit]

Thumbtack in 2 possible positions

1. Calculate the entropy of throwing a thumbtack whose probability of lying on your back is p = 0.3 and the probability of not lying on your back is 0.7. Use Shannon's formula to do this.

The drawing pin is an example of a binary random number generator with an unequal random distribution p unequal q.

8 dice [edit]

2. Calculate the entropy of the throw of an ideal eighth cube, the probability of which is p = 1/8 for each side.

6-sided die [edit]

3. Calculate the entropy of the throw of two ideal six-sided dice that are thrown at the same time.

  • Solution: H = 4.34 for indistinguishable cubes
  • Solution: H = 5.169925001442 for distinguishable cubes
  • Solution: Entropy: _Solutions # Task_3

Total 2 * dice [edit]

4. Calculate the sum of the entropies of two throws of an ideal six-sided die that is thrown twice in a row.

Urn with 3 balls [edit]

5. You have an urn with a red, a white and a black ball. The balls are indistinguishable when you pull them.

  • a) Calculate the probability of pulling for each color.
  • b) Calculate the entropy for a move out of the urn.
  • c) Calculate the entropy for 2 moves from the urn.
  • Solution a: p = 1/3
  • Solution b: H = 1.584962500721
  • Solution c: H = 3.169925001442 with replacement
  • Solution c: H = 2.584962500721 ​​without replacing
  • Solution: Entropy: _Solutions # Task_3
Scheme drawing of an urn

6 urn with replacement [edit]

6. How many bits of entropy is there in each move of this 6-ball urn under the assumption that all balls are drawn with the same probability p = 1/6?

6 urn without replacing [edit]

7. How big is the sum of the entropies of the 6 urn under the assumption that all balls are drawn with equal probability p = 1/6 and that the balls are not put back after each move? You draw until the urn is empty. First calculate the probability for each move, then the entropy for each move and then the sum of the entropies.

Lottery drawing [edit]

8. What is the entropy of a fictitious lottery drawing 1 out of 1 million possibilities.

Card game [edit]

All cards in a normal deck of cards

9. You have a normal deck of cards with 4 suits and 13 cards of each suit (ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king). The cards are shuffled extensively. What is the entropy when pulling a card? To do this, note the number of equal possibilities and then calculate the entropy.

9a. You have a deck of cards with 4 suits and 9 cards of each suit. Total number of cards 36, the cards are shuffled extensively. What is the entropy when pulling a card? To do this, note the number of equal possibilities and then calculate the entropy.

9b. You have a normal deck of cards with 4 suits and 13 cards of each suit (ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king). That means there are 52 different cards. The cards are shuffled extensively. What is the entropy of all possible card arrangements when moving one card after the other? Please note that the number of equal opportunities is reduced by 1 with each move.

  • Solution: H = 226 = ld 52! = binary logarithm of the factorial of 52
    • 52! = 52*51*50*49*48*.....*1
    • H = ld 52 + ld 51 + ld 50 + ld 49 ..... + ld1
    • More precise solution H = 225.5810031237027702

In this exercise you can see how practical it is to give the logarithm instead of the total number of possibilities. Most calculators can no longer calculate a factorial of 52, that's how big the number is.

Entropy and genes

You have a gene sequence and you are trying to calculate the entropy of this sequence.

1 gagcagcgcg cgcaagcagg ccaggggaag gtgggcgcag gtgaggggcc gaggtgtgcg 61 caggacttta gccggttgag aaggatcaag caggcatttg gagcacaggt gtctagaaac

For continuation see entropy: _biology # entropy_in_genetic_code

Links to further exercises [edit]

The entropy of arbitrary binary sequences [edit]

If you want to understand entropy, you can deal with the simplest information sequences in which the difference between high and low entropy can be seen at first glance. To do this, one takes binary number sequences of zero and one and tries to make a statement about entropy.

Chaitin gave a simple example:

What's the difference between




if you look at the consequences from the standpoint of entropy and order.

See also

The first sequence is highly ordered and can easily be compressed: ten times "10". The second sequence is a random sequence that was obtained by flipping a coin 20 times. That is the difference.That is why the entropy in the first sequence is low and close to zero. The entropy of the second sequence is high.

How can one calculate the entropy of these two 01 sequences?

To do this, you can use a statistical test for randomness, the so-called Runtests, operate.

Programming the run test [edit]

A program for the run test can be found here.

Unfortunately, the run test does not work with any 01 sequences, but only when the number of 0 and 1 is roughly the same. For example, a sequence 00000000000000000010 cannot be checked for randomness in the run test.

Example listing of an entropy calculation with the run test for a 01 sequence of fixed length [edit]

Gambas programming language

  • The length of the 01 sequence can be varied, here s = 8. (length is the variable s in the listing)
  • The number of zeros and ones is the same for the sake of simplicity.
  • The program starts by itself.
  • The results are output in the direct window.
  • Rating:
    • pz values ​​around 0 are a sign of high entropy. Pz values ​​above 1.5 or below -1.5 are a sign of low entropy
PUBLIC SUB Form_Open () s AS Integer 'Length of the binary number z AS Integer' Counter from 0 to 2 ^ s-1 zz AS Integer 'Counter from 1 to all variants with ozaehler = izaehler t AS String' binary number ozaehler AS Integer 'Number of the Zeros izaehler AS Integer 'Number of ones AS String' binary number as string n AS Integer 'Length of the binary number char AS String UM AS Float variance AS Float svar AS Float' Square root of the variance two AS Integer sum AS Integer run AS Integer nn AS Integer chari AS String charnext AS String run counter AS Integer pz AS Float 'Check variable corresponds to the entropy value of the sequence' pz Values ​​around 0 are a sign of high entropy.

Results output:

Graphical representation of the entropy values ​​of an 8 01 series
Sketch of a calculated order-entropy landscape of 01 episodes

Entropy of any 01 sequences [edit]

If you want to calculate the entropy value of any 01 sequence in which the number of zeros and ones clearly differs, you can use the following trick: You assign a double binary number to each digit of the original 01 sequence according to the following scheme: A 0 becomes a 01 and a 1 becomes a 10. Example:

000000 > 010101010101 010010 > 011001011001 111111 > 101010101010

The resulting binary number is twice as long, can be clearly assigned to the original number and can be processed with the runtest without any problems. The entropy of the original number can then be calculated back from the calculated pz number if the entropy 0 is assigned to the largest or negative smallest number. A pz number of 0 means a maximum entropy which corresponds to the length of the original binary number. All pz numbers in between are simply converted proportionally. The proportional conversion is chosen arbitrarily, but it can be used as a first approximation. So you can assign an entropy value to any binary number. The whole thing can simply be implemented as a computer program as a supplement to the above basic program. You then enter any 01 sequence and the pc calculates the associated entropy.

With this method, however, information is also lost, because 00001111 becomes, for example, 0101010110101010. The initially symmetrical order becomes more of a repetitive order with the opposite sign of the pz number.

One notices very quickly that, for practical reasons, the calculation is limited to a certain length of the binary number, otherwise the PC will calculate too long.

A certain amount of calibration is also useful. A highly ordered sequence that consists only of zeros or only ones receives, by definition, the entropy zero and the pure random sequence with a pz below a limit with an error probability p to be specified (for example p <0.05) receives the maximum possible entropy, that of the length corresponds to the binary number.

Entropy and language [edit]

Probability and entropy of individual letters in the English language

No. Letter Probability Entropy 1 a .0575 4.1 2 b .0128 6.3 3 c .0263 5.2 4 d .0285 5.1 5 e .0913 3.5 6 f .0173 5.9 7 g .0133 6.2 8 h .0313 5.0 9 i .0599 4.1 10 j .0006 10.7 11 k .0084 6.9 12 l .0335 4.9 13 m .0235 5.4 14 n .0596 4.1 15 o .0689 3.9 16 p .0192 5.7 17 q .0008 10.3 18 r .0508 4.3 19 s .0567 4.1 20 t .0706 3.8 21 and .0334 4.9 22 v .0069 7.2 23 w .0119 6.4 24 x .0073 7.1 25 y .0164 5.9 26 z .0007 10.4 27 - .1928 2.4

Maximum entropy value and normalization [edit]

If one would like to have a standardized measure for the entropy of any discrete distribution, it is advantageous to use the maximum possible entropy that is possible with uniform distribution of the is achieved to be used for normalization. Be the number of symbols allowed in above the alphabet , then it is maximum entropy given by:

It follows for example for a binary distribution (), so you need 1 bit per character and Sign for complete information . This value is reached when 0en and 1en occur equally often. If one now normalizes the entropy of an arbitrary distribution different symbols with you get: