If someone tells you a fact you already know, they've essentially told you nothing at all. But if they tell you something you couldn't have predicted, it's fair to say that something has really been communicated.

That difference is the basis of Claude Shannon's theory of information. In a landmark 1948 paper, he introduced a theoretical framework for quantifying the amount of information needed to accurately send and receive a message.

An example will make this concrete.

Suppose I have a trick coin with heads on both sides, and I flip it twice. Communicating the result requires no information at all: even before the message arrives, you know with complete certainty that both flips came up heads.


Now suppose I do my two flips with a normal coin, heads on one side and tails on the other. The result can be communicated with a pair of binary digits, or bits: 0 for heads and 1 for tails. There are four possible messages (00, 01, 10 and 11), and each requires two bits of information.

What's the point of all this? In the first scenario the message took zero bits to send. In the second you had only a 1-in-4 chance of guessing the result in advance, and the message needed two bits of information to clear up that uncertainty. The less you know about what a message will say, the more information it takes to convey it.

Shannon was the first to make this relationship mathematically precise. He captured it in a formula that calculates the minimum number of bits, a threshold later called the Shannon entropy, needed to communicate a message. If a sender uses fewer bits than the minimum, the message will inevitably end up distorted.
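In modern notation, for a message whose possible contents occur with probabilities $p_i$, that minimum is

$$H = -\sum_i p_i \log_2 p_i \ \text{bits}.$$

For the two-headed trick coin, one outcome has probability 1 and $H = 0$; for two flips of a fair coin, each of the four outcomes has probability 1/4, so $H = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2$ bits, matching the two scenarios above.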

Tara Javidi is an information theorist at the University of California, San Diego.

The term "entropy" is a measurement of disorder. A cloud has more ways to arrange water molecule than an ice cube does. There are so many possibilities for how the information can be arranged that a random message has a high Shannon entropy. Both physics and information theory use the same method for calculating the amount of energy in the universe. In physics, a number of physical states are taken into account. It is the logarithm of possible event outcomes.

One way to think about Shannon entropy is as the minimum number of yes-or-no questions you'd need to ask, on average, to determine the content of a message.

Imagine a pair of weather stations, one in San Diego and the other in St. Louis, each of which wants to transmit its seven-day forecast to the other. San Diego is almost always sunny, so you can be quite confident about what its forecast will say; the weather in St. Louis is far less certain.


How many yes-or-no questions would it take to transmit each forecast? For San Diego, you might start by asking: Are all seven days sunny? If the answer is yes, and there's a good chance it will be, you've determined the entire forecast with a single question. For St. Louis, you'll probably have to work through the forecast day by day: Is the first day sunny? What about the second?

The more certain you are about the message, the fewer questions you'll need to ask, and each yes-or-no question is worth exactly one bit.
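To make the contrast concrete, here is a minimal Python sketch of the calculation. The figures of 90% sunny days for San Diego and 50% for St. Louis are illustrative assumptions, not measured data:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative assumptions: San Diego is sunny 90% of the time,
# St. Louis only 50% of the time.
san_diego_day = entropy([0.9, 0.1])  # ~0.47 bits per day
st_louis_day = entropy([0.5, 0.5])   # exactly 1 bit per day

# Treating the seven days as independent, a forecast costs
# seven times the per-day entropy.
print(f"San Diego: {7 * san_diego_day:.2f} bits per 7-day forecast")
print(f"St. Louis: {7 * st_louis_day:.2f} bits per 7-day forecast")
```

Run it and San Diego's whole week costs about 3.3 bits against St. Louis's 7, which is why one clever question can often capture San Diego's entire forecast.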

Consider two different versions of a guessing game. In the first, I choose a letter from the English alphabet at random and you try to identify it by asking yes-or-no questions. If you ask well, starting with something like "Is the letter in the first half of the alphabet?" and halving the possibilities each time, it will take you about 4.7 questions on average.
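That figure is just the logarithm of the number of equally likely possibilities:

$$H = \log_2 26 \approx 4.70 \ \text{bits per letter}.$$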

In the second version of the game, instead of guessing random letters, you're guessing the letters of actual English words. Now you can tailor your questions, exploiting the fact that some letters appear more often than others and that knowing the value of one letter helps you guess the value of the next. Shannon calculated the entropy of English to be 2.62 bits per letter, or 2.62 yes-or-no questions, far less than you'd need if each letter appeared at random. Patterns reduce uncertainty, which makes it possible to communicate a lot using relatively little information.
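You can see part of that gap with a few lines of Python that estimate bits per letter from single-letter frequencies alone. This ignores the letter-to-letter dependencies Shannon exploited, so the estimate lands between the random-letter figure of 4.7 and his 2.62; the sample string below is an arbitrary stand-in for a large English corpus:

```python
import math
from collections import Counter

def bits_per_letter(text):
    """Estimate entropy from single-letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values())

# An arbitrary stand-in for a large corpus of English text.
sample = ("the quick brown fox jumps over the lazy dog while "
          "information theory measures how predictable english is")
print(f"{bits_per_letter(sample):.2f} bits per letter")
```

Because common letters like e and t dominate, the estimate comes in below the 4.7 bits of a uniformly random letter; accounting for the context between letters pushes it down further, toward Shannon's 2.62.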

In all of these examples it's possible to ask better or worse questions. Shannon entropy is the floor: the minimum average number of questions, and therefore bits, needed to convey the message.

Just as the speed of light is a limit on how fast anything can travel, Shannon entropy is a limit on how much a source of information can be compressed.

Information compression technology uses Shannon entropy as its benchmark. The reason you can zip a large movie file at all is that the colors on screen follow patterns; truly random static would be incompressible. Engineers build probabilistic models of those patterns, assigning probabilities to different arrangements of colors, and Shannon's formula, the probability-weighted sum of logarithms, turns such a model into an entropy. That value is the absolute most the movie can be compressed before it starts to lose information.
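Here is a toy version of that benchmark in Python, with a four-symbol source standing in for patterns of pixel colors; the probabilities are invented for illustration:

```python
import math

# A hypothetical model assigning probabilities to four "patterns".
model = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Shannon entropy: the probability-weighted sum of log-probabilities.
entropy = -sum(p * math.log2(p) for p in model.values())

# A naive encoder ignores the model and spends a fixed
# log2(4) = 2 bits on every symbol.
naive = math.log2(len(model))

print(f"compression floor: {entropy:.3f} bits/symbol")  # 1.750
print(f"naive fixed code:  {naive:.3f} bits/symbol")    # 2.000
```

No lossless code can average fewer than 1.75 bits per symbol for this source. A variable-length code that spends short codewords on likely symbols (A=0, B=10, C=110, D=111) reaches the floor exactly here, because every probability is a power of one half.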

Engineers can compare any compression method's performance against this limit. If it falls far short, there's an incentive to look for something better. If it comes close, there's no point in searching further, because the informational laws of the universe forbid doing better.