A Mind at Play: How Claude Shannon Invented the Information Age
Modern man lives isolated in his artificial environment, not because the artificial is evil as such, but because of his lack of comprehension of the forces which make it work—of the principles which relate his gadgets to the forces of nature, to the universal order. It is not central heating which makes his existence "unnatural," but his refusal to take an interest in the principles behind it. By being entirely dependent on science, yet closing his mind to it, he leads the life of an urban barbarian.
— Arthur Koestler, The Act of Creation
It's important to be willing to learn how the world works. Not only because of the personal benefits, but also as a sign of respect to those who worked to build our modern world. I don't mean to say we should strive for a complete understanding of every technology we use; that's impossible. However, it's enriching to have some understanding of the ingenious ideas that have brought society and technology to these great heights of prosperity.
It's natural to want remembrance and recognition for a positive impact one has had on the world, but it's also common to yearn for more than just recognition. Fame, wealth, and prestige are all fair to expect if you're responsible for an important breakthrough or invention. This is what makes Claude Shannon's indifference to all these things so interesting. He had no desire for the spotlight; he was simply a curious individual who wanted to understand the underlying patterns of the world around him.
A Mind At Play
A Mind At Play is a biography of Claude Shannon's life. It was authored by Jimmy Soni and Rob Goodman, neither of whom has a formal technical education. They were inspired to write the book after learning about Shannon from a friend. They found it shocking that someone considered a founding father of the digital age, and the inventor of the most important theory in the history of communications, is largely unknown to the general public. Although his impact on science and technology is comparable to the likes of Einstein, Tesla, and Von Neumann, he hasn't received anything close to a similar level of fame.
As the title suggests, Shannon is known for ushering in the digital communication era with his groundbreaking 1948 paper, A Mathematical Theory of Communication. His theory, commonly referred to as Information Theory, completely changed how scientists and engineers viewed communication.
Claude Shannon's paper accomplished two things that would go on to revolutionize communications. Firstly, he formalized an abstraction which could be used to describe ANY type of communications medium. And secondly, Shannon devised an ingenious schema (with proofs to support it) that allows ANY message to be sent with an arbitrarily small chance of error, no matter how noisy the transmission channel is, so long as the transmission rate stays below the channel's capacity.
Shannon's paper provided the ability to talk about any transmission medium using the same language. This includes telegraphy, telephony, TV, radio, human speech...anything which involves the transfer of information from a source to a receiver. The components Shannon identified as integral to any communication system are captured in the block diagram from his original paper: an information source, a transmitter, a channel (along with a noise source), a receiver, and a destination.
As part of this abstraction, Shannon needed to rigorously define what was meant by "information" and how to quantify it. This proved to be one of his most astounding leaps in ingenuity.
Before Shannon's paper, it was assumed that the amount of information in a message is tied to the meaning of the message. Shannon showed that it depends only on the statistics of the symbols used to send the message. In other words, it's the uncertainty about what the message will contain that determines the amount of information it carries, not how the receiver interprets its meaning.
This may sound quite obvious to our modern ears. But that's only because our abundance of information technologies, all of them agnostic to the type of information they process, has shaped our intuitions much differently than in Claude Shannon's era. In his own words, information is the "resolution of uncertainty".
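To make that "resolution of uncertainty" concrete, here's a minimal sketch (my own Python, not anything from the book) of Shannon's entropy formula, which measures the average uncertainty, in bits, resolved by learning an outcome:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the average uncertainty resolved
    by learning which outcome occurred."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin flip resolves exactly 1 bit of uncertainty.
print(entropy([0.5, 0.5]))       # 1.0

# A heavily biased coin is predictable, so learning the outcome
# tells you much less: well under 1 bit.
print(entropy([0.99, 0.01]))     # ~0.08

# One of 26 equally likely letters resolves log2(26) ~ 4.7 bits,
# more than a single binary digit ever can.
print(entropy([1/26] * 26))      # ~4.70
```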
Predictable Symbols
I'll try to explain this concept of uncertainty resolution with an example.
In English, we have a 26-letter alphabet. If you transmit a single letter from this set of 26 possible "symbols", you are sending objectively more information than if you sent a single binary digit, for example. This is because the binary "alphabet" is composed of only two symbols: 0 or 1. Furthermore, in the context of transmitting messages from the English alphabet, sending the letter "Q" provides more information to the receiver than sending an "S". This is because Q is much less likely to appear in English text than S, so a Q tells the receiver more. To illustrate this better, imagine receiving the following message from someone:
I hid the keys underneath the [S|Q]...
where the final letter is either an S or a Q.
If for some reason the rest of the message was interrupted before you could receive it, your chances of finding the hidden keys would be much different depending on which letter you received. Since there are far more words which start with the letter S than with the letter Q^1^, you would actually have received more information if the final letter was a Q. I can think of a handful of household items which begin with a Q (Quilt, Qur'an, Quiche), but I definitely would not want to start checking underneath everything in my house that begins with an S.
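To put a rough number on that intuition: the "surprise" of a symbol can be measured as the negative log of its probability. A quick sketch, where the probabilities are just illustrative guesses, not measured values:

```python
import math

# Rough, made-up odds that the next word starts with S versus Q;
# real corpora will differ, but the gap is the point.
p_s = 0.10   # roughly 1 in 10 words starts with S
p_q = 0.002  # Q-words are far rarer

surprisal_s = -math.log2(p_s)   # ~3.3 bits
surprisal_q = -math.log2(p_q)   # ~9.0 bits

print(f"S: {surprisal_s:.1f} bits, Q: {surprisal_q:.1f} bits")
# The rarer Q resolves far more uncertainty about where the keys are.
```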
Similarly, say the letter was a Q but you actually received one more letter of the word after that—would you be able to guess what it is? Of course you would, because nearly every occurrence of the letter Q in the English language is followed by a "U". Because of its predictability, it's also fair to say that the following "U" carries very little information with it. These patterns of redundancy in English are very common, so much so that Shannon once estimated that as much as 80% of all English text on earth could be stripped away without us losing any information! However, as I will explain shortly, in the realm of communications this redundancy serves an important purpose.
From Theory to Application
I'm a machine and you're a machine, and we both think, don't we?
— Claude Shannon, pg. 337
With the basic components of communication and a more useful definition of information in place, Claude Shannon could now solve two important problems that had plagued the transmission of information up until then.
The first problem was figuring out how to transmit information through a limited bandwidth channel in the most efficient way possible. Knowing how to do this was quite valuable for Shannon's employer, Bell Labs, the research arm of the company that owned the largest telephone network in America at the time.
Shannon's second breakthrough was providing a way to guarantee a message will be received accurately over a noisy channel, essentially allowing the error rate to be driven as low as desired. This was an enormous achievement in the field of communications, bringing accuracy to noisy channels such as the transatlantic undersea cables, which hadn't had much success up until then.
As it turns out, you can't have your cake and eat it too. Or rather: you can't have perfectly efficient information transfer without also risking errors. As with most technologies, desirable qualities usually come with trade-offs. In this case it's not a mutually exclusive trade-off; a transmission encoding that applies Shannon's theorems can see gains in both efficiency (or "speed") and accuracy.
To achieve the highest information transfer efficiency possible, you must first understand the domain of information you want to transfer and how you want to represent that information in a finite set of symbols. You must then statistically quantify how often the symbols appear. This is the tricky part, because a source of information is a stochastic process, which means it is partly random and partly predictable. The better you can predict these frequencies, the better your encoding scheme can be.
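In practice, that quantification step can be as simple as counting symbols in a sample of the kind of text you expect to send. A toy sketch (mine, not the book's), using a stand-in corpus string:

```python
from collections import Counter

sample = "the quick brown fox jumps over the lazy dog " * 100  # stand-in corpus

counts = Counter(c for c in sample.lower() if c.isalpha())
total = sum(counts.values())

# Relative frequency of each letter; the better this estimate matches
# the real source, the better the encoding built on top of it will be.
frequencies = {letter: n / total for letter, n in counts.most_common()}
print(frequencies)
```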
For example, if you want to develop an encoding for English text, your set of symbols would consist of the alphabet, plus maybe digits and some punctuation marks. The first question you must answer is how to transform these symbols into units of information for transfer over a physical channel. In digital communication, these units are usually sequences of bits, where a single bit is a HIGH or LOW electrical state.
The next step is defining the translation to and from your set of symbols to sequences of bits. You might use your predetermined letter frequencies to decide that you should use the fewest bits to represent an "E" and more bits to represent rarer symbols like "Q". This will yield better transmission rates. The exact procedure for constructing these codes was first proposed by Shannon in his paper, and later improved by Robert Fano and then David Huffman.
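Huffman's procedure is simple enough to sketch: keep merging the two least likely symbols into one group, so the rarest symbols end up with the longest codes and the commonest with the shortest. Here's a toy version I wrote to illustrate the idea (not code from the book, and the frequencies are made up):

```python
import heapq

def huffman_code(freqs):
    """Build a prefix code from a {symbol: frequency} dict."""
    # Each heap entry: (total frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # least frequent group
        f2, _, codes2 = heapq.heappop(heap)  # next least frequent group
        # Prefix one group with '0' and the other with '1', then merge them.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Common symbols get short codes, rare ones get long codes.
print(huffman_code({"e": 0.40, "t": 0.25, "a": 0.20, "q": 0.10, "z": 0.05}))
```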
The hard part is optimizing your encoding scheme, because the more you optimize it, the more you increase the likelihood of transmission errors. As an example, suppose you notice that certain words, like "the" or "to", appear far more often than the rarer letters. You may consider assigning them shorter bit sequences than those rare letters, essentially expanding the set of symbols you can use to represent information. What could be wrong with that?
Well, nothing is wrong with that technically. If you extend this idea further, there is essentially nothing wrong with using single symbols to represent entire sentences and paragraphs! If you can identify these common patterns and assign them unique encodings which are shorter than the encoding would've been with the original symbols, your scheme will certainly be more efficient.
Problems arise as your encoding scheme starts to use up the domain of possible bit permutations. If you consider that an encoding is composed of discrete sequences of information units, there are only a finite number of permutations of these units. The symbol encodings will start to get closer together, perhaps only differing by a single bit. At this point your über-efficient encoding scheme is now vulnerable to errors!
Can You Hear Me Now?
From the God's-eye view, there is a law of tides; from our earthbound view, only some petty local ordinances.
— Jimmy Soni, A Mind at Play, pg. 49
Every physical transmission channel is subject to some amount of noise. If you're in a loud room speaking with a friend, your conversation is (literally) subject to noise and the chance of a misunderstanding is definitely non-zero. Some channels are inherently noisier than others, and what is considered a tolerable error rate is dependent on the application and context. How can you reduce it though? The answer is simple: you make your encoding scheme less efficient.
Figuring out what sort of encoding reduces errors is where mathematical rigor is going to help the most. From what I understand, there are two main strategies. The first is simple redundancy. Repeating yourself will inherently protect your message from failing due to single transient errors, since everything will be transmitted multiple times. Hope that's intuitive enough to understand. After all, it's what we do in person when someone doesn't hear us.
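Here's what the simplest form of that looks like as a toy sketch (my own, for illustration): send each bit three times and let the receiver take a majority vote, trading a 3x longer transmission for tolerance of any single flipped bit per group.

```python
def encode_repeat(bits, n=3):
    """Repeat each bit n times."""
    return [b for bit in bits for b in [bit] * n]

def decode_repeat(bits, n=3):
    """Majority vote over each group of n repeated bits."""
    groups = [bits[i:i + n] for i in range(0, len(bits), n)]
    return [1 if sum(g) > n // 2 else 0 for g in groups]

message = [1, 0, 1, 1]
sent = encode_repeat(message)          # 12 bits on the wire for 4 bits of message
sent[4] ^= 1                           # noise flips one bit in transit...
assert decode_repeat(sent) == message  # ...but the majority vote recovers it
```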
The second strategy involves how you actually design your encoding scheme and how much "distance" you have between different codes. Some really smart people took Shannon's ideas and formalized this strategy. It was another Bell Labs alumnus, Richard Hamming, who invented a generalized family of linear error-detecting and error-correcting codes called Hamming codes. These sorts of codes are guaranteed to be "resistant" to a certain level of error; they can be single-error tolerant, double-error tolerant, etc.
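To show what that "distance" buys you, here's a rough sketch of the classic Hamming(7,4) code: four data bits plus three parity bits, placed so that any single flipped bit can be located and corrected. (Again, this is my own illustrative code, not anything from the book.)

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword.
    Parity bits sit at positions 1, 2, 4 (1-indexed)."""
    d3, d5, d6, d7 = d
    p1 = d3 ^ d5 ^ d7   # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def hamming74_decode(c):
    """Correct a single flipped bit (if any) and return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-indexed position of the error, 0 if none
    if syndrome:
        c = c[:]                      # don't mutate the caller's list
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[5] ^= 1                        # a single bit flip in the channel
assert hamming74_decode(codeword) == data
```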
And that's basically it as far as my understanding of Shannon's information theory goes. The crucial consequence of Shannon's work was that he showed everything can be digitized: every type of communication, every type of information. This was a groundbreaking revolution in what was mostly an analog world at the time. He laid the first stone in what would become the vast digital landscape we find ourselves in today.
The Medium is the Book
How about the book though? I've mostly tried to explain information theory so far, which I'm hoping turned out somewhat coherent.
A Mind At Play was a pretty interesting read. I think that's largely due to my interest in the subject matter, but Claude Shannon was also a remarkable person. I found a lot to admire about Shannon, especially his indifference to fame and wealth. By all accounts it seems like Shannon was driven by his curiosity about how things work; he couldn't live with a problem without knowing the solution. That's the sort of motivation I aspire to have throughout life. Shannon's other defining characteristic was his playful nature and his fascination with toys and games. What was remarkable was when these two qualities intersected: Shannon would do things like write papers and build machines to explore the games he was interested in. I actually decided to build a chess engine after reading his 1950 paper on how one would go about programming one, written long before it was practical to do so. His explanation was so clear and simple that I decided to give it a try.
The book itself wasn't the greatest non-fiction I've ever read, nor even the best biography. I enjoyed the Steve Jobs biography more; it did a better job of exploring its subject's character. A Mind At Play felt too encyclopedic at times, spending too much time on details that don't tell you anything new about the main subject. This might be because it's a posthumous biography and most of Shannon's life occurred half a century ago. They couldn't exactly interview the guy.
Overall, A Mind at Play kept me entertained and engaged throughout. If you are interested in technology at all I'd definitely recommend it.