Information theory concerns communication between transmitters and receivers. Transmitters aim to send information (data) to receivers at the greatest possible rate while keeping the probability of error sufficiently small. Information theory gives expressions for the greatest possible rate C (called the capacity) corresponding to a vanishing probability of error. These expressions are particularly useful for designing and optimizing communication networks.
The first and most famous example of such an expression is the capacity of the additive white Gaussian noise (AWGN) channel, due to Shannon (1948). Since Shannon's paper, a lot of work has been done to extend his results to a wide variety of channels and situations. In particular, the framework developed by Shannon makes it possible to study the case of finite alphabets, and the extension to general alphabets is usually done by heuristic discretization arguments. More recently, network information theory considers the case where there may be several receivers (broadcast channel) or several transmitters (multiple access channel).
I. Coding problem, entropy and mutual information
Lecture 1: Coding problem
In the present lecture, we shall present the coding problem in the case of discrete alphabets. The general problem is called the source-channel coding problem, but it will be helpful to consider source coding and channel coding separately.
Contents: Channel coding and source coding.
Lecture 2: Entropy
We introduce Shannon’s entropy of a probability distribution on a discrete space (finite or countable) and study its basic properties. This notion is essential to prove Shannon’s coding theorems. In particular, Shannon’s source coding theorem makes it possible to interpret the entropy as the amount of information “carried” by random variables.
Contents: Shannon’s entropy, Gibbs inequality, Entropy of random vectors.
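As a small illustration (not part of the course materials), the entropy of a finite distribution can be computed in a few lines of Python; the function name `entropy` is our own choice:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2(p_i), in bits (0*log 0 := 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin carries 1 bit per toss; a biased coin carries less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # ~0.469
```

The uniform distribution maximizes entropy: over an alphabet of size n it gives log2(n) bits, consistent with Gibbs’ inequality studied in this lecture.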
Lecture 3: Mutual information
We shall introduce the mutual information between random variables on a discrete space (finite or countable) and study its basic properties. This notion is essential to prove the source and channel coding theorems.
Conditional mutual information
Contents: Mutual information versus Kullback-Leibler divergence, Mutual information versus entropy, Data processing inequality, Kolmogorov’s formula, Chain rule for mutual information, Csiszár sum identity.
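To illustrate the “mutual information versus entropy” identity I(X;Y) = H(X) + H(Y) - H(X,Y), here is a minimal Python sketch (our own function names) that computes it from a joint probability table:

```python
import math

def H(p):
    """Shannon entropy in bits of a pmf given as a flat list."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as a 2-D list."""
    px = [sum(row) for row in joint]            # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]      # marginal of Y (columns)
    pxy = [v for row in joint for v in row]     # joint, flattened
    return H(px) + H(py) - H(pxy)

# Independent bits carry zero mutual information;
# perfectly correlated bits carry 1 bit.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```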
II. Source coding
Lecture 4: Source coding theorem
Roughly speaking, the source coding theorem states that the entropy of a source equals the number of bits needed to encode a source-symbol. In other words, one can encode the source at a rate close to the entropy of a symbol with vanishing error probability, and no better coding rate is possible. We shall make this statement precise.
Source coding capacity
Source coding theorem
Compression point of view
Source coding theorem proof
Proof of achievability
Proof of converse
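In symbols, the two parts of the theorem sketched above can be summarised as follows (standard notation; P_e^(n) denotes the block error probability for source-words of length n):

```latex
% Source coding theorem for an i.i.d. source (X_i) with entropy H(X):
\[
  R > H(X) \;\Longrightarrow\; \text{there exist codes of rate } R
  \text{ with } P_e^{(n)} \to 0 \quad \text{(achievability)},
\]
\[
  R < H(X) \;\Longrightarrow\; P_e^{(n)} \not\to 0
  \text{ for every sequence of codes of rate } R \quad \text{(converse)}.
\]
```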
Lecture 5: Error-free source coding
We prove in the present lecture a theorem which strengthens the achievability part of Shannon’s source coding theorem stated in the previous lecture. More specifically, we shall introduce some error-free source coding schemes and prove that in this class the entropy still indicates the infimum of achievable coding rates. The price to pay for error-free coding is variable codeword length. The main tool is Kraft’s code.
Error-free source coding
Uniquely decipherable codes
Codes on trees and Kraft’s inequality
Error-free source coding theorem
Contents: Uniquely decipherable codes, Prefix-free codes, Codes on trees, Kraft’s inequality, Kraft’s code.
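Kraft’s inequality is easy to check numerically. As an illustration (not from the lecture notes), the following sketch tests whether a list of codeword lengths is compatible with a prefix-free D-ary code:

```python
def kraft_sum(lengths, D=2):
    """Sum of D^(-l) over codeword lengths.
    Kraft's inequality: a prefix-free D-ary code with these
    lengths exists if and only if this sum is at most 1."""
    return sum(D ** (-l) for l in lengths)

# Lengths 1, 2, 3, 3 give sum 1: a complete binary prefix code
# exists, e.g. {0, 10, 110, 111}.
print(kraft_sum([1, 2, 3, 3]))  # 1.0
# Lengths 1, 1, 2 give sum 1.25 > 1: no such prefix-free code.
print(kraft_sum([1, 1, 2]))     # 1.25
```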
Lecture 6: Optimal source codes
We shall construct codes based on the probability distribution of the source-symbol:
- Huffman’s code which is optimal in the sense of minimizing the mean codeword length for a given source distribution.
- Shannon-Fano-Elias code whose asymptotic average length (when coding a source-word of length going to infinity) is optimal.
Huffman’s optimal code
Optimality of Huffman’s code
Contents: Huffman’s code, Shannon–Fano–Elias code.
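A standard way to build Huffman’s code is with a priority queue: repeatedly merge the two least probable subtrees. The sketch below (our own implementation, not the course’s) does this and checks that for a dyadic distribution the mean codeword length equals the entropy:

```python
import heapq, math

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}.
    Greedy: repeatedly merge the two least probable subtrees,
    prefixing '0' to one side and '1' to the other."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tick = len(heap)  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tick, merged))
        tick += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
H = -sum(p * math.log2(p) for p in probs.values())
# Dyadic probabilities: mean length equals entropy (1.75 bits).
print(code, avg, H)
```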
Lecture 7: Parsing-translation and Tunstall’s codes
In the previous lectures, we showed that coding rates close to the entropy of the source can be achieved by grouping source symbols into fixed-length blocks (words), then coding these source-words using codes (e.g. Kraft’s, Huffman’s) optimized for the block distribution.
In this lecture, we shall see that equally small coding rates can be achieved by an optimal (for the given source distribution) grouping of the source symbols into variable-length blocks, called parsing, and then encoding these blocks with simple, non-optimized codes.
This is the idea of Tunstall’s code proposed in 1967.
Valid parsing book
Mean formulae for a valid parsing book
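As an illustration of the idea (our own sketch, not the course’s construction), Tunstall’s greedy rule can be implemented in a few lines: starting from the single-symbol words, repeatedly replace the most probable word in the book by all of its one-symbol extensions, until the book reaches the allowed size:

```python
import heapq

def tunstall_book(probs, M):
    """Tunstall parsing book with at most M entries (M >= alphabet size),
    for an i.i.d. source with pmf given as {symbol: probability}.
    Greedy: expand the most probable word into its one-symbol extensions."""
    book = [(-p, s) for s, p in probs.items()]  # max-heap via negated prob
    heapq.heapify(book)
    while len(book) + len(probs) - 1 <= M:      # each expansion adds |A|-1 words
        negp, w = heapq.heappop(book)
        for s, p in probs.items():
            heapq.heappush(book, (negp * p, w + s))
    return sorted((w, -negp) for negp, w in book)

# Binary source with P(a)=0.7, P(b)=0.3 and a book of 4 words:
# the result is a valid (prefix-free, complete) parsing book.
for w, p in tunstall_book({"a": 0.7, "b": 0.3}, 4):
    print(w, round(p, 3))
```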
Lecture 8: Universal source coding
Universal source coding consists in coding without prior knowledge of the source distribution. We shall show that for any coding rate R there exist universal codes (not depending on the source distribution) allowing for asymptotically error-free coding of any i.i.d. source whose entropy is smaller than R. This may be seen as an extension of Shannon’s first source coding theorem.
Universal source coding
Proof of universal source coding
Lecture 9: Lempel-Ziv source code
We shall present the Lempel-Ziv algorithm, introduced in 1977/78, which allows one to encode without errors an arbitrary source sequence (error-free universal source code).
Optimality of Lempel-Ziv coding
Coding rates of information-lossless (IL) finite automata
Optimality theorems of Lempel-Ziv
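The heart of the 1978 variant is the incremental parsing of the source sequence into the shortest phrases not seen before. A minimal sketch of this parsing step (our own code, omitting the actual encoding of phrases into bits):

```python
def lz78_parse(s):
    """LZ78 incremental parsing: split s into the shortest
    phrases that have not appeared earlier in the parsing."""
    seen, phrases, w = set(), [], ""
    for c in s:
        w += c
        if w not in seen:   # w is a new phrase: emit it and restart
            seen.add(w)
            phrases.append(w)
            w = ""
    if w:                   # leftover suffix (a repeat of an earlier phrase)
        phrases.append(w)
    return phrases

print(lz78_parse("aababbbab"))  # ['a', 'ab', 'abb', 'b', 'ab']
```

The number of distinct phrases governs the coding rate, which is how the optimality theorems of this lecture relate the algorithm to the entropy of the source.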
III. Channel coding
Lecture 10: Channel information capacity
We shall introduce a basic model of noisy communication channel and define its information capacity as the maximum of the mutual information — a measure of dependence derived from the notion of entropy — between the input and the output of the channel. Two illustrative examples are studied: binary symmetric channel and binary erasure channel.
Channel basic properties
Examples of channel’s information capacity
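The two examples have closed-form capacities: for the binary symmetric channel with crossover probability p, C = 1 - h2(p) where h2 is the binary entropy function, and for the binary erasure channel with erasure probability eps, C = 1 - eps. A quick Python sketch (our own function names):

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Binary symmetric channel, crossover probability p: C = 1 - h2(p)."""
    return 1 - h2(p)

def bec_capacity(eps):
    """Binary erasure channel, erasure probability eps: C = 1 - eps."""
    return 1 - eps

print(bsc_capacity(0.11))  # ~0.5 bit per channel use
print(bec_capacity(0.5))   # 0.5
```

Note that the BSC capacity vanishes at p = 1/2 (the output is independent of the input), while a BSC with p = 1 is a perfect channel: flipping every bit destroys no information.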
Lecture 11: Channel coding theorem
We introduce a general channel coding scheme, with its transmission rate and error probability. We prove a fundamental result that all transmission rates smaller than the channel information capacity (as introduced in the previous lecture) can be achieved by some coding schemes offering arbitrarily small error probabilities. This is the famous noisy-channel coding theorem formulated by Shannon in 1948. The proof makes use of a random coding argument and a decoding based on jointly typical sequences.
Channel coding theorem statement
Typical sequences for two random variables
Inequalities for mutual information
Proof of achievability
Analysis of the error probability
Proof of converse
Contents: Channel coding theorem, channel capacity, achievable coding rates, supremum of achievable rates, jointly typical sequences, Fano’s inequality, mutual information inequalities, proof of achievability, proof of converse.
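In symbols, the theorem described above can be summarised as follows (standard notation):

```latex
% Noisy-channel coding theorem (Shannon, 1948) for a discrete
% memoryless channel with input X and output Y:
\[
  C \;=\; \max_{p_X} I(X;Y),
\]
% every rate R < C is achievable with vanishing error probability,
% and no rate R > C is achievable.
```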
- Course title: “Information, coding theories: Source, channel coding, Shannon theorems” (2016-2024).
- Course location: ENS, 45, rue d’Ulm 75230 Paris Cedex 05. E. Noether room in the 2nd basement of the “AILE RATAUD” building; cf. ENS map.
B. Błaszczyszyn, M.K. Karray (2024). Information theory. Book in preparation.