Video lectures on information theory

Information theory concerns communication between transmitters and receivers. A transmitter aims to send information (data) to a receiver at the greatest possible rate while keeping the probability of error sufficiently small. Information theory gives expressions for the greatest such rate C (called the capacity) at which the probability of error can be made to vanish. These expressions are particularly useful for designing and optimizing communication networks.
The first and most famous example of such an expression is the capacity of the additive white Gaussian noise (AWGN) channel, due to Shannon (1948). Since Shannon’s paper, much work has been done to extend his results to a wide variety of channels and situations. In particular, the framework developed by Shannon permits the study of finite alphabets, and the extension to general alphabets is usually done by heuristic discretization arguments. More recently, network information theory considers the case of several transmitters (multiple-access channel) or several receivers (broadcast channel).

I. Coding problem, entropy and mutual information

Lecture 1: Coding problem (Information-Theory)

PDF, Video
In this lecture, we present the coding problem in the case of discrete alphabets. The general problem is the source-channel coding problem, but it will be helpful to consider source coding and channel coding separately.

Coding problem
Source coding
Channel coding
Source-channel coding


Lecture 2: Entropy

We introduce Shannon’s entropy of a probability distribution on a discrete (finite or countable) space and study its basic properties. This notion is essential for proving Shannon’s coding theorems. In particular, Shannon’s source coding theorem allows us to interpret the entropy as the amount of information « carried » by random variables.

Conditional entropy
Fano’s inequality
Entropy rate

Contents:  Shannon’s entropy, Gibbs inequality, Entropy of random vectors.
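As an illustration, the entropy of a finite distribution is immediate to compute; a minimal Python sketch (the function name `entropy` is ours):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution p,
    given as a sequence of probabilities summing to 1."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A fair coin carries 1 bit per toss; a biased coin strictly less.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # ≈ 0.469
```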

Lecture 3: Mutual information

We introduce the mutual information between random variables on a discrete (finite or countable) space and study its basic properties. This notion is essential for proving the source and channel coding theorems.

Kullback-Leibler divergence
Mutual information
Conditional mutual information

Contents: Mutual information versus Kullback-Leibler divergence, Mutual information versus entropy, Data processing inequality, Kolmogorov’s formula, Chain rule for mutual information, Csiszar sum identity.
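The identity I(X;Y) = D(p(x,y) ‖ p(x)p(y)) can be checked numerically; a small Python sketch (function names are ours):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits between two
    distributions on the same finite space."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(pxy):
    """I(X;Y) = D(p(x,y) || p(x)p(y)) for a joint distribution
    given as a 2-D list of probabilities."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(pxy[i][j] * math.log2(pxy[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if pxy[i][j] > 0)

# Independent variables carry zero mutual information;
# perfectly correlated fair bits share exactly 1 bit.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```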

II. Source coding

Lecture 4: Source coding theorem

Roughly speaking, the source coding theorem states that the entropy of a source equals the number of bits needed to encode a source symbol. In other words, the source can be encoded at a rate close to the entropy of a symbol with vanishing error probability, and no better coding rate is achievable. We shall make this statement precise.

Source coding capacity
Source coding theorem
    Compression point of view
    Typical sequences
Source coding theorem proof
    Proof of achievability
    Proof of converse
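The typical-sequence phenomenon behind the proof can be observed numerically: for an i.i.d. source, -(1/n) log2 p(X^n) concentrates near the entropy H as n grows, which is why about 2^{nH} typical sequences suffice. A simulation sketch (parameters are illustrative):

```python
import math, random

random.seed(0)
p = 0.2  # P(X=1) for a Bernoulli source
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy per symbol, ≈ 0.722 bits

def empirical_rate(n):
    """-(1/n) log2 p(x^n) for one random source word of length n
    (the quantity appearing in the asymptotic equipartition property)."""
    x = [1 if random.random() < p else 0 for _ in range(n)]
    logp = sum(math.log2(p if xi else 1 - p) for xi in x)
    return -logp / n

# As n grows, the per-symbol log-probability concentrates near H,
# so coding at any rate above H is achievable.
for n in (100, 10000):
    print(n, round(empirical_rate(n), 3))
```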

Lecture 5: Error-free source coding

In this lecture we prove a theorem which strengthens the achievability part of Shannon’s source coding theorem stated in the previous lecture. More specifically, we introduce error-free source coding schemes and prove that within this class the entropy still equals the infimum of achievable coding rates. The price to pay for error-free coding is a variable codeword length. The main tool is Kraft’s code.

Error-free source coding
Uniquely decipherable codes
Codes on trees and Kraft’s inequality
Error-free source coding theorem

Contents: Uniquely decipherable codes, Prefix-free codes, Codes on trees, Kraft’s inequality, Kraft’s code.
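Kraft’s inequality is easy to check for a candidate set of codeword lengths; a minimal sketch (the function name is ours):

```python
def kraft_sum(lengths, D=2):
    """Kraft sum: sum of D^{-l} over the codeword lengths l.
    A D-ary prefix-free code with these lengths exists iff the sum is <= 1."""
    return sum(D ** (-l) for l in lengths)

# 1.0: a complete prefix-free binary code exists, e.g. 0, 10, 110, 111.
print(kraft_sum([1, 2, 3, 3]))
# 1.25 > 1: no prefix-free binary code can have these lengths.
print(kraft_sum([1, 1, 2]))
```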

Lecture 6: Optimal source codes

We shall construct codes based on the probability distribution of the source-symbol:

  • Huffman’s code which is optimal in the sense of minimizing the mean codeword length for a given source distribution.
  • Shannon-Fano-Elias code, whose asymptotic average length (when coding a source-word whose length goes to infinity) is optimal.

Huffman’s optimal code
    Optimal code
    Huffman’s code
    Optimality of Huffman’s code
Shannon-Fano-Elias code

Contents: Huffman’s code, Shannon–Fano–Elias code.
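Huffman’s construction, which repeatedly merges the two least probable entries, fits in a few lines of Python; this is an illustrative sketch, not the lecture’s notation:

```python
import heapq

def huffman(probs):
    """Build a binary Huffman code for {symbol: probability}.
    Returns {symbol: codeword}. Repeatedly merges the two least
    probable entries, prefixing their codewords with 0 and 1."""
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman(probs)
mean_len = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print(mean_len)  # 1.75, equal to the source entropy for this dyadic distribution
```

For a dyadic source as above, the mean codeword length attains the entropy exactly, the best possible for any uniquely decipherable code.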

Lecture 7: Parsing-translation and Tunstall’s codes

In the previous lesson we showed that coding rates close to the entropy of the source can be achieved by grouping source symbols into fixed-length blocks (words), then coding these source-words with codes (e.g. Kraft’s, Huffman’s) optimized for the block distribution.
In this lesson, we shall see that equally small coding rates can be achieved by an optimal (for the given source distribution) grouping of the source symbols into variable-length blocks, called parsing, followed by encoding these blocks with simple, non-optimized codes.
This is the idea of Tunstall’s code, proposed in 1967.

Parsing-translation code
Valid parsing book
Mean formulae for a valid parsing book
Tunstall’s code
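Tunstall’s construction grows the parsing book by always expanding its most probable word, until the book fills the fixed codeword budget. An illustrative Python sketch (function name and parameters are ours):

```python
import heapq

def tunstall(probs, codeword_bits):
    """Tunstall parsing book for an i.i.d. source probs = {symbol: p}.
    Repeatedly expands the most probable word into its one-symbol
    extensions, as long as the book fits in 2**codeword_bits entries."""
    # Max-heap via negated probabilities; entries are (-p, word).
    book = [(-p, s) for s, p in probs.items()]
    heapq.heapify(book)
    limit = 2 ** codeword_bits
    while len(book) + len(probs) - 1 <= limit:
        negp, word = heapq.heappop(book)  # most probable word
        for s, p in probs.items():
            heapq.heappush(book, (negp * p, word + s))
    return sorted(word for _, word in book)

# 8 codewords of 3 bits each for a biased binary source:
# frequent symbols get grouped into long parsed blocks.
print(tunstall({"a": 0.7, "b": 0.3}, 3))
```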

Lecture 8: Universal source coding

Universal source coding consists in coding without prior knowledge of the source distribution. We shall show that for any coding rate R there exist universal codes (not depending on the source distribution) that allow asymptotically error-free coding of any i.i.d. source with entropy smaller than R. This may be seen as an extension of Shannon’s source coding theorem.

Universal source coding
Empirical distributions
Proof of universal source coding
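For a binary i.i.d. source, a simple two-part universal code first describes the empirical distribution (type) of the word and then the word’s index within its type class; the resulting rate approaches the source entropy without using the source distribution. A sketch (the function name is ours):

```python
import math, random

def universal_rate(x):
    """Bits per symbol of a two-part universal code for a binary word x:
    ~log2(n+1) bits describe the type (the number of ones k), then
    log2 C(n,k) bits index x within its type class."""
    n, k = len(x), sum(x)
    return (math.log2(n + 1) + math.log2(math.comb(n, k))) / n

random.seed(1)
x = [1 if random.random() < 0.1 else 0 for _ in range(10000)]
# The code knows nothing of the source, yet its rate is close to
# the entropy H(0.1) ≈ 0.469 bits per symbol.
print(round(universal_rate(x), 3))
```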

Lecture 9: Lempel-Ziv source code

We present the Lempel-Ziv algorithm, introduced in 1977/78, which allows one to encode an arbitrary source sequence without errors (an error-free universal source code).

Lempel-Ziv algorithm
Coding automaton
Optimality of Lempel-Ziv coding
    Coding rates of information-lossless (IL) finite automata
    Optimality theorems for Lempel-Ziv coding
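The incremental (1978) variant of the algorithm parses the source into the shortest phrases not seen before, each encoded as a dictionary reference plus one fresh symbol. A compact Python sketch:

```python
def lz78_parse(s):
    """Incremental (LZ78) parsing: split s into the shortest phrases
    not seen before. Each phrase is emitted as (index of its longest
    proper prefix among earlier phrases, last symbol)."""
    dictionary = {"": 0}
    phrases, current = [], ""
    for ch in s:
        if current + ch in dictionary:
            current += ch
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:  # leftover tail: emit it as (prefix index, last symbol)
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases

print(lz78_parse("ababababab"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (0, 'b')]
```

The number of parsed phrases (times their description length) governs the coding rate, and its growth is what the optimality theorems compare against IL finite automata.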

III. Channel coding

Lecture 10: Channel information capacity

We shall introduce a basic model of noisy communication channel and define its information capacity as the maximum of the mutual information — a measure of dependence derived from the notion of entropy — between the input and the output of the channel. Two illustrative examples are studied: binary symmetric channel and binary erasure channel.

Channel basic properties
Information capacity
Examples of channel information capacity
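Both examples admit closed-form capacities which are immediate to evaluate; a small sketch (function names are ours):

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Binary symmetric channel with crossover probability p: C = 1 - h2(p)."""
    return 1 - h2(p)

def bec_capacity(eps):
    """Binary erasure channel with erasure probability eps: C = 1 - eps."""
    return 1 - eps

print(bsc_capacity(0.11))  # ≈ 0.5: half a bit per use survives 11% bit flips
print(bec_capacity(0.5))   # 0.5: exactly the fraction of non-erased symbols
```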

Lecture 11: Channel coding theorem

We introduce a general channel coding scheme, with its transmission rate and error probability. We prove a fundamental result that all transmission rates smaller than the channel information capacity (as introduced in the previous lecture) can be achieved by some coding schemes offering arbitrarily small error probabilities. This is the famous noisy-channel coding theorem formulated by Shannon in 1948. The proof makes use of a random coding argument and a decoding based on jointly typical sequences.

Channel coding theorem statement
Typical sequences for two random variables
Inequalities for mutual information
Proof of achievability
    Random coders
    Channel outputs
    Analysis of the error probability
Proof of converse

Contents: Channel coding theorem, channel capacity, jointly typical sequences, Fano’s inequality, achievable coding rates, supremum of achievable rates, proof of achievability, proof of converse.


  1. Exercises 1 for Lecture 2 « Entropy »
  2. Exercises 2 for Lecture 5 « Error-free source coding » (Uniquely decipherable codes)
  3. Exercises 3 for Lecture 6 « Optimal source codes »



Teaching at the Computer Science department of Ecole Normale Supérieure (Ulm, Paris) with Bartłomiej Błaszczyszyn:

  • Course title: « Information, coding theories: Source, channel coding, Shannon theorems » (2016-2024). 
  • Course location: ENS, 45, rue d’Ulm 75230 Paris Cedex 05. E. Noether room in the 2nd basement of the “AILE RATAUD” building; cf. ENS map.

Schedule (2024)

  1. 2024/02/06:
    1. Lecture 2 « Entropy ».
    2. Lecture 4 « Source coding ».
  2. 2024/02/27:
    1. Lecture 5 « Uniquely decipherable codes ».
    2. Exercises 1, Solutions 1.
  3. 2024/03/05:
    1. Lecture 6 « Optimal codes ».
    2. Exercises 2.


B. Błaszczyszyn, M.K. Karray (2024). Information theory. Book in preparation.

Mohamed Kadhem KARRAY

My research activities at Orange aim to evaluate the performance of communication networks by combining information theory, queueing theory, stochastic geometry, and machine and deep learning. Recently, I prepared video lectures on "Data science: From multivariate statistics to machine and deep learning", available on my YouTube channel. I also taught at Ecole Normale Supérieure, Ecole Polytechnique, and Ecole Centrale Paris, and have prepared several mathematical books.
