Machine-learning and Deep-learning video lectures

Machine-Learning is becoming a widespread tool for solving problems in many domains. It has solid mathematical foundations inherited from measure theory, probability theory, and multivariate statistics. We shall present fundamental results on performance guarantees of ML algorithms, which go beyond the classical heuristic approach (based on a test sequence).

Mathematics and software are often intertwined in data-science writings. We believe that it is much clearer to present mathematics and software separately.

The machine-learning problem may be illustrated as follows. We have n samples of the input and the corresponding outputs, and we aim to guess the relation between the output y and the input x; i.e. to find a function f such that y=f(x).


I Machine-Learning frameworks

We present the fundamental concepts and results of machine and deep learning. In this first lecture, we define precisely the Machine Learning (ML) frameworks and give the first results on performance guarantees of ML algorithms in the case of a finite class of hypotheses. The following lectures extend these results to the case of infinite classes (even of infinite dimension).

Basic learning framework
      Empirical loss minimization (ELM)
Noisy learning framework
      Bayes optimal hypothesis
      Agnostic PAC-learnability
General learning framework
      Empirical loss minimization (ELM)
      Learnability versus uniform-convergence
      Finite hypothesis class
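
As a minimal sketch of empirical loss minimization over a finite hypothesis class, the following Python code selects, among a few threshold classifiers, the one with the smallest empirical 0-1 loss. The class of thresholds, the synthetic training sequence, and all function names are illustrative assumptions, not taken from the lecture.

```python
import random

def empirical_loss(h, samples):
    """Fraction of samples (x, y) misclassified by hypothesis h (0-1 loss)."""
    return sum(h(x) != y for x, y in samples) / len(samples)

def make_threshold(t):
    """Hypothesis h_t(x) = 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Finite hypothesis class: thresholds 0.0, 0.1, ..., 1.0 (illustrative choice).
thresholds = [k / 10 for k in range(11)]

# Synthetic training sequence: x uniform on [0, 1], labeled by the threshold 0.3,
# so the true labeling function belongs to the class (realizable case).
rng = random.Random(0)
samples = [(x, 1 if x >= 0.3 else 0) for x in (rng.random() for _ in range(200))]

# ELM: pick a hypothesis with the smallest empirical loss.
best_t = min(thresholds, key=lambda t: empirical_loss(make_threshold(t), samples))
best_loss = empirical_loss(make_threshold(best_t), samples)
print(best_t, best_loss)
```

Since the labels are generated by a hypothesis of the class, the minimal empirical loss is zero here; in the noisy framework this is no longer guaranteed.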

II Vapnik-Chervonenkis theory

We aim to establish a uniform bound of the deviation of the empirical loss from the true loss for an infinite hypothesis class. In this regard,
  – We present the Vapnik-Chervonenkis theory
    * Introduce the notions of covering and packing numbers, growth function, VC dimension.
    * Build bounds on the covering numbers for classes of sets and functions whose VC dimension is finite.
  – These results, together with some results from empirical processes theory (next lecture), will allow us to establish the desired uniform bound.

VC classes of sets
    Covering and packing numbers
    Growth function
    VC dimension
    Covering number bound
VC classes of functions
Covering numbers of convex hulls
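
To make the notions of shattering, growth function and VC dimension concrete, here is a small brute-force sketch in Python for the class of closed intervals [a, b] of the real line (including the empty set). The choice of this class and the function names are assumptions made for illustration. The code enumerates the labelings that intervals induce on a finite set of points.

```python
from itertools import combinations

def interval_labelings(points):
    """All labelings of `points` realized by indicators of intervals [a, b]."""
    pts = sorted(points)
    labelings = {tuple(0 for _ in points)}  # labeling of the empty interval
    # It suffices to try intervals whose endpoints are sample points.
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            a, b = pts[i], pts[j]
            labelings.add(tuple(1 if a <= x <= b else 0 for x in points))
    return labelings

def is_shattered(points):
    """True if intervals realize all 2^m labelings of the m points."""
    return len(interval_labelings(points)) == 2 ** len(points)

def growth_on_pool(candidates, m):
    """Largest number of labelings over the m-point subsets of a finite pool."""
    return max(len(interval_labelings(s)) for s in combinations(candidates, m))

print(is_shattered([0.0, 1.0]))        # two points
print(is_shattered([0.0, 1.0, 2.0]))   # three points
print(growth_on_pool([0, 1, 2, 3], 3))
```

Two points are shattered but three are not (the labeling 1, 0, 1 cannot be produced by an interval), which is consistent with a VC dimension equal to 2 for this class.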

III Results from empirical processes theory

We present some results from empirical processes theory which are useful for data science. These results, together with the Vapnik-Chervonenkis theory (previous lecture), will allow us to establish a uniform bound on the deviation of the empirical loss from the true loss over an infinite class of hypotheses.

Empirical processes
Measurability of the supremum
Tail bounds

IV Learnability characterization

We aim in this lecture to:
  – characterize the infinite classes of hypotheses which are learnable
  – and give the corresponding sample-complexity; i.e. the number of learning samples guaranteeing the performance of the learning algorithm.
Recall that for a finite class H of hypotheses, we gave in the first lecture a uniform bound on the deviation of the empirical loss L_{S⁽ⁿ⁾}(h) from the true loss L_{Q}(h);
  – which implies learnability and gives the sample-complexity.
We shall now extend these results to infinite classes of hypotheses using:
  – the Vapnik-Chervonenkis theory presented in the second lecture
  – and some results from empirical processes theory presented in the third lecture.
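
For the finite-class case recalled above, a sample-complexity computation can be sketched as follows. The sketch assumes a loss with values in [0, 1] and uses the standard bound obtained from Hoeffding's inequality and a union bound: with probability at least 1 - delta, the deviation |L_{S⁽ⁿ⁾}(h) - L_{Q}(h)| is at most sqrt(log(2|H|/delta)/(2n)) simultaneously for all h in H. The constants stated in the lecture may differ, and the function names are hypothetical.

```python
import math

def uniform_deviation_bound(class_size, n, delta):
    """Hoeffding + union bound on sup_h |empirical loss - true loss|,
    valid with probability >= 1 - delta for a loss with values in [0, 1]."""
    return math.sqrt(math.log(2 * class_size / delta) / (2 * n))

def sample_complexity(class_size, epsilon, delta):
    """Smallest n for which the bound above is at most epsilon."""
    return math.ceil(math.log(2 * class_size / delta) / (2 * epsilon ** 2))

n = sample_complexity(class_size=1000, epsilon=0.05, delta=0.01)
print(n, uniform_deviation_bound(1000, n, 0.01))
```

Note that the required number of samples grows only logarithmically with the size of the class, which is why a different argument (VC theory) is needed when the class is infinite.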

Finite VC dimension implies learnability
    Binary classification
    Finite range loss function
Learning for binary classification
    No free lunch
    Learnability requires finite VC dimension
    Fundamental theorem of learning for binary classification

V Examples of Machine-Learning problems

So far, we have considered learning problems where the learner is provided with some training sequence.
  – This is referred to as supervised learning, since we assume that there is a supervisor (or teacher) which provides the training sequence.
  – In some problems, we aim to find patterns in a given data set without having any training sequence; this is referred to as unsupervised learning.

Supervised learning problems
    Binary classification with nearest neighbor
        1-NN hypothesis
        k-NN hypothesis
    Binary classification with halfspaces
        ELM by linear optimization
        ELM by Perceptron
    Linear fitting with multidimensional input
    Polynomial fitting
Unsupervised learning problems
    Linkage-based clustering
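
As an illustration of the item "ELM by Perceptron" in the outline above, here is a minimal sketch of the classical Perceptron update rule for binary classification with halfspaces. It assumes labels in {-1, +1}, linearly separable data, and a bias encoded as a constant last coordinate; the toy data and names are illustrative and not taken from the lecture.

```python
def perceptron(samples, max_epochs=1000):
    """Perceptron for homogeneous halfspaces.

    samples: list of (x, y) with x a tuple of floats and y in {-1, +1}.
    Returns w such that y * <w, x> > 0 for every sample, or raises an
    error if no such w is found within max_epochs passes over the data.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(max_epochs):
        updated = False
        for x, y in samples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:  # misclassified, or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # a full pass without mistakes: zero empirical loss
            return w
    raise RuntimeError("no separating halfspace found; data may not be separable")

# Toy separable data; the constant last coordinate 1.0 encodes the bias term.
samples = [((2.0, 1.0, 1.0), 1), ((1.0, 3.0, 1.0), 1),
           ((-1.0, -1.0, 1.0), -1), ((-2.0, 0.5, 1.0), -1)]
w = perceptron(samples)
print(w)
```

When the returned w classifies every training sample correctly, its empirical 0-1 loss is zero, so it is an empirical loss minimizer in the separable case.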


VI Neural networks

A neural network is a computation model which mimics the structure of the human brain. We shall:
  – give a formal definition based on multilayer representation;
  – give a bound for the VC dimension of a neural network;
  – present the stochastic gradient descent algorithm, which is used to minimize the empirical loss.

Neural networks
    Multilayer neural networks
    VC dimension of neural network
Stochastic gradient descent
    Gradient descent optimization algorithms
    Back propagation to train a neural network
Convolutional neural networks
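
The principle of stochastic gradient descent can be sketched on the simplest possible model, a linear function h(x) = a*x + b with squared loss, where each step uses the gradient of the loss on a single sample. This is a minimal illustration in plain Python; the learning rate, the number of epochs, and the synthetic data are assumptions chosen for the example, and a neural network would replace the two explicit derivatives by back propagation.

```python
import random

def sgd_linear_fit(samples, lr=0.05, epochs=200, seed=0):
    """Minimize the empirical squared loss of h(x) = a*x + b by SGD."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)              # visit the samples in random order
        for x, y in data:
            err = (a * x + b) - y      # residual on one sample
            a -= lr * 2 * err * x      # derivative of err^2 with respect to a
            b -= lr * 2 * err          # derivative of err^2 with respect to b
    return a, b

# Noiseless synthetic data from y = 3x - 1 on [0, 1] (illustrative choice).
xs = [k / 20 for k in range(21)]
samples = [(x, 3 * x - 1) for x in xs]
a, b = sgd_linear_fit(samples)
print(a, b)
```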

VII Approximation theory in neural networks

Neural networks have the 'reputation' of approximating any function; this is known as the universal approximation theorem.
We shall make precise statements of this result. More specifically, we shall address the following questions:
  – What are the functions that can be approximated by neural networks?
  – How many neurons does one need to ensure a given approximation accuracy?
In this regard, we shall first present some general results from approximation theory.

Approximation of continuous functions
Rate of approximation
    Rate of approximation in Hilbert and Lq spaces
    Rate of approximation in neural networks
    Rate of approximation with respect to supremum norm
Sufficient condition for approximation to hold

VIII Python software for Machine-Learning and Deep-Learning - Tutorial

The goal of this lecture is to guide you through installing the software from scratch and writing your first Python code for data science. You will be able to build your own neural network, with multidimensional input and output, in a few lines of code using a specific library (called PyTorch).

Python software
      Install software from scratch
      Install, import and use packages
      Example : Linear fitting with sklearn package
PyTorch package
      Devices (processors)
      Image datasets
      Differentiation with PyTorch
PyTorch for neural networks
      Example : Linear fitting with PyTorch
      Logistic regression with PyTorch
      Multiclass logistic regression with PyTorch
      Optimization in PyTorch
      Multilayer neural network

Python code examples
Here are some notebook examples for learning and practicing Python and the associated packages. Please refer to the lecture notes to install the software needed to run Python notebooks on your machine.

  • Example 0 : Rapid tutorial on Python.
  • Example 1 : Import and use basic Python packages: math, numpy, and pandas.
  • Example 2 : Linear fitting using sklearn package.
  • Example 3 : Tensor object in PyTorch package.
  • Example 4 : Devices to run Python code (processors).
  • Example 5 : Image datasets.
  • Example 6 : Rapid tutorial on Python.
  • Example 7 : Linear fitting using a neural network with PyTorch.
  • Example 8 : Multilayer neural networks to fit data.

Mohamed Kadhem KARRAY

My research activities at Orange aim to evaluate the performance of communication networks by combining information theory, queueing theory, and stochastic geometry, as well as machine and deep learning. Recently, I prepared video lectures on "Data science: From multivariate statistics to machine and deep learning", available on my YouTube channel. I also taught at Ecole Normale Supérieure, Ecole Polytechnique, and Ecole Centrale Paris, and prepared several mathematical books.