# Machine-learning and Deep-learning video lectures

Machine-Learning is becoming a widespread tool for solving problems in many domains. It has solid mathematical foundations inherited from measure theory, probability theory, and multivariate statistics. We shall present fundamental results on the performance guarantees of ML algorithms, which go beyond the classical heuristic approach (based on a test sequence).

Mathematics and software are often intertwined in data-science writing. We believe it is much clearer to present the mathematics and the software separately.

The machine-learning problem may be illustrated as follows. We have n samples of the input and the corresponding outputs, and we aim to learn the relation between the output y and the input x, i.e., to find a function f such that y = f(x).
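For concreteness, the following is a minimal sketch of this setup in Python; the data and the choice of an affine hypothesis class are purely illustrative assumptions, not taken from the lectures.

```python
import numpy as np

# n samples of the input x and the corresponding output y (illustrative data)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # unknown relation y ≈ f(x)

# Guess f within a simple hypothesis class (here: affine functions a*x + b)
a, b = np.polyfit(x, y, deg=1)
f = lambda x_new: a * x_new + b
print(f"learned hypothesis: f(x) ≈ {a:.2f} x + {b:.2f}")
```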

## I Machine-Learning frameworks

**PDF**, **Video**

We present the fundamental concepts and results of machine and deep learning. In this first lecture, we define precisely the Machine Learning (ML) frameworks and give the first results on the performance guarantees of ML algorithms in the case of a finite hypothesis class. The following lectures extend these results to the case of infinite classes (even of infinite dimension).
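As an illustration of empirical loss minimization over a finite hypothesis class, here is a minimal sketch assuming a class of threshold classifiers and the 0-1 loss; both choices are illustrative, not the general setting of the lecture.

```python
import numpy as np

# A finite hypothesis class: threshold classifiers h_t(x) = 1{x >= t} (illustrative choice)
thresholds = np.linspace(-1.0, 1.0, 21)

def empirical_loss(t, x, y):
    """0-1 empirical loss of the hypothesis h_t on the sample S = (x, y)."""
    predictions = (x >= t).astype(int)
    return np.mean(predictions != y)

# Illustrative training sample, labeled by an (unknown) threshold at 0.3
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = (x >= 0.3).astype(int)

# Empirical loss minimization (ELM): pick the hypothesis with the smallest empirical loss
losses = [empirical_loss(t, x, y) for t in thresholds]
t_hat = thresholds[int(np.argmin(losses))]
print(f"ELM output: threshold {t_hat:.2f}, empirical loss {min(losses):.3f}")
```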

**Outline**:

Basic learning framework

Empirical loss minimization (ELM)

PAC-learnability

Noisy learning framework

Bayes optimal hypothesis

Agnostic PAC-learnability

General learning framework

Empirical loss minimization (ELM)

Learnability versus uniform-convergence

Finite hypothesis class

## II Vapnik-Chervonenkis theory

**PDF**, **Video**

We aim to establish a uniform bound on the deviation of the empirical loss from the true loss for an infinite hypothesis class. In this regard,

– We present the Vapnik-Chervonenkis theory:

* introduce the notions of covering and packing numbers, the growth function, and the VC dimension;

* build bounds on the covering numbers for classes of sets and functions whose VC dimension is finite.

– These results, together with some results from empirical process theory (next lecture), will allow us to establish the desired uniform bound. A small illustration of shattering and the VC dimension is sketched after this list.
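The brute-force check below verifies which point sets can be shattered by threshold classifiers on the real line; the class and its discretization are illustrative assumptions, not the lecture's examples.

```python
import numpy as np

def shatters(points, hypotheses):
    """Check whether the given hypotheses realize all 2^m labelings of the points."""
    realized = {tuple(h(p) for p in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Illustrative class: threshold classifiers h_t(x) = 1{x >= t}, discretized for brute force
hypotheses = [(lambda x, t=t: int(x >= t)) for t in np.linspace(-2.0, 2.0, 401)]

print(shatters([0.0], hypotheses))        # True: a single point can be shattered
print(shatters([0.0, 1.0], hypotheses))   # False: the labeling (1, 0) cannot be realized
# Hence the VC dimension of threshold classifiers on the line is 1.
```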

**Outline**:

VC classes of sets

Covering and packing numbers

Growth function

VC dimension

Covering number bound

VC classes of functions

Covering numbers of convex hulls

## III Results from empirical processes theory

**PDF**, **Video**

We present some results from empirical process theory which are useful for data science. These results, together with the Vapnik-Chervonenkis theory (previous lecture), will allow us to establish a uniform bound on the deviation of the empirical loss from the true loss for a hypothesis within an infinite class of hypotheses.
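As an example of the objects and tail bounds involved, here is a standard single-hypothesis statement (Hoeffding's inequality), written for a loss assumed to take values in [0, 1]; the lecture is concerned with the harder, uniform (supremum) version of such bounds.

```latex
% Empirical process indexed by the hypothesis class H:
%   h \mapsto L_{S^{(n)}}(h) - L_{Q}(h).
% For a fixed hypothesis h and a loss with values in [0, 1], Hoeffding's inequality gives
\mathbb{P}\left( \bigl| L_{S^{(n)}}(h) - L_{Q}(h) \bigr| > \varepsilon \right)
  \le 2\, e^{-2 n \varepsilon^{2}} .
```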

**Outline**:

Empirical processes

Measurability of the supremum

Tail bounds

## IV Learnability characterization

**PDF**, **Video**

We aim in this lecture to:

– characterize the infinite classes of hypotheses that are learnable;

– give the corresponding sample complexity, i.e., the number of training samples guaranteeing the performance of the learning algorithm.

Recall that for a finite class H of hypotheses, we gave in the first lecture a uniform bound on the deviation of the empirical loss L_{S⁽ⁿ⁾}(h) from the true loss L_{Q}(h),

– which implies learnability and gives the sample complexity (a standard form of this bound is recalled below).

We shall now extend these results to infinite classes of hypotheses using:

– the Vapnik-Chervonenkis theory presented in the second lecture,

– and some results from empirical process theory presented in the third lecture.
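For reference, a standard form of this finite-class bound, assuming a loss with values in [0, 1] (the exact constants used in the lecture may differ), is:

```latex
\mathbb{P}\left( \sup_{h \in H} \bigl| L_{S^{(n)}}(h) - L_{Q}(h) \bigr| > \varepsilon \right)
  \le 2\, |H| \, e^{-2 n \varepsilon^{2}},
\qquad \text{whence the sample complexity} \qquad
n \ge \frac{\log(2|H|/\delta)}{2 \varepsilon^{2}} .
```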

**Outline**:

Finite VC dimension implies learnability

Binary classification

Finite range loss function

Learning for binary classification

No free lunch

Learnability requires finite VC dimension

Fundamental theorem of learning for binary classification

## V Examples of Machine-Learning problems

**PDF**, **Video**

So far, we have considered learning problems where the learner is provided with a training sequence.

– This is referred to as supervised learning, since we assume that there is a supervisor (or teacher) who provides the training sequence.

– In some problems, we aim to find patterns in a given data set without having any training sequence; in this case we speak of unsupervised learning. Both settings are sketched below.
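The following minimal scikit-learn sketch contrasts the two settings; the specific models (k-NN for classification, k-means for clustering) are illustrative choices taken from the outline below.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Supervised: a training sequence (X, y) is provided, and we learn to predict y
y = (X[:, 0] + X[:, 1] > 0).astype(int)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[0.5, 0.5]]))

# Unsupervised: no labels, we only look for patterns (here, clusters) in X
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])
```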

**Outline**:

Supervised learning problems

Binary classification with nearest neighbor

1-NN hypothesis

k-NN hypothesis

Binary classification with halfspaces

Halfspaces

ELM by linear optimization

ELM by Perceptron

Linear fitting with multidimensional input

Polynomial fitting

Unsupervised learning problems

Clustering

Linkage-based clustering

## VI Neural networks

**PDF**, **Video**

A neural network is a computational model that mimics the structure of the human brain. We shall:

– give a formal definition based on a multilayer representation;

– give a bound on the VC dimension of neural networks;

– present the stochastic gradient descent algorithm used to minimize the empirical loss (a minimal example follows this list).
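A minimal PyTorch sketch of these ingredients: a two-layer network trained by mini-batch stochastic gradient descent on illustrative data (the architecture, data, and hyper-parameters are assumptions, not the lecture's).

```python
import torch
from torch import nn

# A small multilayer network: input dimension 2, one hidden layer of 16 units, scalar output
model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Illustrative data: y = x1 * x2 plus noise
X = torch.randn(256, 2)
y = (X[:, 0] * X[:, 1]).unsqueeze(1) + 0.01 * torch.randn(256, 1)

for epoch in range(200):
    for i in range(0, 256, 32):             # one stochastic gradient step per mini-batch
        xb, yb = X[i:i + 32], y[i:i + 32]
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()                      # back propagation computes the gradients
        optimizer.step()                     # gradient descent update of the parameters
```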

**Outline**:

Neural networks

Multilayer neural networks

VC dimension of neural networks

Stochastic gradient descent

Gradient descent optimization algorithms

Back propagation to train a neural network

Convolutional neural networks

## VII Approximation theory in neural networks

**PDF**, **Video**

Neural networks have the 'reputation' of being able to approximate any function; this is known as the universal approximation theorem.

We shall make precise statements of this result. More specifically, we shall address the following questions:

– What are the functions that can be approximated by neural networks?

– How many neurons does one need to ensure a given approximation accuracy?

In this regard, we shall first present some general results from approximation theory.
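For orientation, here is one standard form of the universal approximation theorem (several versions exist; the precise hypotheses stated in the lecture may differ): if the activation function σ is continuous and non-polynomial, then for every continuous function f on a compact set K ⊂ R^d and every ε > 0 there is a one-hidden-layer network with finitely many neurons satisfying

```latex
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} a_i \, \sigma\bigl( w_i^{\top} x + b_i \bigr) \Bigr| < \varepsilon ,
\qquad a_i, b_i \in \mathbb{R}, \; w_i \in \mathbb{R}^{d} .
```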

**Outline**:

Approximation of continuous functions

Rate of approximation

Rate of approximation in Hilbert and Lq spaces

Rate of approximation in neural networks

Rate of approximation with respect to supremum norm

Sufficient condition for approximation to hold

## VIII Python software for Machine-Learning and Deep-Learning - Tutorial

**PDF**, **Video**

The goal of this lecture is to guide you through installing the software from scratch and writing your first Python code for data science. You will be able to build your own neural network, with multidimensional input and output, in a few lines of code using a dedicated library (PyTorch).
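As a preview of what such "few lines of code" look like, here is a minimal linear-fitting sketch with PyTorch; the data are illustrative and the installation command may vary with your setup (see the lecture notes).

```python
# pip install torch   (installation from scratch is covered in the lecture notes)
import torch
from torch import nn

# Illustrative data: y = 3x - 1 with a little noise
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x - 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)                          # a single linear layer: y_hat = w*x + b
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())    # should be close to 3 and -1
```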

**Outline**:

Python software

Install software from scratch

Install, import and use packages

Example : Linear fitting with sklearn package

PyTorch package

Tensors

Devices (processors)

Image datasets

Differentiation with PyTorch

PyTorch for neural networks

Example : Linear fitting with PyTorch

Logistic regression with PyTorch

Multiclass logistic regression with PyTorch

Optimization in PyTorch

Multilayer neural network

**Python code examples**

Here are some notebook examples to help you learn and practice Python and the associated packages. Please refer to the lecture notes to install the software needed to run the Python notebooks on your machine.

- Example 0 : Rapid tutorial on Python.
- Example 1 : Import and use basic Python packages: math, numpy, and pandas.
- Example 2 : Linear fitting using sklearn package.
- Example 3 : Tensor object in PyTorch package.
- Example 4 : Devices to run Python code (processors).
- Example 5 : Image datasets.
- Example 6 : Rapid tutorial on Python.
- Example 7 : Linear fitting using a neural network with PyTorch.
- Example 8 : Multilayer neural networks to fit data.