Machine Learning and Deep Learning: Lectures

Machine-Learning Course
This page brings together graduate-level lectures on machine learning and deep learning, covering the theoretical foundations of statistical learning theory (PAC-learnability, VC theory, empirical processes), the core concepts of supervised and unsupervised learning, the architecture and training of neural networks and convolutional neural networks, the approximation theory underlying their universal approximation capabilities, and a hands-on Python tutorial using PyTorch and scikit-learn. Each lecture is available as a downloadable PDF with a video recording in French. Lecture 1 also includes a video in English. This material builds on the foundations established in Measure & Probability and Statistics Theory, and the theoretical results developed here are applied to network performance modeling in Wireless Networks.

Lectures: PDF & Video

1 Machine-Learning Frameworks

PDF, English video, and French video of the lecture on Machine-Learning Frameworks, the formal settings in which learning problems are posed. We begin with the basic learning framework, defining fundamental concepts such as target functions, hypotheses, and empirical loss minimization (ELM). For a finite hypothesis class, we prove theorems relating the training sequence and the cardinality of the hypothesis class to the true loss, which leads to the notion of PAC-learnability. We then move to the noisy learning framework, where the relationship between input and output is no longer deterministic, introducing probability kernels, Bayes optimal hypotheses, and agnostic PAC-learnability. Finally, the general learning framework allows for more general data spaces and hypothesis spaces; there we revisit empirical loss minimization, introduce uniform convergence, and establish the relationship between learnability and uniform convergence, with finite hypothesis classes as a key case.

Course Outline:

  1. Basic learning framework
    1. Empirical loss minimization (ELM)
    2. PAC-learnability
  2. Noisy learning framework
    1. Bayes optimal hypothesis
    2. Agnostic PAC-learnability
  3. General learning framework
    1. Empirical loss minimization (ELM)
    2. Learnability versus uniform-convergence
    3. Finite hypothesis class
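
As a rough illustration of the basic framework, the sketch below runs ELM over a finite class of threshold classifiers when the target belongs to the class. The threshold class, the uniform input distribution, the target value 0.37, and the helper names (sample_size, elm) are assumptions made for this example and are not taken from the lecture. The sample size uses the standard textbook bound n >= ln(|H|/delta)/eps for finite classes; the lecture's own statement may be phrased differently.

```python
import math
import random

def sample_size(num_hypotheses, eps, delta):
    """Smallest integer n with n >= ln(|H| / delta) / eps (finite class, target in the class)."""
    return math.ceil(math.log(num_hypotheses / delta) / eps)

def elm(thresholds, sample):
    """Return a threshold with minimal empirical 0-1 loss on the sample."""
    def empirical_loss(t):
        return sum((x >= t) != y for x, y in sample) / len(sample)
    return min(thresholds, key=empirical_loss)

# Hypothetical setup: hypotheses h_t(x) = 1{x >= t} for t on a grid of [0, 1],
# inputs uniform on [0, 1], and a target threshold that belongs to the grid.
rng = random.Random(0)
thresholds = [k / 100 for k in range(101)]
target = 0.37
eps, delta = 0.05, 0.05

n = sample_size(len(thresholds), eps, delta)
sample = [(x, x >= target) for x in (rng.random() for _ in range(n))]
t_hat = elm(thresholds, sample)

# Under the uniform distribution the true loss of h_t is |t - target|.
true_loss = abs(t_hat - target)
print(n, t_hat, round(true_loss, 4))
```

With these values the bound asks for 153 training points, and the hypothesis returned by ELM should have true loss below eps except on an event of probability at most delta.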

2 Vapnik-Chervonenkis Theory

PDF and video of the lecture on Vapnik-Chervonenkis Theory, the rigorous framework for establishing uniform bounds between empirical and true loss within an infinite hypothesis class. We begin with VC classes of sets, elucidating the fundamental concepts of covering and packing numbers, growth functions, and VC dimension. Through definitions and lemmas, we establish the theoretical framework necessary to comprehend the interplay between these concepts and their implications on learning and generalization, with illustrative examples and theorems such as Sauer’s lemma, showing how covering numbers of VC classes of sets grow polynomially. We then expand to VC classes of functions, defining VC dimension for classes of functions and introducing the concept of envelope functions, demonstrating how covering numbers of VC classes of functions also exhibit polynomial growth. Finally, we explore covering numbers of convex hulls, particularly in Hilbert space, leveraging Maurey’s lemma and other foundational results to elucidate how covering numbers of convex hulls exhibit certain growth properties crucial for understanding the complexity of learning in high-dimensional spaces.

Course Outline:

  1. VC classes of sets
    1. Covering and packing numbers
    2. Growth function
    3. VC dimension
    4. Covering number bound
  2. VC classes of functions
  3. Covering numbers of convex hulls
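
To make the polynomial growth concrete, here is a small sketch that compares the Sauer bound (the sum of binomial coefficients C(n, i) for i up to the VC dimension d) with a brute-force count of the subsets picked out by intervals of the real line. The choice of intervals as the example class (VC dimension 2) and the function names are assumptions for illustration only.

```python
from math import comb

def sauer_bound(n, d):
    """Upper bound on the growth function at n points for a class of VC dimension d."""
    return sum(comb(n, i) for i in range(d + 1))

def interval_labelings(points):
    """All subsets of `points` that can be picked out by a closed interval."""
    pts = sorted(points)
    picked = {frozenset()}  # an interval containing none of the points
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            picked.add(frozenset(pts[i:j + 1]))
    return picked

for n in (2, 5, 10):
    count = len(interval_labelings(range(n)))
    print(n, count, sauer_bound(n, 2), 2 ** n)
```

For intervals the count matches the bound exactly, and it grows quadratically in n while the number of all subsets, 2^n, grows exponentially.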

3 Results from Empirical Processes Theory

PDF and video of the lecture on Results from Empirical Processes Theory, crucial insights that bridge theoretical foundations with practical applications in machine learning. We begin with empirical processes, defining key concepts such as empirical measure and empirical process. Leveraging foundational lemmas like the Law of Large Numbers and the Central Limit Theorem, we establish the statistical underpinnings necessary for comprehending the behavior of empirical processes in machine learning contexts. We then investigate the measurability of the supremum over classes of functions, particularly when the class is uncountable, introducing essential tools such as pointwise separability and envelope functions. Finally, we derive tail bounds for the supremum of empirical processes, building upon the concept of bracketing numbers with examples illustrating their application. Through theorems and corollaries, we establish tail bounds for uniformly bounded classes of functions and sets, facilitating the estimation of empirical cumulative distribution functions (CDFs) and enabling robust statistical inference in machine learning tasks.

Course Outline:

  1. Empirical processes
  2. Measurability of the supremum
  3. Tail bounds
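
The estimation of an empirical CDF can be sketched numerically as follows. The example draws a uniform sample, computes the exact supremum distance between the empirical CDF and the true CDF, and compares it with the radius given by the classical Dvoretzky-Kiefer-Wolfowitz inequality, P(sup |F_n - F| > t) <= 2 exp(-2 n t^2). That inequality is used here as a well-known reference point; it is an assumption that it is comparable to the tail bounds derived in the lecture, and the function names are hypothetical.

```python
import math
import random

def sup_deviation_uniform(sample):
    """sup_x |F_n(x) - x| for a sample from the uniform law on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def dkw_radius(n, delta):
    """Value t solving 2 * exp(-2 * n * t**2) = delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

rng = random.Random(0)
n, delta = 2000, 0.01
sample = [rng.random() for _ in range(n)]
sup_dev = sup_deviation_uniform(sample)
radius = dkw_radius(n, delta)
print(f"sup deviation = {sup_dev:.4f}, DKW radius = {radius:.4f}")
```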

4 Learnability Characterization

PDF and video of the lecture on Learnability Characterization, which identifies the infinite hypothesis classes that are learnable. We begin by establishing that finite VC dimension implies learnability, focusing first on binary classification. Through corollaries derived from concentration inequalities, we show that finite VC dimension implies learnability for both binary classification and finite range loss functions, while highlighting the importance of prior knowledge and the limitations of infinite VC dimension. We then look more closely at learning for binary classification: the no-free-lunch theorem shows that no learner succeeds on every binary classification problem, and we show that learnability requires finite VC dimension. Finally, we state the fundamental theorem of learning for binary classification, which brings together the conditions for learnability in the noisy learning framework.

Course Outline:

  1. Finite VC dimension implies learnability
    1. Binary classification
    2. Finite range loss function
  2. Learning for binary classification
    1. No free lunch
    2. Learnability requires finite VC dimension
    3. Fundamental theorem of learning for binary classification
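
Since the characterization hinges on the VC dimension, a brute-force computation on a small example may help fix ideas. The sketch below restricts two classes (thresholds and intervals) to a finite domain and searches for the largest shattered subset. The finite domain, the two example classes, and the helper names are assumptions for illustration; the approach is only feasible for tiny domains.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True if the hypotheses realize all 2^|points| labelings of `points`."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Largest size of a subset of a finite domain shattered by the class."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            d = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return d

domain = list(range(6))
# Thresholds x -> 1{x >= t} and intervals x -> 1{a <= x <= b} on the domain.
thresholds = [lambda x, t=t: x >= t for t in range(7)]
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a in range(6) for b in range(a, 6)]

print(vc_dimension(thresholds, domain), vc_dimension(intervals, domain))
```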

5 Examples of Machine-Learning Problems

PDF and video of the lecture on Examples of Machine-Learning Problems, where machine learning techniques are applied to solve real-world problems. We begin with supervised learning problems, where the learner is provided with a training sequence and aims to infer patterns from labeled data. We examine binary classification tasks using nearest neighbor and halfspace methods, elucidating the concepts of 1-NN and k-NN hypotheses, as well as the class of affine and linear halfspaces. We explore the VC dimensions of these classes, providing theoretical insights into their expressive power and learnability. We then study ELM by Perceptron, introducing the Perceptron algorithm and discussing its convergence rate, with applications of linear and polynomial fitting with multidimensional input. Finally, we shift to unsupervised learning problems, where the goal is to uncover patterns in data without labeled training examples, exploring common techniques such as clustering algorithms like linkage-based clustering and the k-means algorithm.

Course Outline:

  1. Supervised learning problems
    1. Binary classification with nearest neighbor
    2. Binary classification with halfspaces
  2. ELM by Perceptron
    1. Linear fitting with multidimensional input
    2. Polynomial fitting
  3. Unsupervised learning problems
    1. Clustering
    2. Linkage-based clustering
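
A minimal sketch of the Perceptron algorithm for affine halfspaces is given below, under the assumption that the training sequence is linearly separable. The toy data set, the max_updates safeguard, and the convention of storing the bias as the last weight are choices made for this example and may differ from the lecture's presentation.

```python
def perceptron(samples, max_updates=10_000):
    """Find w (bias stored last) with y * (<w, x> + bias) > 0 on every sample.

    samples: list of (x, y) with x a tuple of floats and y in {-1, +1}.
    """
    dim = len(samples[0][0]) + 1
    w = [0.0] * dim
    for _ in range(max_updates):
        for x, y in samples:
            xb = (*x, 1.0)  # append a constant coordinate for the bias
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                # Mistake on (x, y): move w towards y * x and rescan.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
                break
        else:
            return w  # no mistake on the sample: zero empirical loss
    raise RuntimeError("no separating halfspace found within max_updates")

# Hypothetical separable data, labeled by the sign of x1 + x2 - 1.
samples = [((0.0, 0.0), -1), ((1.0, 1.0), +1), ((2.0, 0.0), +1),
           ((0.0, 0.5), -1), ((0.2, 0.2), -1), ((1.5, 1.0), +1)]
w = perceptron(samples)
print(w)
```

On separable data the number of updates is bounded in terms of the margin, which is why the loop is guaranteed to exit through the return statement here.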

6 Neural Networks

PDF and video of the lecture on Neural Networks, computational models inspired by the intricate structure of the human brain. We begin with the architecture and functionality of neural networks, defining multilayer neural networks and elucidating the role of each layer in the network’s computation process. Through examples and discussions on activation functions, we showcase how neural networks can capture nonlinear relationships within data. We explore the VC dimension of neural networks, providing insights into their expressive power and generalization capabilities in the context of binary classification tasks. We then introduce stochastic gradient descent (SGD) for training neural networks, starting with gradient descent optimization algorithms and focusing on the stochastic variant. Through a detailed examination of backpropagation, we illustrate how gradients are computed and propagated through the network to update model parameters, with examples and algorithms for basic neural networks including quadratic loss functions. Finally, we explore convolutional neural networks (CNNs), widely used in tasks like image recognition and natural language processing, with key components such as convolutional layers, subsampling layers, and fully-connected layers, leveraging hierarchical feature extraction for deep learning tasks.

Course Outline:

  1. Neural networks
    1. Multilayer neural networks
    2. VC dimension of neural networks
  2. Stochastic gradient descent
    1. Gradient descent optimization algorithms
    2. Backpropagation to train a neural network
  3. Convolutional neural networks
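
The following sketch shows backpropagation and SGD for a network with one hidden layer and a quadratic loss, written in plain Python so that every step of the chain rule is visible. The architecture (scalar input, four sigmoid units, linear output), the target y = x^2, the learning rate, and all function names are assumptions for illustration, not the lecture's algorithm verbatim.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """hidden_j = sigmoid(w1_j * x + b1_j); output = sum_j w2_j * hidden_j + b2."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(w * x + b) for w, b in zip(w1, b1)]
    output = sum(v * a for v, a in zip(w2, hidden)) + b2
    return hidden, output

def loss(params, x, y):
    return 0.5 * (forward(params, x)[1] - y) ** 2

def backprop(params, x, y):
    """Gradient of the quadratic loss 0.5 * (output - y)^2 at one example."""
    w1, b1, w2, b2 = params
    hidden, output = forward(params, x)
    d_out = output - y                       # dL/d(output)
    g_w2 = [d_out * a for a in hidden]
    g_b2 = d_out
    # Chain rule through the sigmoid: sigmoid'(z) = a * (1 - a).
    d_hid = [d_out * v * a * (1.0 - a) for v, a in zip(w2, hidden)]
    g_w1 = [d * x for d in d_hid]
    g_b1 = d_hid
    return g_w1, g_b1, g_w2, g_b2

def sgd(params, data, lr, steps, rng):
    """Plain SGD: one randomly chosen example per update."""
    w1, b1, w2, b2 = params
    for _ in range(steps):
        x, y = rng.choice(data)
        g_w1, g_b1, g_w2, g_b2 = backprop((w1, b1, w2, b2), x, y)
        w1 = [w - lr * g for w, g in zip(w1, g_w1)]
        b1 = [b - lr * g for b, g in zip(b1, g_b1)]
        w2 = [v - lr * g for v, g in zip(w2, g_w2)]
        b2 = b2 - lr * g_b2
    return w1, b1, w2, b2

def mean_loss(params, data):
    return sum(loss(params, x, y) for x, y in data) / len(data)

rng = random.Random(0)
units = 4
init = ([rng.uniform(-1, 1) for _ in range(units)],
        [rng.uniform(-1, 1) for _ in range(units)],
        [rng.uniform(-1, 1) for _ in range(units)],
        0.0)
data = [(k / 10, (k / 10) ** 2) for k in range(-10, 11)]  # hypothetical target y = x^2
trained = sgd(init, data, lr=0.05, steps=20000, rng=rng)
print(mean_loss(init, data), mean_loss(trained, data))
```

A quick sanity check for any hand-written backpropagation is to compare one partial derivative with a finite-difference quotient of the loss.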

7 Approximation Theory in Neural Networks

PDF and video of the lecture on Approximation Theory in Neural Networks, which studies the ability of neural networks to approximate functions. We begin with the approximation of continuous functions using neural networks, defining sigmoidal functions and their essential properties, setting the stage for Cybenko’s theorem, a pivotal result in approximation theory. This result, known as the universal approximation theorem, gives conditions under which neural networks can approximate a wide range of continuous functions and thereby accounts for their expressive power. We then turn to the rate of approximation for functions in various spaces, which quantifies the precision achievable by neural networks. We introduce preliminary notations and results, including Makovoz’s lemma, to prepare the discussion of the rate of approximation in Hilbert and Lq spaces, with respect to different norms and activation functions. Finally, we explore sufficient conditions for approximation to hold.

Course Outline:

  1. Approximation of continuous functions
  2. Rate of approximation
    1. Rate of approximation in Hilbert and Lq spaces
    2. Rate of approximation in neural networks
    3. Rate of approximation with respect to supremum norm
  3. Sufficient condition for approximation to hold
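
The idea that sums of sigmoidal units can approximate a continuous function can be illustrated with an explicit one-dimensional construction: a smoothed staircase built from steep sigmoids. This is only a sketch under stated assumptions. It is not Cybenko’s proof, and the target function sin(2πx), the steepness parameter, and the function names are choices made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def staircase_network(f, n, steepness=10.0):
    """One-hidden-layer network with n sigmoid units approximating f on [0, 1]."""
    knots = [k / n for k in range(n + 1)]
    centers = [(knots[k - 1] + knots[k]) / 2 for k in range(1, n + 1)]
    jumps = [f(knots[k]) - f(knots[k - 1]) for k in range(1, n + 1)]
    scale = steepness * n  # steeper units as the grid gets finer

    def network(x):
        return f(knots[0]) + sum(a * sigmoid(scale * (x - c))
                                 for a, c in zip(jumps, centers))
    return network

def sup_error(f, g, grid_size=2000):
    """Maximum of |f - g| over a uniform grid of [0, 1]."""
    return max(abs(f(x) - g(x))
               for x in (i / grid_size for i in range(grid_size + 1)))

def target(x):
    return math.sin(2 * math.pi * x)

err_20 = sup_error(target, staircase_network(target, 20))
err_200 = sup_error(target, staircase_network(target, 200))
print(err_20, err_200)
```

The error on the grid shrinks as the number of hidden units grows, which is the qualitative behavior that rate-of-approximation results make precise.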

8 Python Software for Machine Learning and Deep Learning: Tutorial

PDF and video of the lecture on Python Software for Machine Learning and Deep Learning, a step-by-step tutorial for installing the necessary software and writing your first Python code for data science. We begin with Python software, walking through the process of installing essential Python components from scratch. From setting up Python interpreters to installing, importing, and utilizing packages crucial for data science tasks, we ensure a solid foundation to begin the journey into machine learning and deep learning, with a practical example of linear fitting with the scikit-learn package. We then transition to the PyTorch package, exploring its key functionalities and capabilities for building neural networks, starting with tensors as the backbone of data manipulation, then devices and processors, working with image datasets, and differentiation techniques within PyTorch. Finally, we focus on PyTorch for neural networks, beginning with simple examples such as linear fitting and logistic regression, progressing to more complex tasks like multiclass logistic regression and multilayer neural networks, guided by hands-on demonstrations of model optimization.

Course Outline:

  1. Python software
    1. Install software from scratch
    2. Install, import and use packages
    3. Example: Linear fitting with scikit-learn package
  2. PyTorch package
    1. Tensors
    2. Devices (processors)
    3. Image datasets
    4. Differentiation with PyTorch
  3. PyTorch for neural networks
    1. Example: Linear fitting with PyTorch
    2. Logistic regression with PyTorch
    3. Multiclass logistic regression with PyTorch
    4. Optimization in PyTorch
    5. Multilayer neural network
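
For readers who want a first taste before opening the tutorial, here is a minimal sketch of linear fitting with PyTorch. It assumes PyTorch is installed, uses noiseless synthetic data on the line y = 2x + 1 (a hypothetical example, not the tutorial's data set), and may differ from the tutorial's own code in structure and hyperparameters.

```python
import torch

torch.manual_seed(0)

# Hypothetical noiseless data on the line y = 2x + 1.
x = torch.linspace(0.0, 1.0, 50).unsqueeze(1)   # shape (50, 1)
y = 2.0 * x + 1.0

model = torch.nn.Linear(in_features=1, out_features=1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(2000):
    optimizer.zero_grad()          # reset gradients accumulated by backward()
    loss = loss_fn(model(x), y)    # mean squared error on the whole data set
    loss.backward()                # automatic differentiation
    optimizer.step()               # gradient step on weight and bias

slope = model.weight.item()
intercept = model.bias.item()
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The same loop structure (zero_grad, forward pass, backward, step) carries over to logistic regression and multilayer networks by changing the model and the loss.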

Python Code Examples

A collection of Jupyter notebook examples accompanying the Python tutorial, available on our GitHub repository. The notebooks cover Python fundamentals, the use of essential data science packages (NumPy, pandas, scikit-learn), and a complete walkthrough of PyTorch from tensors to multilayer neural networks. Each notebook is self-contained and includes detailed explanations alongside runnable code snippets. The examples range from basic Python operations and package usage to linear fitting with scikit-learn, tensor manipulation with PyTorch, automatic differentiation, device management, image datasets handling, and multilayer neural network training.

Book: Preliminary Version Available

The lectures on this page are based on the first preliminary version of our book, available as a preprint on the HAL open archive: M. K. Karray, B. Błaszczyszyn, L. Darlavoix, Data Science: From Statistics to Machine Learning and Deep Learning, with Applications to Wireless Networks. Preliminary Version (Preprint). HAL open archive, October 2024. The book covers a broader scope including Multivariate Statistics, Machine Learning (the content of this page), and Applications to Wireless Networks. See also Statistics Theory and Wireless Networks for the corresponding pages.

Innovative Contributions

This book is distinguished by its deep commitment to building a principled understanding of data science from the ground up. Our approach is founded on three key pillars:

  • A Foundation of Mathematical Rigor: Unlike approaches that treat algorithms as "black boxes," this book is built on a foundation of formal mathematical exposition. Key concepts are introduced through precise definitions and established via theorems with complete, detailed proofs. Our goal is to provide readers with a coherent understanding of the mathematical principles that govern machine learning.
  • Principled Treatment of Measurability: Our commitment to rigor is particularly evident in our treatment of measurability issues, a topic often glossed over in the learning theory literature. As noted by Francis Bach in Learning Theory from First Principles, it is common to "avoid overformalizations" in this area. In contrast, we address these issues formally, as they are fundamental to ensuring that probabilistic models in data science are mathematically sound, especially for applications in engineering and communication networks.
  • From Theory to Real-World Application: We bridge the gap between abstract theory and concrete practice with an in-depth case study on predicting Quality of Service (QoS) in large-scale wireless networks. Using operational data from a major European operator, we demonstrate how the theoretical frameworks, from linear models to neural networks, are implemented and validated, providing a complete and actionable guide for researchers and practitioners.

About These Topics

These graduate-level lectures on machine learning and deep learning are designed for students and researchers seeking a rigorous mathematical foundation for artificial intelligence, beyond the heuristic approaches commonly found in introductory courses. The material adopts a measure-theoretic approach, building on the foundations of probability theory and multivariate statistics, and provides the theoretical machinery needed to understand modern deep learning systems. The course covers the foundational frameworks of statistical learning theory, including empirical loss minimization (ELM), PAC-learnability, and the uniform convergence property. It then develops the theoretical machinery of Vapnik-Chervonenkis (VC) theory, including VC dimension, covering and packing numbers, growth functions, and Sauer’s lemma, complemented by empirical processes theory with tail bounds, bracketing numbers, and envelope functions. These tools enable a rigorous characterization of learnability, culminating in the fundamental theorem of learning for binary classification. The lectures then transition to practical machine learning problems, covering supervised learning with nearest neighbors, halfspaces, and the Perceptron algorithm, and unsupervised learning with clustering algorithms such as linkage-based clustering and k-means. We then explore neural networks, including multilayer architectures, activation functions, stochastic gradient descent (SGD), backpropagation, and convolutional neural networks (CNNs) for image recognition and deep learning tasks. We address the approximation theory underlying neural networks, including Cybenko’s theorem, the universal approximation theorem, and the rate of approximation in Hilbert and Lq spaces. The lectures conclude with a hands-on Python tutorial using PyTorch and scikit-learn, with practical examples available as Jupyter notebooks on our GitHub repository. 
The material draws on the book Data Science: From Statistics to Machine Learning and Deep Learning, with Applications to Wireless Networks by Karray, Błaszczyszyn, and Darlavoix, available as a preprint on the HAL open archive.

Last Updated on 24 May 2026 by Mohamed Kadhem KARRAY