Lectures on Statistics: Univariate & Multivariate Data

Infographic: key concepts of the course, including univariate and multivariate analysis, linear and logistic regression, and principal component and factor analysis.

We offer a comprehensive statistics course that delves into both univariate and multivariate analysis techniques. Suitable for beginners and advanced practitioners alike, our resources span a diverse array of topics to improve your statistical expertise.

  • Basic Statistics: We will examine the foundational concepts of statistical estimation, confidence intervals, hypothesis testing, and likelihood, along with their practical applications. These essential topics form the cornerstone for advanced statistical methods and play a pivotal role across diverse domains, including scientific research, business analytics, and engineering.
  • Linear Fitting and Regression: We dive into the core of statistical modeling and analysis with this lecture focused on understanding variable relationships and predictive techniques. We’ll unravel the details of linear models, examining both deterministic fitting and probabilistic regression approaches. The session encompasses analysis in both univariate and multivariate contexts, and introduces Gaussian models as a framework for regression analysis.
  • Logistic Regression: In the field of statistical modeling, logistic regression is a crucial technique for analyzing discrete outcome variables. Unlike linear regression, which is designed for continuous outcomes, logistic regression is well-suited for scenarios where the output variable is categorical. This lecture covers the basics of logistic regression, explaining its applications in both binary and multiclass contexts, as well as the methods used for prediction.
  • Principal Component and Factor Analysis: Principal Component Analysis (PCA) and Factor Analysis are essential techniques in multivariate statistics, used to reduce redundancy among observed variables while retaining important information. Although they share a common goal, each technique has unique characteristics and methodologies. This lecture offers a comprehensive examination of PCA and Factor Analysis, explaining their principles, applications, and providing comparative insights.

Enhance your statistical prowess with our comprehensive statistics course, meticulously designed to cater to learners of all levels. Whether you’re seeking a fundamental understanding of univariate and multivariate analysis or aiming to master advanced techniques like linear regression, logistic regression, principal component analysis, and factor analysis, our course has you covered. With downloadable PDF materials and engaging video lectures, learning statistics has never been more accessible. Dive into the world of data analysis and statistical modeling with confidence, and embark on your journey towards becoming a proficient data scientist.

The foundation of statistics theory rests on measure and probability theories, offering invaluable insights essential for understanding machine learning principles.

1. Basic Statistics


In this lecture on basic statistics, we will explore fundamental concepts essential for understanding statistical estimation, confidence intervals, hypothesis testing, likelihood, and their applications. These topics serve as building blocks for more advanced statistical techniques and are crucial in various fields ranging from scientific research to data analysis in business and engineering.

  1. Statistical Estimation: Statistical estimation forms the cornerstone of inferential statistics, aiming to infer characteristics of a population from sample data. We begin by defining the statistical framework, which includes concepts such as sample space, parameter space, estimands, and estimators. Through examples and definitions, we delve into understanding the mean squared error of an estimator and explore estimators for expectation and variance, particularly focusing on Gaussian random variables.
  2. Confidence Interval: Confidence intervals provide a range of plausible values for population parameters based on sample data, aiding in quantifying uncertainty. We define confidence intervals and discuss propositions regarding their construction for expectation and variance of Gaussian random variables. Additionally, we explore Gaussian approximation techniques and extend confidence interval concepts to parameters such as Bernoulli’s parameter.
  3. Hypothesis Testing: Hypothesis testing allows us to make decisions based on sample data, particularly regarding population parameters. We introduce the fundamental concepts of hypothesis tests, including null and alternative hypotheses, error rates, and the concept of power. Through propositions and definitions, we explore hypothesis tests for Gaussian random variables, covering scenarios with known and unknown standard deviations and addressing tests for variance and Bernoulli’s parameter.
  4. Statistic and Test Statistic: Understanding the role of statistics and test statistics is essential for effective hypothesis testing. We define these terms, emphasizing the importance of test statistics in decision-making processes within hypothesis tests. This section provides clarity on how these statistical measures contribute to the overall inferential process.
  5. Likelihood: Likelihood functions serve as a cornerstone for evaluating the plausibility of different parameter values given observed data, offering a systematic approach to model fitting and inference. We define the likelihood function and explore its properties, particularly focusing on the Gaussian distribution. Through lemmas, we illustrate how likelihood functions can be utilized to infer parameters, showcasing their utility in statistical inference across diverse settings.
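
As a taste of these estimation ideas, here is a minimal standard-library Python sketch (not taken from the lecture notes; the simulated sample and the 95% Gaussian quantile 1.96 are illustrative assumptions). It estimates the expectation and variance of a Gaussian sample and builds a confidence interval for the expectation:

```python
import math
import random

# Illustrative sketch: estimators of expectation and variance for a
# Gaussian sample, plus a 95% confidence interval for the expectation
# (Gaussian approximation with quantile 1.96). Data are simulated.

random.seed(0)
n = 200
sample = [random.gauss(10.0, 2.0) for _ in range(n)]  # true mean 10, sd 2

mean_hat = sum(sample) / n  # estimator of the expectation
var_hat = sum((x - mean_hat) ** 2 for x in sample) / (n - 1)  # unbiased variance

# 95% confidence interval for the expectation
half_width = 1.96 * math.sqrt(var_hat / n)
ci = (mean_hat - half_width, mean_hat + half_width)
print(f"mean ≈ {mean_hat:.3f}, 95% CI ≈ ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a larger sample the interval shrinks at rate 1/√n, which is the quantified uncertainty the lecture formalizes.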

Explore the fundamental principles of basic statistics, including confidence intervals and hypothesis testing, with our comprehensive lecture series. Learn how to calculate the p-value and the test statistic, crucial components of hypothesis testing, and understand the nuances of hypothesis formulation, including null and alternative hypotheses. Gain insights into statistical estimation techniques and the significance of likelihood functions in statistical analysis. Whether you’re a student delving into the intricacies of statistics or a professional seeking to enhance your analytical skills, our lecture provides valuable insights into hypothesis testing and the interpretation of p-values.
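
The p-value and test-statistic calculation mentioned above can be sketched for a two-sided z-test with known standard deviation; the function name and the numbers below are hypothetical, chosen only for illustration:

```python
import math

# Illustrative two-sided z-test for H0: mu = mu0, with known sigma.

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Return the test statistic and two-sided p-value of the z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
    # standard normal CDF via the error function
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - phi(abs(z)))  # two-sided p-value
    return z, p_value

z, p = z_test_p_value(sample_mean=10.4, mu0=10.0, sigma=2.0, n=100)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.00, p-value = 0.0455
```

Here the p-value falls just below 0.05, so the null hypothesis would be rejected at the 5% level.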

Course Outline:
1 Statistical estimation
    1.1 Statistical framework
    1.2 Mean squared error of an estimator
    1.3 Estimators of expectation and variance
2 Confidence interval
3 Hypothesis testing
    3.1 Hypothesis tests
    3.2 Hypothesis tests for Gaussian random variables
    3.3 Statistic and test statistic
    3.4 The p-value
4 Likelihood

2. Linear Fitting and Regression


Linear fitting and regression are fundamental concepts in statistical modeling and analysis, crucial for understanding relationships between variables and making predictions. In this lecture, we delve into the intricacies of linear models, exploring both deterministic linear fitting and probabilistic linear regression, covering unidimensional and multidimensional scenarios as well as Gaussian models for regression.

  1. Unidimensional Linear Fitting and Regression: In this section, we begin with unidimensional input scenarios, distinguishing between linear fitting and regression. Linear fitting involves deterministic points, aiming to find the best-fitting line through the least-squares fitting method. We delve into definitions of least-squares fitting and its parameters, establishing the connection between least-squares fitting and projection. Moreover, we introduce the determination coefficient for assessing the goodness of fit. Transitioning to linear regression, where the output is random, we discuss regression errors, parameter estimation, and the role of least-squares fitting in estimating the regression parameters.
  2. Multidimensional Linear Fitting and Regression: Extending our analysis to multidimensional input spaces, we explore linear fitting and regression models. In multidimensional linear fitting, we aim to find the best-fitting hyperplane, again utilizing least-squares fitting for parameter estimation and understanding its projection interpretation. We further discuss the decomposition of output variance to gain insights into the model’s performance. Moving to multidimensional linear regression, we introduce regression errors, parameter estimation, residual analysis, and prediction error assessment in a multidimensional context.
  3. Gaussian Model and Maximum Likelihood Estimators: Concluding our lecture, we introduce the Gaussian linear regression model and maximum likelihood estimators. By assuming a Gaussian distribution for the regression errors, we derive maximum likelihood estimators for the parameters, providing a probabilistic framework for linear regression analysis. This section highlights the statistical underpinnings of linear regression and how it aligns with the Gaussian assumption, offering valuable insights for modeling and inference in real-world scenarios.
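
The unidimensional least-squares estimators and the determination coefficient described above can be sketched in a few lines of standard-library Python; the data points are made up for illustration:

```python
# Illustrative unidimensional least-squares fitting: slope, intercept,
# and the determination coefficient R². Toy data, roughly y = 2x.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares estimators: slope = empirical cov(x, y) / var(x)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Determination coefficient: share of output variance explained by the fit
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1.0 - ss_res / ss_tot
print(f"y ≈ {intercept:.2f} + {slope:.2f} x, R² = {r_squared:.4f}")
```

An R² close to 1 indicates that the fitted line captures almost all of the output variance, the geometric counterpart of the projection interpretation discussed in the lecture.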

Our comprehensive lecture on linear fitting and regression covers a wide array of topics crucial for understanding and applying statistical modeling techniques. From discussing the intricacies of least-squares fitting to exploring the nuances of maximum likelihood estimators, our lecture provides valuable insights into regression analysis. Whether you’re seeking to understand regression errors, calculate regression lines, or delve into multiple linear regression models, our content offers clear explanations and practical examples. Furthermore, we elucidate the concepts of determination coefficients and residual sum of squares, empowering learners to effectively evaluate model performance and make informed decisions.

Course Outline:
1 Unidimensional input
    1.1 Linear fitting
    1.2 Linear regression
    1.3 Linear prediction
2 Multidimensional input
    2.1 Multidimensional linear fitting
    2.2 Multidimensional linear regression
    2.3 Multidimensional linear prediction
3 Gaussian model
    3.1 Maximum likelihood estimators

3. Logistic Regression


In the realm of statistical modeling, logistic regression emerges as a pivotal technique when dealing with discrete outcome variables. Unlike linear regression, which is tailored for continuous outcomes, logistic regression adapts gracefully to scenarios where the output variable is categorical. This lecture navigates through the fundamentals of logistic regression, elucidating its applications in binary and multiclass contexts, alongside the methodologies for prediction.

  1. Binary Logistic Regression: Our journey commences with binary logistic regression, a cornerstone in statistical modeling. Here, we define the binary logistic regression model, encapsulating the essence of the logit function and the sigmoid function in mapping probabilities. Delving deeper into this model, we explicate the likelihood function specifically tailored for binary logistic regression, laying the groundwork for understanding maximum likelihood estimators. Additionally, we explore the nuances of the log-likelihood function and discuss efficient numerical methods for calculating maximum likelihood estimators, ensuring robust model estimation.
  2. Binary Logistic Prediction: With a firm grasp of binary logistic regression, we transition seamlessly into the realm of prediction. In this section, we explore the intricacies of binary logistic prediction, where we leverage the learned model to predict outcomes for new observations.
  3. Multiclass Logistic Regression: Expanding our horizons, we delve into multiclass logistic regression, a sophisticated extension of binary logistic regression tailored for scenarios with more than two categorical outcomes. We define the multiclass logistic regression model, shedding light on its intricacies and the underlying likelihood function. Furthermore, we delve into the derivation of maximum likelihood estimators for multiclass logistic regression, accompanied by a discussion on the log-likelihood function. Finally, we explore multiclass logistic prediction, elucidating how to extend the principles of binary logistic prediction to scenarios with multiple categorical outcomes, ensuring a comprehensive understanding of logistic regression in diverse modeling contexts.
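
As an illustration of the binary case (not the lecture’s implementation), the sketch below fits a one-dimensional binary logistic regression by gradient ascent on the log-likelihood and predicts by thresholding the estimated probability at 1/2; the toy data, learning rate, and iteration count are assumptions:

```python
import math

# Illustrative binary logistic regression in one dimension, fitted by
# gradient ascent on the log-likelihood.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Toy data: a threshold around x = 0 separates the two classes
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = 0.0, 0.0  # parameters of P(Y = 1 | x) = sigmoid(a + b * x)
lr = 0.1
for _ in range(2000):  # gradient ascent on the log-likelihood
    grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
    grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
    a += lr * grad_a
    b += lr * grad_b

# Prediction: classify as 1 when the estimated probability exceeds 1/2
def predict(x):
    return 1 if sigmoid(a + b * x) > 0.5 else 0

print([predict(x) for x in xs])  # prints [0, 0, 0, 0, 1, 1, 1, 1]
```

The multiclass extension replaces the sigmoid with the softmax function, one score per class, but the maximum likelihood principle is the same.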

Our comprehensive lecture on logistic regression provides a deep dive into this essential statistical technique, offering insights into binary and multiclass classification scenarios. From understanding the foundational concepts like the sigmoid and logit functions to mastering the intricacies of maximum likelihood estimators, our content caters to learners at all levels. Whether you’re delving into binary logistic regression equations or exploring the log likelihood function, our lecture equips you with the knowledge and tools needed for effective model building and interpretation.

With practical examples and theoretical discussions, we demystify complex concepts like the sigmoid activation function and the probit function, making logistic regression accessible and applicable in various fields. 
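
For instance, the two link functions mentioned above can be compared side by side; this small sketch (with illustrative inputs) evaluates the logistic sigmoid against the standard normal CDF used by the probit model:

```python
import math

# The logistic sigmoid and the standard normal CDF (the probit model's
# link inverse) both map the real line to (0, 1).

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def normal_cdf(t):  # Phi(t), used by the probit model
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

for t in (-2.0, 0.0, 2.0):
    print(f"t = {t:+.1f}: sigmoid = {sigmoid(t):.3f}, normal CDF = {normal_cdf(t):.3f}")
```

Both curves agree at 1/2 for t = 0; the normal CDF approaches 0 and 1 faster in the tails, which is the practical difference between logit and probit links.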

Course Outline:
1 Binary logistic regression
    1.1 Binary logistic regression model
2 Binary logistic prediction
3 Multiclass logistic regression
    3.1 Multiclass logistic regression model
    3.2 Multiclass logistic prediction

4. Principal Component and Factor Analysis


Principal Component Analysis (PCA) and Factor Analysis are indispensable techniques in the realm of multivariate statistics, serving the common goal of reducing redundancy among observed variables while preserving essential information. Despite their shared objective, each technique possesses distinct characteristics and methodologies. This lecture provides an in-depth exploration of PCA and Factor Analysis, elucidating their principles, applications, and comparative insights.

  1. Principal Component Analysis (PCA): Principal Component Analysis constitutes a fundamental technique for dimensionality reduction and data exploration in multivariate analysis. In this section, we delve into the intricacies of PCA, starting with its application to random observed variables. We elucidate the spectral decomposition of the covariance matrix and introduce the concept of principal component transformation, accompanied by key definitions, lemmas, and propositions. Furthermore, we discuss PCA’s adaptability to deterministic observed variables, outlining the spectral decomposition of empirical covariance matrices and exploring essential aspects such as variance reduction and orthogonality.
  2. Factor Analysis: Factor Analysis provides a nuanced approach to uncovering latent variables underlying observed variables, thus facilitating a deeper understanding of data structures. In this section, we embark on a comprehensive journey through Factor Analysis, beginning with an overview of its model components, including observations, common factors, and factor loadings. We delve into the intricacies of interpreting factor loadings and discuss Factor Analysis in contrast to regression analysis. Moreover, we explore critical topics such as the existence of Factor Analysis models, scaling considerations, and the non-uniqueness of factor loadings. Additionally, we shed light on factor interpretation techniques, including factor rotation methods like the Varimax criterion, and discuss various estimation methods for factor loadings, encompassing the principal component method, principal factor method, and maximum likelihood method.
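
To make the PCA steps concrete, here is a two-dimensional, standard-library-only sketch (the data set is a toy example, not from the lecture): it forms the empirical covariance matrix, takes its spectral decomposition in closed form, and projects the centered data onto the first principal axis:

```python
import math

# Illustrative 2-dimensional PCA: spectral decomposition of the empirical
# covariance matrix and projection onto the first principal component.

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# Empirical covariance matrix [[sxx, sxy], [sxy, syy]]
sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix, in closed form
tr, det = sxx + syy, sxx * syy - sxy ** 2
disc = math.sqrt(tr ** 2 / 4.0 - det)
lam1, lam2 = tr / 2.0 + disc, tr / 2.0 - disc  # lam1 >= lam2

# First principal axis: normalized eigenvector for lam1
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

explained = lam1 / (lam1 + lam2)  # share of variance kept by component 1
scores = [(x - mx) * vx + (y - my) * vy for x, y in data]
print(f"first component explains {explained:.1%} of the variance")
```

When the first eigenvalue dominates, as here, keeping a single principal component preserves most of the variance, which is precisely the redundancy reduction both PCA and Factor Analysis aim for.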

Our lecture on principal component and factor analysis provides a comprehensive exploration of essential techniques in multivariate statistics. Whether you’re delving into the intricacies of principal component analysis (PCA) or unraveling the complexities of factor analysis, our content caters to learners seeking a deeper understanding of dimensionality reduction and latent variable modeling. From exploring PCA components to understanding factor loadings and communality, our lecture equips you with the knowledge and tools needed to navigate multivariate data analysis with confidence. 

Course Outline:
1 Principal component analysis (PCA)
    1.1 PCA for random observed variables
    1.2 PCA for deterministic observed variables
2 Factor analysis
    2.1 Existence of factor analysis model
    2.2 Scaling in factor analysis
    2.3 Rotation of factors
    2.4 Factors interpretation
    2.5 Estimating loadings

Book on Statistics: Coming Soon on this Webpage

Keep an eye on this page for the upcoming launch of our book:

  • B. Błaszczyszyn, L. Darlavoix, M.K. Karray: « Data science : From multivariate statistics to machine, deep learning ».