SDS 323: Statistical Learning and Inference

This course is an introduction to statistical inference, broadly construed as the process of drawing conclusions from data and quantifying the uncertainty in those conclusions. The goal is to introduce the basic ideas of statistical learning and predictive modeling from statistical, theoretical, and computational perspectives, together with applications to real data. Topics cover the major schools of thought that influence modern scientific practice, including classical frequentist methods, machine learning, and Bayesian inference. The course aims to provide an applied overview of classical linear approaches such as Linear Regression, Logistic Regression, and Linear Discriminant Analysis, as well as non-linear methods such as K-Means Clustering, K-Nearest Neighbors, Generalized Additive Models, Decision Trees, Boosting, Bagging, and Support Vector Machines.
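
For a first taste of the applied R sessions, here is a minimal sketch of two of the methods named above, assuming only base R and the built-in mtcars dataset (the variables and model choices are illustrative, not taken from the course materials):

```r
# Minimal sketch, assuming base R and the built-in mtcars dataset;
# the models below are illustrative examples, not course assignments.

# Linear regression: model fuel efficiency (mpg) as a function of car weight.
fit_lm <- lm(mpg ~ wt, data = mtcars)
summary(fit_lm)                            # coefficients, standard errors, R-squared

# Logistic regression: classify transmission type (am: 0 = automatic, 1 = manual).
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)
head(predict(fit_glm, type = "response"))  # predicted probabilities of a manual transmission
```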

Tentative schedule and weekly learning goals

The following schedule is tentative and will be updated throughout the course.

| Topic | Assignment | Due | Readings (ISLA) |
| --- | --- | --- | --- |
| Introduction | HW0 | | 1 |
| Statistical Learning overview | | | 2 |
| R Session: Introduction to R | | | |
| Introduction to the Linear Model | | | 3.1 |
| Multiple Linear Regression and potential problems | HW1 | HW0 | 3.2, 3.3 |
| R Session: Linear Regression | | | |
| Classification | | | 4.1, 4.2, 4.3 |
| Classification | | | 4.4, 4.5 |
| R Session: Classification | HW2 | HW1 | |
| Resampling methods | | | 5.1, 5.2 |
| R Session: Resampling methods | | | 5.2 |
| Linear model selection | | | 6.1 |
| Linear model regularization | | HW2 | 6.2, 6.3 |
| Midterm Exam 1 | | | |
| R Session: Model selection | HW3 | | |
| Moving beyond linearity | | | 7.1, 7.2, 7.3, 7.4 |
| Moving beyond linearity | | | 7.5, 7.6, 7.7 |
| R Session: Moving beyond linearity | | | |
| Tree-based methods | HW4 | HW3 | 8.1 |
| Tree-based methods | | | 8.2 |
| R Session: Tree-based methods | | | |
| Support Vector Machines | | | 9.1, 9.2, 9.3 |
| R Session: Support Vector Machines | HW5 | HW4 | |
| Midterm Exam 2 | | | |
| Unsupervised Learning | | | 10 |
| Unsupervised Learning | | | 10 |
| Thanksgiving | | HW5 | |
| R Session: Unsupervised Learning | | | |
| Special topic: Intro to Neural Networks | | | |