Data Science 6400

School of Data Science
University of Virginia

Machine Learning I Fall 2023


This course is intended for individuals with some exposure to multivariable models such as regression, random forests, or neural networks. In the context of these multivariable models, the course covers:

  1. data types
  2. inference vs prediction (vs false dichotomy)
  3. variation, heterogeneity, and confounding
  4. non-linear relationships between predictors and the outcome
  5. interactions between predictors
  6. model complexity
  7. carrying capacity of data
  8. model stability
  9. model diagnostics
  10. strategies for right-sizing the model complexity
    • regularization (LASSO, ridge, Bayesian)
    • constraints (principal components, monotonicity)
  11. discrimination
  12. calibration
  13. strategies for data challenges
    • missing data
    • overly influential observations
  14. common challenges in observational data analysis
    • selection bias
    • survivor selection bias
    • confounding by indication

These topics will be discussed first in the context of linear regression, and then revisited in the context of logistic regression, ordinal regression, proportional hazards regression, random forests, and (time permitting) neural networks. The course is hands-on; students will be required to fit the models (via both maximum likelihood and Bayesian approaches) and implement the strategies discussed in the course.


Thomas G. Stewart, PhD
Associate Professor
Elson Building, 400 Brandon Ave, Room 156

Instruction & Office hours

Format of the class: In-class time will be a combination of lectures, group assignments, live coding, and student presentations.

Please note: Circumstances may require the face-to-face portion of the class to move online.

Time: Monday, Wednesday, and Friday @ 11am - Ridley Room 173

Office Hours: Monday, Wednesday, and Friday @ 10am - Dell common area


Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis
by Frank E. Harrell, Jr.
ISBN-13: 978-3-319-19424-0
Available as a PDF via UVA institutional license
Author’s website for textbook: (link)

Plane Answers to Complex Questions: The Theory of Linear Models
by Ronald Christensen
Available as a PDF via UVA institutional license

Optional reference texts

Data Analysis Using Regression and Multilevel/Hierarchical Models
by Andrew Gelman and Jennifer Hill
ISBN-10: 052168689X
ISBN-13: 978-0521686891


The course will be taught using R.


Students will be invited to a Teams channel. Questions related to course logistics, content, assignments, or the final project/exam should be posted in the Teams channel. Individual questions should be sent to the instructor and/or TA by direct chat in Teams.