**School of Data Science, University of Virginia**

Course Overview

View the Project on GitHub thomasgstewart/machine-learning-1-fall-2023

This course is intended for individuals with some exposure to multivariable models such as regression, random forests, or neural networks. In the context of these multivariable models, the course covers:

- data types
- inference vs prediction (vs false dichotomy)
- variation, heterogeneity, and confounding
- non-linear relationships between predictors and the outcome
- interactions between predictors
- model complexity
- carrying capacity of data
- model stability
- model diagnostics
- strategies for right-sizing the model complexity
  - regularization (LASSO, ridge, Bayesian)
  - constraints (principal components, monotonicity)

- discrimination
- calibration
- strategies for data challenges
  - missing data
  - overly influential observations

- common challenges in observational data analysis
  - selection bias
  - survivor selection bias
  - confounding by indication

These topics will be discussed first in the context of linear regression, and then revisited in the context of logistic regression, ordinal regression, proportional hazards regression, random forests, and (time permitting) neural networks. The course is hands-on; students will be required to fit the models (via both maximum likelihood and Bayesian approaches) and implement the strategies discussed in the course.

Thomas G. Stewart, PhD

Associate Professor

Elson Building, 400 Brandon Ave, Room 156

thomas.stewart@virginia.edu

thomasgstewart

**Format of the class:** In-class time will be a combination of lectures, group assignments, live coding, and student presentations.

**Please note:** Circumstances may require the face-to-face portion of the class to be online.

**Time:** Monday, Wednesday, and Friday @ 11am - Ridley Room 173

**Office Hours:** Monday, Wednesday, and Friday @ 10am - Dell common area

Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis

by Frank E. Harrell, Jr.

ISBN-13: 978-3-319-19424-0

Available as a PDF via UVA institutional license (link)

Author's website for textbook: (link)

Plane Answers to Complex Questions: The Theory of Linear Models

by Ronald Christensen

Available as a PDF via UVA institutional license (link)

Data Analysis Using Regression and Multilevel/Hierarchical Models

by Andrew Gelman and Jennifer Hill

ISBN-10: 052168689X

ISBN-13: 978-0521686891

The course will be taught using R (link).

Students will be invited to a Teams channel. Questions related to course logistics, content, assignments, or the final project/exam should be posted in the Teams channel. Individual questions should be sent to the instructor and/or TA by direct chat in Teams.