Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

Shai Shalev-Shwartz

I am an associate professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, Israel. I am also at Mobileye, working on autonomous driving. I received my PhD from the Hebrew University in 2007, and was a research assistant professor at the Toyota Technological Institute at Chicago until June 2009.

My work focuses on machine learning algorithms.

Shai Ben-David

Shai Ben-David grew up in Jerusalem, Israel. He attended the Hebrew University, where he studied physics, mathematics, and psychology. He received his PhD under the supervision of Saharon Shelah and Menachem Magidor for a thesis in set theory. Professor Ben-David was a postdoctoral fellow at the University of Toronto in the Mathematics and Computer Science departments, and in 1987 he joined the faculty of the CS Department at the Technion (Israel Institute of Technology). He held visiting faculty positions at the Australian National University in Canberra (1997-1998) and at Cornell University (2001-2004). In August 2004 he joined the School of Computer Science at the University of Waterloo.

  • Introduction

Part I: Foundations

  • A gentle start
  • A formal learning model
  • Learning via uniform convergence
  • The bias-complexity trade-off
  • The VC-dimension
  • Non-uniform learnability
  • The runtime of learning

Part II: From Theory to Algorithms

  • Linear predictors
  • Boosting
  • Model selection and validation
  • Convex learning problems
  • Regularization and stability
  • Stochastic gradient descent
  • Support vector machines
  • Kernel methods
  • Multiclass, ranking, and complex prediction problems
  • Decision trees
  • Nearest neighbor
  • Neural networks

Part III: Additional Learning Models

  • Online learning
  • Clustering
  • Dimensionality reduction
  • Generative models
  • Feature selection and generation

Part IV: Advanced Theory

  • Rademacher complexities
  • Covering numbers
  • Proof of the fundamental theorem of learning theory
  • Multiclass learnability
  • Compression bounds
  • PAC-Bayes


Appendices

  • Technical lemmas
  • Measure concentration
  • Linear algebra