Print version: Osipov, Carl. MLOps Engineering at Scale. New York: Manning Publications Co. LLC, c2022.
Preface
acknowledgments
about this book
  Who should read this book
  How this book is organized: A road map
  About the code
  liveBook discussion forum
about the author
about the cover illustration
Part 1 Mastering the data set
  1 Introduction to serverless machine learning
    1.1 What is a machine learning platform?
    1.2 Challenges when designing a machine learning platform
    1.3 Public clouds for machine learning platforms
    1.4 What is serverless machine learning?
    1.5 Why serverless machine learning?
      1.5.1 Serverless vs. IaaS and PaaS
      1.5.2 Serverless machine learning life cycle
    1.6 Who is this book for?
      1.6.1 What you can get out of this book
    1.7 How does this book teach?
    1.8 When is this book not for you?
    1.9 Conclusions
    Summary
  2 Getting started with the data set
    2.1 Introducing the Washington, DC taxi rides data set
      2.1.1 What is the business use case?
      2.1.2 What are the business rules?
      2.1.3 What is the schema for the business service?
      2.1.4 What are the options for implementing the business service?
      2.1.5 What data assets are available for the business service?
      2.1.6 Downloading and unzipping the data set
    2.2 Starting with object storage for the data set
      2.2.1 Understanding object storage vs. filesystems
      2.2.2 Authenticating with Amazon Web Services
      2.2.3 Creating a serverless object storage bucket
    2.3 Discovering the schema for the data set
      2.3.1 Introducing AWS Glue
      2.3.2 Authorizing the crawler to access your objects
      2.3.3 Using a crawler to discover the data schema
    2.4 Migrating to columnar storage for more efficient analytics
      2.4.1 Introducing column-oriented data formats for analytics
      2.4.2 Migrating to a column-oriented data format
    Summary
  3 Exploring and preparing the data set
    3.1 Getting started with interactive querying
      3.1.1 Choosing the right use case for interactive querying
      3.1.2 Introducing AWS Athena
      3.1.3 Preparing a sample data set
      3.1.4 Interactive querying using Athena from a browser
      3.1.5 Interactive querying using a sample data set
      3.1.6 Querying the DC taxi data set
    3.2 Getting started with data quality
      3.2.1 From "garbage in, garbage out" to data quality
      3.2.2 Before starting with data quality
      3.2.3 Normative principles for data quality
    3.3 Applying VACUUM to the DC taxi data
      3.3.1 Enforcing the schema to ensure valid values
      3.3.2 Cleaning up invalid fare amounts
      3.3.3 Improving the accuracy
    3.4 Implementing VACUUM in a PySpark job
    Summary
  4 More exploratory data analysis and data preparation
    4.1 Getting started with data sampling
      4.1.1 Exploring the summary statistics of the cleaned-up data set
      4.1.2 Choosing the right sample size for the test data set
      4.1.3 Exploring the statistics of alternative sample sizes
      4.1.4 Using a PySpark job to sample the test set
    Summary
Part 2 PyTorch for serverless machine learning
  5 Introducing PyTorch: Tensor basics
    5.1 Getting started with tensors
    5.2 Getting started with PyTorch tensor creation operations
    5.3 Creating PyTorch tensors of pseudorandom and interval values
    5.4 PyTorch tensor operations and broadcasting
    5.5 PyTorch tensors vs. native Python lists
    Summary
  6 Core PyTorch: Autograd, optimizers, and utilities
    6.1 Understanding the basics of autodiff
    6.2 Linear regression using PyTorch automatic differentiation
    6.3 Transitioning to PyTorch optimizers for gradient descent
    6.4 Getting started with data set batches for gradient descent
    6.5 Data set batches with PyTorch Dataset and DataLoader
    6.6 Dataset and DataLoader classes for gradient descent with batches
    Summary
  7 Serverless machine learning at scale
    7.1 What if a single node is enough for my machine learning model?
    7.2 Using IterableDataset and ObjectStorageDataset
    7.3 Gradient descent with out-of-memory data sets
    7.4 Faster PyTorch tensor operations with GPUs
    7.5 Scaling up to use GPU cores
    Summary
  8 Scaling out with distributed training
    8.1 What if the training data set does not fit in memory?
      8.1.1 Illustrating gradient accumulation
      8.1.2 Preparing a sample model and data set
      8.1.3 Understanding gradient descent using out-of-memory data shards
    8.2 Parameter server approach to gradient accumulation
    8.3 Introducing logical ring-based gradient descent
    8.4 Understanding ring-based distributed gradient descent
    8.5 Phase 1: Reduce-scatter
    8.6 Phase 2: All-gather
    Summary
Part 3 Serverless machine learning pipeline
  9 Feature selection
    9.1 Guiding principles for feature selection
      9.1.1 Related to the label
      9.1.2 Recorded before inference time
      9.1.3 Supported by abundant examples
      9.1.4 Expressed as a number with a meaningful scale
      9.1.5 Based on expert insights about the project
    9.2 Feature selection case studies
    9.3 Feature selection using guiding principles
      9.3.1 Related to the label
      9.3.2 Recorded before inference time
      9.3.3 Supported by abundant examples
      9.3.4 Numeric with meaningful magnitude
      9.3.5 Bring expert insight to the problem
    9.4 Selecting features for the DC taxi data set
    Summary
  10 Adopting PyTorch Lightning
    10.1 Understanding PyTorch Lightning
      10.1.1 Converting PyTorch model training to PyTorch Lightning
      10.1.2 Enabling test and reporting for a trained model
      10.1.3 Enabling validation during model training
    Summary
  11 Hyperparameter optimization
    11.1 Hyperparameter optimization with Optuna
      11.1.1 Understanding loguniform hyperparameters
      11.1.2 Using categorical and log-uniform hyperparameters
    11.2 Neural network layers configuration as a hyperparameter
    11.3 Experimenting with the batch normalization hyperparameter
      11.3.1 Using Optuna study for hyperparameter optimization
      11.3.2 Visualizing an HPO study in Optuna
    Summary
  12 Machine learning pipeline
    12.1 Describing the machine learning pipeline
    12.2 Enabling PyTorch-distributed training support with Kaen
      12.2.1 Understanding PyTorch-distributed training settings
    12.3 Unit testing model training in a local Kaen container
    12.4 Hyperparameter optimization with Optuna
      12.4.1 Enabling MLFlow support
      12.4.2 Using HPO for DcTaxiModel in a local Kaen provider
      12.4.3 Training with the Kaen AWS provider
    Summary
Appendix A. Introduction to machine learning
  A.1 Why machine learning?
  A.2 Machine learning at first glance
  A.3 Machine learning with structured data sets
  A.4 Regression with structured data sets
  A.5 Classification with structured data sets
  A.6 Training a supervised machine learning model
Appendix B. Getting started with Docker
  B.1 Getting started with Docker
  B.2 Building a custom image
  B.3 Sharing your custom image with the world
index