Chapter 19 Training and Deploying TensorFlow Models at Scale
Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Geron
I was studying Chapter 2 in parallel with Kaggle's Titanic competition, so I have forgotten exactly what I learned and how far I got, but the chapter is titled "End-to-End Machine Learning Project", and near the end there is a section called "Launch, Monitor, and Maintain Your System". It left a strong impression on me because it explains taking a machine learning model all the way to market and operating it there.
Chapter 2: End-to-End Machine Learning Project
Launch, Monitor, and Maintain Your System
Perfect, you got approval to launch!
You now need to get your solution ready for production (e.g., polish the code, write documentation and tests, and so on).
Then you can deploy your model to your production environment.
One way to do this is to save the trained Scikit-Learn model (e.g., using joblib), including the full preprocessing and prediction pipeline, then load this trained model within your production environment and use it to make predictions by calling its predict() method.
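As a minimal sketch of this save/load workflow (the toy pipeline, feature values, and filename here are all illustrative, not from the book):

```python
# A minimal sketch: persist a full Scikit-Learn pipeline with joblib,
# then reload it as a production environment would. The pipeline and
# data are toy placeholders.
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Train a toy pipeline (preprocessing + prediction in one object)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# Save the trained pipeline to disk...
joblib.dump(pipeline, "my_model.pkl")

# ...then, in the production environment, load it once at startup
# and call predict() for each incoming request.
model = joblib.load("my_model.pkl")
prediction = model.predict(np.array([[5.0]]))
```

Because the whole pipeline is serialized, the production code never has to repeat the preprocessing logic.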
For example, perhaps the model will be used within a website:
the user will type in some data about a new district and click the Estimate Price button.
This will send a query containing the data to the web server, which will forward it to your web application, and finally your code will simply call the model's predict() method (you want to load the model upon server startup, rather than every time the model is used).
Alternatively, you can wrap the model within a dedicated web service that your web application can query through a REST API.
REST API: In a nutshell, a REST (or RESTful) API is an HTTP-based API that follows some conventions, such as using standard HTTP verbs to read, create, update, or delete resources (GET, POST, PUT, and DELETE) and using JSON for the inputs and outputs.
This makes it easier to upgrade your model to new versions without interrupting the main application.
It also simplifies scaling, since you can start as many web services as needed and load-balance the requests coming from your web application across these web services.
Moreover, it allows your web application to use any language, not just Python.
This gives you a simple web service that takes care of load balancing and scaling for you.
You can then use this web service in your website (or whatever production environment you are using).
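The idea can be sketched with only the standard library (in practice you would more likely use a framework such as Flask or FastAPI, or TF Serving itself; the DummyModel, route, and JSON schema below are my own illustrative assumptions):

```python
# A hedged sketch of wrapping a model in a JSON-over-HTTP web service,
# using only the Python standard library. The model is a stand-in.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DummyModel:
    """Stand-in for a real trained pipeline (hypothetical)."""
    def predict(self, instances):
        return [sum(row) for row in instances]

model = DummyModel()  # load once at startup, not per request

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"predictions": model.predict(payload["instances"])})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The web application (in any language) would issue requests like this:
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"instances": [[1, 2], [3, 4]]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
```

Since the client only speaks HTTP and JSON, the model service can be upgraded, replicated, and load-balanced independently of the web application.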
As we will see in Chapter 19, deploying TensorFlow models on AI Platform is not much different from deploying Scikit-Learn models.
But deployment is not the end of the story.
You also need to write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.
This could be a steep drop, likely due to a broken component in your infrastructure, but be aware that it could also be a gentle decay that could easily go unnoticed for a long time.
This is quite common because models tend to "rot" over time:
indeed, the world changes, so if the model was trained with last year's data, it may not be adapted to today's data.
Even a model trained to classify pictures of cats and dogs may need to be retrained regularly, not because cats and dogs will mutate overnight, but because cameras keep changing, along with image formats, sharpness, brightness, and size ratios.
Moreover, people may love different breeds next year, or they may decide to dress their pets with tiny hats. Who knows?
So you need to monitor your model's live performance.
But how do you do that?
Well, it depends.
In some cases the model's performance can be inferred from downstream metrics.
For example, if your model is part of a recommender system and it suggests products that the users may be interested in, then it's easy to monitor the number of recommended products sold each day.
If this number drops (compared to nonrecommended products), then the prime suspect is the model.
This may be because the data pipeline is broken, or perhaps the model needs to be retrained on fresh data (as we will discuss shortly).
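A downstream-metric alert like the one described can be as simple as comparing today's recommendation share against a baseline (the function name, baseline, and threshold below are illustrative assumptions, not from the book):

```python
# A hedged sketch of monitoring a downstream metric: the share of sales
# coming from recommended products. Thresholds are illustrative.
def recommendation_alert(recommended_sales, total_sales,
                         baseline_share, max_relative_drop=0.2):
    """Return True if the recommended-product share of sales has
    dropped more than max_relative_drop below its baseline."""
    if total_sales == 0:
        return False  # no sales data yet; nothing to compare
    share = recommended_sales / total_sales
    return share < baseline_share * (1 - max_relative_drop)

# Baseline: recommendations usually account for 30% of sales
alert_ok = recommendation_alert(29, 100, baseline_share=0.30)   # 29%: fine
alert_bad = recommendation_alert(20, 100, baseline_share=0.30)  # 20%: alert
```

An alert like this cannot tell you *why* the metric dropped, only that someone should investigate the model and its data pipeline.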
However, it's not always possible to determine the model's performance without any human analysis.
For example, suppose you trained an image classification model (see Chapter 3) to detect several product defects on a production line.
How can you get an alert if the model's performance drops, before thousands of defective products get shipped to your clients?
One solution is to send to human raters a sample of all the pictures that the model classified (especially pictures that the model wasn't so sure about).
In some applications they could even be the users themselves, responding for example via surveys or repurposed captchas.
Either way, you need to put in place a monitoring system (with or without human raters to evaluate the live model), as well as all the relevant processes to define what to do in case of failures and how to prepare for them.
Unfortunately, this can be a lot of work.
In fact, it is often much more work than building and training a model.
If the data keeps evolving, you will need to update your datasets and retrain your model regularly.
You should probably automate the whole process as much as possible.
Here are a few things you can automate:
・Collect fresh data regularly and label it (e.g., using human raters).
・Write a script to train the model and fine-tune the hyperparameters automatically.
This script could run automatically, for example every day or every week, depending on your needs.
・Write another script that will evaluate both the new model and the previous model on the updated test set, and deploy the model to production if the performance has not decreased (if it did, make sure you investigate why).
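The evaluate-then-deploy step in the last bullet can be sketched as follows (the models, metric, and deploy callback are all illustrative placeholders):

```python
# A sketch of automated promotion: compare the candidate model against
# the current production model on the updated test set, and deploy only
# if performance has not decreased. Everything here is a toy stand-in.
def evaluate(model, X_test, y_test):
    """Mean absolute error (lower is better)."""
    preds = model.predict(X_test)
    return sum(abs(p - y) for p, y in zip(preds, y_test)) / len(y_test)

def maybe_deploy(new_model, prod_model, X_test, y_test, deploy):
    new_err = evaluate(new_model, X_test, y_test)
    prod_err = evaluate(prod_model, X_test, y_test)
    if new_err <= prod_err:
        deploy(new_model)
        return True
    return False  # performance dropped: investigate before deploying

class ConstantModel:
    """Toy model that always predicts the same value."""
    def __init__(self, value):
        self.value = value
    def predict(self, X):
        return [self.value] * len(X)

deployed = []
promoted = maybe_deploy(ConstantModel(5), ConstantModel(0),
                        X_test=[[1], [2]], y_test=[5, 5],
                        deploy=deployed.append)
```

In a real pipeline the deploy callback would push the model to the serving environment, and a failed comparison would page a human rather than silently return False.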
You should also make sure you evaluate the model's input data quality.
Sometimes performance will degrade slightly because of a poor-quality signal (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale), but it may take a while before your system's performance degrades enough to trigger an alert.
If you monitor your model's inputs, you may catch this earlier.
For example, you could trigger an alert if more and more inputs are missing a feature, or if its mean or standard deviation drifts too far from the training set, or a categorical feature starts containing new categories.
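Such input checks are straightforward to implement; here is a minimal sketch for one numeric feature (the function name and thresholds are illustrative assumptions that should be tuned per feature):

```python
# A hedged sketch of input-quality monitoring for a single numeric
# feature: flag excessive missing values and mean drift relative to
# the training set. Thresholds are illustrative.
import statistics

def numeric_feature_alerts(values, train_mean, train_std,
                           max_missing_ratio=0.05, max_drift_stds=3.0):
    """Return a list of alert messages for one numeric feature."""
    alerts = []
    present = [v for v in values if v is not None]
    missing_ratio = 1 - len(present) / len(values)
    if missing_ratio > max_missing_ratio:
        alerts.append("too many missing values")
    if present:
        live_mean = statistics.mean(present)
        if abs(live_mean - train_mean) > max_drift_stds * train_std:
            alerts.append("mean drifted from training set")
    return alerts

# Training set had mean 10 and std 1; live data has drifted to ~20
alerts = numeric_feature_alerts([20.1, 19.8, None, 20.5, 19.9],
                                train_mean=10.0, train_std=1.0)
```

A categorical feature would get an analogous check comparing its live category set against the categories seen during training.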
Finally, make sure you keep backups of every model you create and have the process and tools in place to roll back to a previous model quickly, in case the new model starts failing badly for some reason.
Having backups also makes it possible to easily compare new models with previous ones.
Similarly, you should keep backups of every version of your datasets so that you can roll back to a previous dataset if the new one ever gets corrupted (e.g., if the fresh data that gets added to it turns out to be full of outliers).
Having backups of your datasets also allows you to evaluate any model against any previous dataset.
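One lightweight way to keep such backups is to version every saved model on disk (the naming scheme and directory layout below are my own illustrative choices):

```python
# A sketch of versioned model backups with joblib, making rollback and
# cross-version comparison easy. The naming scheme is illustrative.
import os
import tempfile
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

backup_dir = tempfile.mkdtemp()  # stand-in for durable storage

def save_version(model, version):
    path = os.path.join(backup_dir, f"model_v{version:03d}.pkl")
    joblib.dump(model, path)
    return path

def load_version(version):
    return joblib.load(os.path.join(backup_dir, f"model_v{version:03d}.pkl"))

# Save version 1, then "roll back" to it later
m1 = LinearRegression().fit(np.array([[0.0], [1.0]]), np.array([0.0, 1.0]))
save_version(m1, 1)
rolled_back = load_version(1)
pred = rolled_back.predict(np.array([[2.0]]))
```

With every version kept on disk, rolling back is just loading an older file, and any archived model can be re-evaluated against any archived dataset.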
You may want to create several subsets of the test set in order to evaluate how well your model performs on specific parts of the data.
For example, you may want to have a subset containing only the most recent data, or a test set for specific kinds of inputs (e.g., districts located inland versus districts located near the ocean).
This will give you a deeper understanding of your model's strengths and weaknesses.
As you can see, Machine Learning involves quite a lot of infrastructure, so don't be surprised if your first ML project takes a lot of effort and time to build and deploy to production.
Fortunately, once all the infrastructure is in place, going from idea to production will be much faster.
Chapter 19 Training and Deploying TensorFlow Models at Scale
A great solution to scale up your service, as we will see in this chapter, is to use TF Serving, either on your own hardware infrastructure or via a cloud service such as Google Cloud AI Platform.
It will take care of efficiently serving your model, handle graceful model transitions, and more.
If you use the cloud platform, you will also get many extra features, such as powerful monitoring tools.
In this chapter we will look at how to deploy models, first to