Chapter 19 Training and Deploying TensorFlow Models at Scale
Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Geron
I was studying Chapter 2 in parallel with Kaggle's Titanic competition, so I have forgotten exactly what I learned and how far I got, but the chapter is titled "End-to-End Machine Learning Project", and near the end there is a section called "Launch, Monitor, and Maintain Your System". It left a strong impression on me because it explains taking a machine learning model all the way to market and operating it there.
Chapter 2: End-to-End Machine Learning Project
Launch, Monitor, and Maintain Your System
Perfect, you got approval to launch!
You now need to get your solution ready for production (e.g., polish the code, write documentation and tests, and so on).
Then you can deploy your model to your production environment.
One way to do this is to save the trained Scikit-Learn model (e.g., using joblib), including the full preprocessing and prediction pipeline, then load this trained model within your production environment and use it to make predictions by calling its predict() method.
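As a minimal sketch of this save/load workflow (the toy pipeline, feature values, and filename here are all illustrative, not from the book):

```python
# A minimal sketch: persist a full Scikit-Learn pipeline with joblib,
# then reload it as a production environment would. The pipeline and
# data are toy placeholders.
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Train a toy pipeline (preprocessing + prediction in one object)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# Save the trained pipeline to disk...
joblib.dump(pipeline, "my_model.pkl")

# ...then, in the production environment, load it once at startup
# and call predict() for each incoming request.
model = joblib.load("my_model.pkl")
prediction = model.predict(np.array([[5.0]]))
```

Because the whole pipeline is serialized, the production code never has to repeat the preprocessing logic.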
For example, perhaps the model will be used within a website:
the user will type in some data about a new district and click the Estimate Price button.
This will send a query containing the data to the web server, which will forward it to your web application, and finally your code will simply call the model's predict() method (you want to load the model upon server startup, rather than every time the model is used).
Alternatively, you can wrap the model within a dedicated web service that your web application can query through a REST API.
REST API: In a nutshell, a REST (or RESTful) API is an HTTP-based API that follows some conventions, such as using standard HTTP verbs to read, create, update, or delete resources (GET, POST, PUT, and DELETE) and using JSON for the inputs and outputs.
This makes it easier to upgrade your model to new versions without interrupting the main application.
It also simplifies scaling, since you can start as many web services as needed and load-balance the requests coming from your web application across these web services.
Moreover, it allows your web application to use any language, not just Python.
This gives you a simple web service that takes care of load balancing and scaling for you.
You can then use this web service in your website (or whatever production environment you are using).
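The idea can be sketched with only the standard library (in practice you would more likely use a framework such as Flask or FastAPI, or TF Serving itself; the DummyModel, route, and JSON schema below are my own illustrative assumptions):

```python
# A hedged sketch of wrapping a model in a JSON-over-HTTP web service,
# using only the Python standard library. The model is a stand-in.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DummyModel:
    """Stand-in for a real trained pipeline (hypothetical)."""
    def predict(self, instances):
        return [sum(row) for row in instances]

model = DummyModel()  # load once at startup, not per request

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"predictions": model.predict(payload["instances"])})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The web application (in any language) would issue requests like this:
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"instances": [[1, 2], [3, 4]]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
```

Since the client only speaks HTTP and JSON, the model service can be upgraded, replicated, and load-balanced independently of the web application.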
As we will see in Chapter 19, deploying TensorFlow models on AI Platform is not much different from deploying Scikit-Learn models.
But deployment is not the end of the story.
You also need to write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.
This could be a steep drop, likely due to a broken component in your infrastructure, but be aware that it could also be a gentle decay that could easily go unnoticed for a long time.
This is quite common because models tend to "rot" over time:
indeed, the world changes, so if the model was trained with last year's data, it may not be adapted to today's data.
Even a model trained to classify pictures of cats and dogs may need to be retrained regularly, not because cats and dogs will mutate overnight, but because cameras keep changing, along with image formats, sharpness, brightness, and size ratios.
Moreover, people may love different breeds next year, or they may decide to dress their pets with tiny hats. Who knows?
So you need to monitor your model's live performance.
But how do you do that?
Well, it depends.
In some cases the model's performance can be inferred from downstream metrics.
For example, if your model is part of a recommender system and it suggests products that the users may be interested in, then it's easy to monitor the number of recommended products sold each day.
If this number drops (compared to nonrecommended products), then the prime suspect is the model.
This may be because the data pipeline is broken, or perhaps the model needs to be retrained on fresh data (as we will discuss shortly).
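A downstream-metric alert like the one described can be as simple as comparing today's recommendation share against a baseline (the function name, baseline, and threshold below are illustrative assumptions, not from the book):

```python
# A hedged sketch of monitoring a downstream metric: the share of sales
# coming from recommended products. Thresholds are illustrative.
def recommendation_alert(recommended_sales, total_sales,
                         baseline_share, max_relative_drop=0.2):
    """Return True if the recommended-product share of sales has
    dropped more than max_relative_drop below its baseline."""
    if total_sales == 0:
        return False  # no sales data yet; nothing to compare
    share = recommended_sales / total_sales
    return share < baseline_share * (1 - max_relative_drop)

# Baseline: recommendations usually account for 30% of sales
alert_ok = recommendation_alert(29, 100, baseline_share=0.30)   # 29%: fine
alert_bad = recommendation_alert(20, 100, baseline_share=0.30)  # 20%: alert
```

An alert like this cannot tell you *why* the metric dropped, only that someone should investigate the model and its data pipeline.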
However, it's not always possible to determine the model's performance without any human analysis.
For example, suppose you trained an image classification model (see Chapter 3) to detect several product defects on a production line.
How can you get an alert if the model's performance drops, before thousands of defective products get shipped to your clients?
One solution is to send to human raters a sample of all the pictures that the model classified (especially pictures that the model wasn't so sure about).
In some applications they could even be the users themselves, responding for example via surveys or repurposed captchas.
Either way, you need to put in place a monitoring system (with or without human raters to evaluate the live model), as well as all the relevant processes to define what to do in case of failures and how to prepare for them.
Unfortunately, this can be a lot of work.
In fact, it is often much more work than building and training a model.
If the data keeps evolving, you will need to update your datasets and retrain your model regularly.
You should probably automate the whole process as much as possible.
Here are a few things you can automate:
・Collect fresh data regularly and label it (e.g., using human raters).
・Write a script to train the model and fine-tune the hyperparameters automatically.
This script could run automatically, for example every day or every week, depending on your needs.
・Write another script that will evaluate both the new model and the previous model on the updated test set, and deploy the model to production if the performance has not decreased (if it did, make sure you investigate why).
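The evaluate-then-deploy step in the last bullet can be sketched as follows (the models, metric, and deploy callback are all illustrative placeholders):

```python
# A sketch of automated promotion: compare the candidate model against
# the current production model on the updated test set, and deploy only
# if performance has not decreased. Everything here is a toy stand-in.
def evaluate(model, X_test, y_test):
    """Mean absolute error (lower is better)."""
    preds = model.predict(X_test)
    return sum(abs(p - y) for p, y in zip(preds, y_test)) / len(y_test)

def maybe_deploy(new_model, prod_model, X_test, y_test, deploy):
    new_err = evaluate(new_model, X_test, y_test)
    prod_err = evaluate(prod_model, X_test, y_test)
    if new_err <= prod_err:
        deploy(new_model)
        return True
    return False  # performance dropped: investigate before deploying

class ConstantModel:
    """Toy model that always predicts the same value."""
    def __init__(self, value):
        self.value = value
    def predict(self, X):
        return [self.value] * len(X)

deployed = []
promoted = maybe_deploy(ConstantModel(5), ConstantModel(0),
                        X_test=[[1], [2]], y_test=[5, 5],
                        deploy=deployed.append)
```

In a real pipeline the deploy callback would push the model to the serving environment, and a failed comparison would page a human rather than silently return False.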
You should also make sure you evaluate the model's input data quality.
Sometimes performance will degrade slightly because of a poor-quality signal (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale), but it may take a while before your system's performance degrades enough to trigger an alert.
If you monitor your model's inputs, you may catch this earlier.
For example, you could trigger an alert if more and more inputs are missing a feature, or if its mean or standard deviation drifts too far from the training set, or a categorical feature starts containing new categories.
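Such input checks are straightforward to implement; here is a minimal sketch for one numeric feature (the function name and thresholds are illustrative assumptions that should be tuned per feature):

```python
# A hedged sketch of input-quality monitoring for a single numeric
# feature: flag excessive missing values and mean drift relative to
# the training set. Thresholds are illustrative.
import statistics

def numeric_feature_alerts(values, train_mean, train_std,
                           max_missing_ratio=0.05, max_drift_stds=3.0):
    """Return a list of alert messages for one numeric feature."""
    alerts = []
    present = [v for v in values if v is not None]
    missing_ratio = 1 - len(present) / len(values)
    if missing_ratio > max_missing_ratio:
        alerts.append("too many missing values")
    if present:
        live_mean = statistics.mean(present)
        if abs(live_mean - train_mean) > max_drift_stds * train_std:
            alerts.append("mean drifted from training set")
    return alerts

# Training set had mean 10 and std 1; live data has drifted to ~20
alerts = numeric_feature_alerts([20.1, 19.8, None, 20.5, 19.9],
                                train_mean=10.0, train_std=1.0)
```

A categorical feature would get an analogous check comparing its live category set against the categories seen during training.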
Finally, make sure you keep backups of every model you create and have the process and tools in place to roll back to a previous model quickly, in case the new model starts failing badly for some reason.
Having backups also makes it possible to easily compare new models with previous ones.
Similarly, you should keep backups of every version of your datasets so that you can roll back to a previous dataset if the new one ever gets corrupted (e.g., if the fresh data that gets added to it turns out to be full of outliers).
Having backups of your datasets also allows you to evaluate any model against any previous dataset.
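One lightweight way to keep such backups is to version every saved model on disk (the naming scheme and directory layout below are my own illustrative choices):

```python
# A sketch of versioned model backups with joblib, making rollback and
# cross-version comparison easy. The naming scheme is illustrative.
import os
import tempfile
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

backup_dir = tempfile.mkdtemp()  # stand-in for durable storage

def save_version(model, version):
    path = os.path.join(backup_dir, f"model_v{version:03d}.pkl")
    joblib.dump(model, path)
    return path

def load_version(version):
    return joblib.load(os.path.join(backup_dir, f"model_v{version:03d}.pkl"))

# Save version 1, then "roll back" to it later
m1 = LinearRegression().fit(np.array([[0.0], [1.0]]), np.array([0.0, 1.0]))
save_version(m1, 1)
rolled_back = load_version(1)
pred = rolled_back.predict(np.array([[2.0]]))
```

With every version kept on disk, rolling back is just loading an older file, and any archived model can be re-evaluated against any archived dataset.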
You may want to create several subsets of the test set in order to evaluate how well your model performs on specific parts of the data.
For example, you may want to have a subset containing only the most recent data, or a test set for specific kinds of inputs (e.g., districts located inland versus districts located near the ocean).
This will give you a deeper understanding of your model's strengths and weaknesses.
As you can see, Machine Learning involves quite a lot of infrastructure, so don't be surprised if your first ML project takes a lot of effort and time to build and deploy to production.
Fortunately, once all the infrastructure is in place, going from idea to production will be much faster.
Chapter 19 Training and Deploying TensorFlow Models at Scale
A great solution to scale up your service, as we will see in this chapter, is to use TF Serving, either on your own hardware infrastructure or via a cloud service such as Google Cloud AI Platform.
It will take care of efficiently serving your model, handle graceful model transitions, and more.
If you use the cloud platform, you will also get many extra features, such as powerful monitoring tools.
In this chapter we will look at how to deploy models, first to