Chapter 12 Custom Models and Training with TensorFlow

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron

Up until now, we've used only TensorFlow's high-level API, tf.keras, but it already got us pretty far: we built various neural network architectures, including regression and classification nets, Wide & Deep nets, and self-normalizing nets, using all sorts of techniques, such as Batch Normalization, dropout, and learning rate schedules.

In fact, 95% of the use cases you will encounter will not require anything other than tf.keras (and tf.data; see Chapter 13). But it's time to dive deeper into TensorFlow and take a look at its lower-level Python API. This will be useful when you need extra control to write custom loss functions, custom metrics, layers, models, initializers, regularizers, weight constraints, and more. You may even need to fully control the training loop itself, for example, to apply special transformations or constraints to the gradients (beyond just clipping them) or to use multiple optimizers for different parts of the network.

We will cover all these cases in this chapter, and we will also look at how you can boost your custom models and training algorithms using TensorFlow's automatic graph generation feature.

But first, let's take a quick tour of TensorFlow.

A Quick Tour of TensorFlow

・Its core is very similar to NumPy, but with GPU support.

・It supports distributed computing (across multiple devices and servers).

・It includes a kind of just-in-time (JIT) compiler that allows it to optimize computations for speed and memory usage. It works by extracting the computation graph from a Python function, then optimizing it (e.g., by pruning unused nodes), and finally running it efficiently (e.g., by automatically running independent operations in parallel).

・Computation graphs can be exported to a portable format, so you can train a TensorFlow model in one environment (e.g., using Python on Linux) and run it in another (e.g., using Java on an Android device).

・It implements autodiff (see Chapter 10 and Appendix D) and provides some excellent optimizers, such as RMSProp and Nadam (see Chapter 11), so you can easily minimize all sorts of loss functions.
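The graph extraction and autodiff points above can be sketched in a few lines. This is a minimal illustration, not the book's own example: the function cube and the variable w are hypothetical names chosen for the sketch, and it assumes TensorFlow 2 running in eager mode.

```python
import tensorflow as tf

# Hypothetical function, used only for illustration.
def cube(x):
    return x ** 3

# tf.function traces the Python function into a computation graph,
# which TensorFlow can then optimize and run.
tf_cube = tf.function(cube)
result = tf_cube(tf.constant(2.0))

# Autodiff: record operations on a variable, then request the gradient.
w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    loss = w ** 2 + 2.0 * w
grad = tape.gradient(loss, w)  # d(loss)/dw = 2w + 2
```

Here tf_cube behaves like the original Python function but runs as a graph, and grad holds the derivative of loss with respect to w at w = 3.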

TensorFlow runs not only on Windows, Linux, and macOS, but also on mobile devices (using TensorFlow Lite), including both iOS and Android (see Chapter 19). If you do not want to use the Python API, there are C++, Java, Go, and Swift APIs. There is even a JavaScript implementation called TensorFlow.js that makes it possible to run your models directly in your browser.

Using TensorFlow like NumPy

Tensors and Operations

You can create a tensor with tf.constant().

For example, here is a tensor representing a matrix with two rows and three columns of floats:

>>> import tensorflow as tf
>>> tf.constant([[1., 2., 3.], [4., 5., 6.]])  # matrix
<tf.Tensor: id=0, shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>
>>> tf.constant(42)  # scalar
<tf.Tensor: id=1, shape=(), dtype=int32, numpy=42>

Just like an ndarray, a tf.Tensor has a shape and a data type (dtype):

>>> t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
>>> t.shape
TensorShape([2, 3])
>>> t.dtype
tf.float32

Indexing works much like in NumPy:

>>> t[:, 1:]
<tf.Tensor: id=5, shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>
>>> t[..., 1, tf.newaxis]
<tf.Tensor: id=15, shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

Most importantly, all sorts of tensor operations are available:

>>> t + 10
<tf.Tensor: id=18, shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>
>>> tf.square(t)
<tf.Tensor: id=20, shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>
>>> t @ tf.transpose(t)
<tf.Tensor: id=24, shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

Note that writing t + 10 is equivalent to calling tf.add(t, 10) (indeed, Python calls the magic method t.__add__(10), which just calls tf.add(t, 10)). Other operators like - and * are also supported. The @ operator was added in Python 3.5, for matrix multiplication: it is equivalent to calling the tf.matmul() function.
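As a quick check of these equivalences, the following minimal sketch compares each operator with its function form on the same matrix t used in this section. The variable names are illustrative only.

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

# Operator form versus explicit function form.
sum_op = t + 10
sum_fn = tf.add(t, 10)
prod_op = t @ tf.transpose(t)
prod_fn = tf.matmul(t, tf.transpose(t))

# Element-wise comparison, reduced to a single Python boolean.
same_sum = bool(tf.reduce_all(tf.equal(sum_op, sum_fn)).numpy())
same_prod = bool(tf.reduce_all(tf.equal(prod_op, prod_fn)).numpy())
```

Both same_sum and same_prod should come out True, since the operators simply dispatch to the corresponding TensorFlow functions.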

Keras' Low-Level API

The Keras API has its own low-level API, located in keras.backend. It includes functions like square(), exp(), and sqrt(). In tf.keras, these functions generally just call the corresponding TensorFlow operations. If you want to write code that will be portable to other Keras implementations, you should use these Keras functions. However, they only cover a subset of all functions available in TensorFlow, so in this book we will use the TensorFlow operations directly. Here is a simple example using keras.backend, which is commonly named K for short:

>>> from tensorflow import keras
>>> K = keras.backend
>>> K.square(K.transpose(t)) + 10
<tf.Tensor: id=39, shape=(3, 2), dtype=float32, numpy=
array([[11., 26.],
       [14., 35.],
       [19., 46.]], dtype=float32)>

Tensors and NumPy

Tensors play nice with NumPy: you can create a tensor from a NumPy array, and vice versa. You can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors.
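A minimal sketch of this interoperability, assuming NumPy is imported as np and TensorFlow 2 is running in eager mode. The array a and the other variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])

t = tf.constant(a)         # NumPy array -> tensor (dtype is float64 here)
back = t.numpy()           # tensor -> NumPy array
squared_tf = tf.square(a)  # TensorFlow operation applied to a NumPy array
squared_np = np.square(t)  # NumPy operation applied to a tensor
```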

Notice that NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit.

This is because 32-bit precision is generally more than enough for neural networks, plus it runs faster and uses less RAM.

So when you create a tensor from a NumPy array, make sure to set dtype=tf.float32.
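A short sketch of that advice, with illustrative variable names: without an explicit dtype the tensor inherits NumPy's float64, and passing dtype=tf.float32 gives the 32-bit tensor that TensorFlow normally works with.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])  # NumPy defaults to float64

t64 = tf.constant(a)                    # keeps NumPy's 64-bit precision
t32 = tf.constant(a, dtype=tf.float32)  # explicitly request 32-bit floats
```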

Type Conversions

Variables

Other Data Structures

Customizing Models and Training Algorithms

Custom Loss Functions

Saving and Loading Models That Contain Custom Components

Custom Activation Functions, Initializers, Regularizers, and Constraints

Custom Metrics

Custom Layers