Deep Learning Analysis Using Large Model Support

 · 8 mins read

Contents

(Source: https://miro.medium.com/max/3512/1*d-ZbdImPx4zRW0zK4QL49w.jpeg)

Deep Learning Analysis Using Large Model Support

Optimize your Machine Learning model memory consumption with IBM Large Model Support.

Introduction

Memory management is now a really important topic in Machine Learning. Because of memory constraints, it is becoming quite common to train Deep Learning models using cloud tools such as Kaggle and Google Colab thanks to their free NVIDIA Graphical Processing Unit (GPU) support. Nonetheless, memory can still be a huge constraint in the cloud when working with large amounts of data.

In my last article, I explained how to speed up Machine Learning workflow execution. This article aims instead to explain to you how to efficiently reduce memory usage when implementing Deep Learning models. In this way, you might be able to train your Deep Learning model using the same amount of memory (even if before you couldn’t because of memory errors).

There are three main reasons for which a model can lead to running out of memory:

  • Model depth/complexity = number of layers and nodes in a Neural Network.
  • Data Size = number of samples/features in the dataset used.
  • Batch Size = number of samples that get propagated through a Neural Network.

One solution to this problem has traditionally been to reduce the model size by trying to get rid of less relevant features during the preprocessing stage. This can be done using either Feature Importance or Feature Extraction techniques (eg. PCA, LDA).

Using this approach can possibly lead to reduced noise (decreasing the chance of overfitting) and faster training times. One downside though to this approach can be a consistent decrease in accuracy.

If the model needs a high complexity to capture all the important characteristics of a dataset, reducing the dataset size will in fact inevitably lead to worse performances. In this case, Large Model Support can be the solution to this problem.

Large Model Support

Large Model Support (LMS) is a Python library recently launched by IBM. This library has been ideated in order to train large Deep Learning models which can’t fit in GPU memory. In fact, GPUs have generally smaller memory space compared to Central Processing Units (CPUs).

When Neural Networks are implemented using libraries such as Tensorflow and PyTorch, a set of mathematical operations gets automatically generated to construct this model. These mathematical operations can then be represented using computational graphs.

A computational graph is a directed graph where the nodes correspond to operations or variables. Variables can feed their value into operations, and operations can feed their output into other operations. This way, every node in the graph defines a function of the variables.

— deep ideas [1]

The values that enter and comes out of nodes in computational graphs are called tensors (multi-dimensional arrays).

In Figure 1 is represented a simple example of how a mathematical operation can be represented using a computational graph ( z = (x + y) ∗ (x − 5) ):


Figure 1: Computational Graph [2]

LMS is able to levitate GPUs memory problems by redesigning Neural Networks computational graphs. This is done by making possible to transfer tensor operations by storing intermediate results on CPUs (instead of GPUs).

The IBM documentation outlines three different methods to use Large Model Support using the Tensorflow library:

  • Session-based training.
  • Estimator-based training.
  • Keras-based training.

In this article, I will provide an example using Keras-based training. If you are interested in finding out more about the two other methods, IBM documentation is a great place where to start [3].

When working with LMS there are two main parameters we can tune in order to improve our model efficiency. The objective is to be able to find out the minimum number of tensors we need to swap out without incurring in memory errors.

The two main parameters to tune are:

  • n_tensors = number of swapped tensors (eg. swiping out more tensors than needed, can lead to communication overheads).
  • lb = how soon tensors are swapped back in before use (eg. using a low value for lb can make GPU training pause).

Demonstration

I will now walk you through a simple example to get you started with LMS. All the code used for this exercise is available in this Google Colaboratory notebook and on my GitHub.

In this example, I will train a simple Neural Network using first Keras with Large Model Support and then just plain Keras. I both cases, I will record the memory usage required for the training.

Preprocessing

In order to install all the required dependencies to follow this example, just run the following cell in your notebook and enable your GPU environment (eg. Kaggle, Google Colab).

! git clone
! pip install ./tensorflow-large-model-support
! pip install memory_profiler

Once is everything set up, we can then import all the necessary libraries.

from tensorflow_large_model_support import LMSKerasCallback
import numpy as np
import pandas as pd
from keras.utils import np_utils
from sklearn.metrics import accuracy_score
from sklearn import preprocessing
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten, Reshape
from keras.layers.convolutional import Convolution1D, Convolution2D, MaxPooling2D
from keras.utils import np_utils
view raw LMS1.py hosted with ❤ by GitHub

In order to record the memory usage, I decided to use Python memory_profiler.

%load_ext memory_profiler
# Used to examining memory usage
view raw LMS2.py hosted with ❤ by GitHub

Successively, I defined the LMS Keras Callback which will be used during training. The definition of a Callback according to Keras documentation is:

A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.

— Keras Documentation [4]

Callbacks are typically used to take control of a model training process by automating certain tasks during every training iteration (in this case by adding Large Model Support optimization).

# LMSKerasCallback and LMS share a set of keyword arguments. Here we just
# use the default options.
lms_callback = LMSKerasCallback()
view raw LMS3.py hosted with ❤ by GitHub

I then decided to fabricate a simple dataset of 200000 rows using Gaussian Distributions consisting of three features and two labels (0/1).

The values of the means and standard deviations of the distributions have been chosen so that to make this classification problem fairly easy (linearly separable data).

# Creating a linearly separable dataset using Gaussian Distributions.
# The first half of the number in Y is 0 and the other half 1.
# Therefore I made the first half of the two features quite different from
# the second half of the features (setting the value of the means quite
# similar) so that make quite simple the classification between the
# classes (the data is linearly separable).
dataset_len = 2000000
dlen = int(dataset_len/2)
X_11 = pd.Series(np.random.normal(2,2,dlen))
X_12 = pd.Series(np.random.normal(9,2,dlen))
X_1 = pd.concat([X_11, X_12]).reset_index(drop=True)
X_21 = pd.Series(np.random.normal(1,3,dlen))
X_22 = pd.Series(np.random.normal(7,3,dlen))
X_2 = pd.concat([X_21, X_22]).reset_index(drop=True)
X_31 = pd.Series(np.random.normal(3,1,dlen))
X_32 = pd.Series(np.random.normal(3,4,dlen))
X_3 = pd.concat([X_31, X_32]).reset_index(drop=True)
Y = pd.Series(np.repeat([0,1],dlen))
df = pd.concat([X_1, X_2, X_3, Y], axis=1)
df.columns = ['X1', 'X2', 'X3', 'Y']
df.head()
view raw LMS4.py hosted with ❤ by GitHub


Figure 2: Dataset Head

Once created the dataset, I divided it into features and labels and then defined a function to preprocess it.

X = df.drop(['Y'], axis = 1).values
y = df['Y']
def preproces(df, X, y, train_size = 0.80):
# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels
y = label_encoder.fit_transform(y)
# identify shape and indices
num_rows, num_columns = df.shape
delim_index = int(num_rows * train_size)
# Splitting the dataset in training and test sets
X_train, y_train = X[:delim_index, :], y[:delim_index]
X_test, y_test = X[delim_index:, :], y[delim_index:]
# Checking sets dimensions
print('X_train dimensions: ', X_train.shape, 'y_train: ', y_train.shape)
print('X_test dimensions:', X_test.shape, 'y_validation: ', y_test.shape)
# Checking dimensions in percentages
total = X_train.shape[0] + X_test.shape[0]
print('X_train Percentage:', (X_train.shape[0]/total)*100, '%')
print('X_test Percentage:', (X_test.shape[0]/total)*100, '%')
return X_train, y_train, X_test, y_test
X_train, y_train, X_test, y_test = preproces(df, X, y)
view raw rapids3.py hosted with ❤ by GitHub

Now that we got our Training/Test sets, we are finally ready to get started with Deep Learning. I, therefore, defined a simple Sequential model for binary classification and selected a batch size of 8 elements.

model = Sequential()
model.add(Dense(12, activation='relu', input_dim=df.shape[1]-1))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])
batch_size = 8
view raw LMS5.py hosted with ❤ by GitHub

Keras and Large Model Support

When using LMS, a Keras model is trained using Keras fit_generator function. The first input this function needs, is a generator. A generator is a function used to generate a dataset on multiple cores in real-time and then input its results in a Deep Learning model [5].

In order to create the generator function used in this example, I took reference to this implementation.

In case you are interested in a more detailed explanation about Keras Generators, it is available here.

def generator(X_data, y_data, batch_size):
samples_per_epoch = X_data.shape[0]
number_of_batches = samples_per_epoch/batch_size
counter=0
while 1:
X_batch = np.array(X_data[batch_size*counter:batch_size*(counter+1)]).astype('float32')
y_batch = np.array(y_data[batch_size*counter:batch_size*(counter+1)]).astype('float32')
counter += 1
yield X_batch,y_batch
#restart counter to yeild data in the next epoch as well
if counter >= number_of_batches:
counter = 0
view raw LMS6.py hosted with ❤ by GitHub

Once defined our generator function I then trained our model using the LMS Keras Callback defined before.

%%memit
model.fit_generator(generator(X_train, y_train,batch_size),
epochs=2,steps_per_epoch = X_train.shape[0]/batch_size,
callbacks=[lms_callback])
# Peak Memory = Memory usage of the Python interpreter after that line
# has been executed.
# Increment = represents the difference in memory of the current line
# with respect to the last one.
view raw LMS7.py hosted with ❤ by GitHub

In the code above, I additionally added on the first line the *%%memit *command to print out the memory usage of running this cell. The results are shown below:

Epoch 1/2 200000/200000 [==============================]
- 601s 3ms/step - loss: 0.0222 - acc: 0.9984

Epoch 2/2 200000/200000 [==============================]
- 596s 3ms/step - loss: 0.0203 - acc: 0.9984

peak memory: 2834.80 MiB, increment: 2.88 MiB

The registered Peak Memory to train this model using LMS was equal to 2.83GB and the Increment 2.8MB.

Finally, I decided to test the accuracy of our trained model to validate our training results.

LMS_preds = model.predict_classes(X_test)
LMS_acc = accuracy_score(y_test, LMS_preds)
print("Model accuracy using Large Model Support:", LMS_acc*100, '%')
view raw LMS8.py hosted with ❤ by GitHub
Model accuracy using Large Model Support: 99.9995 %

Keras

Repeating the same procedure using plain Keras, the following results were obtained:

%%memit
#without generator
model.fit(x = np.array(X_train), y = np.array(y_train), batch_size = batch_size, epochs = 2)
view raw LMS9.py hosted with ❤ by GitHub
Epoch 1/2 1600000/1600000 [==============================]
- 537s 336us/step - loss: 0.0449 - acc: 0.9846

Epoch 2/2 1600000/1600000 [==============================]
- 538s 336us/step - loss: 0.0403 - acc: 0.9857

peak memory: 2862.26 MiB, increment: 26.15 MiB

The registered Peak Memory to train this model using Keras was equal to 2.86GB and the Increment 26.15MB.

Testing our Keras model lead instead to 98.47% accuracy.

preds = model.predict_classes(X_test)
sk_pred = np.round(preds)
sk_acc = accuracy_score(y_test, sk_pred)
print("Model accuracy using Sklearn:", sk_acc*100, '%')
view raw LMS10.py hosted with ❤ by GitHub
Model accuracy using Sklearn: 98.4795 %

Evaluation

Comparing the results obtained using *Keras + LMS vs Plain Keras *it can be noticed that using LMS can lead to a decrease in memory consumption and as well to an increase in model accuracy. LMS performance can even be even improved if given more GPU/CPU resources (which can be used to optimise training) and using larger datasets.

Bibliography

[1] Deep Learning From Scratch I: Computational Graphs — deep ideas. Accessed at: http://www.deepideas.net/deep-learning-from-scratch-i-computational-graphs/

[2] TFLMS: Large Model Support in TensorFlow by Graph Rewriting. Tung D. Le, Haruki Imai et al. Accessed at: https://arxiv.org/pdf/1807.02037.pdf

[3] Getting started with TensorFlow Large Model Support (TFLMS) — IBM Knowledge Center. Accessed at: https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_tflms.html

[4] Kears Documentation. Docs -> Callbacks. Accessed at: https://keras.io/callbacks/?source=post_page—————————

[5] A detailed example of how to use data generators with Keras.
Shervine Amidi. Accessed at: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

Contacts

If you want to keep updated with my latest articles and projects follow me on Medium and subscribe to my mailing list. These are some of my contacts details:

Buy Me A Coffee

Subscribe to my mailing list to be updated about my new blog posts!

* indicates required