Day 7: Real World Deep Learning
So far we have explored neural networks almost in a vacuum. Although we have provided some illustrations for clarity, relying on an existing framework lets us benefit from the knowledge of previous contributors. One such framework is Hasktorch. Among the practical reasons to use Hasktorch is that it builds on the mature Torch tensor library. Another good reason is strong GPU acceleration, which is necessary for almost any serious deep learning project. Finally, using standard interfaces rather than reinventing the wheel helps to reduce boilerplate.
Fun fact: one of the Hasktorch contributors is Adam Paszke, the original author of PyTorch.
Today's post also builds on:
Day 2: What Do Hidden Layers Do?
Day 4: The Importance Of Batch Normalization
Day 5: Convolutional Neural Networks Tutorial

The source code from this post is available on GitHub.
The Basics
The easiest way to start with Hasktorch is via Docker:
docker run --gpus all -it --rm -p 8888:8888 \
-v $(pwd):/home/ubuntu/data \
htorch/hasktorch-jupyter:latest-cu11
Now you may open localhost:8888 in your browser to access JupyterLab notebooks. Note that you need to select the Haskell kernel when creating a new notebook.
If you have never used the Torch library before, you may also want to review this tutorial.
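To get a first feel for the untyped Torch API, you can also evaluate a few warm-up expressions directly in a notebook cell. The snippet below is only a sketch; the values indicated in the comments are approximate.

t = asTensor ([[1, 2, 3], [4, 5, 6]] :: [[Float]])  -- a 2x3 tensor built from a nested list
shape t                       -- [2,3]
w = ones' [3, 2]              -- a 3x2 tensor filled with ones
shape (t `matmul` w)          -- [2,2]
relu (t - 3)                  -- Num instance: subtract a scalar, then zero out negatives
asValue (sumAll t) :: Float   -- 21.0, back to a plain Haskell value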
MNIST Example
Let's take the familiar MNIST example and see how it can be implemented in Hasktorch.
Imports
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception.Safe
  ( SomeException (..),
    try,
  )
import Control.Monad ( forM_, when, (<=<) )
import Control.Monad.Cont ( ContT (..) )
import GHC.Generics
import Pipes hiding ( (~>) )
import qualified Pipes.Prelude as P
import Torch
import Torch.Serialize
import Torch.Typed.Vision ( initMnist )
import qualified Torch.Vision as V
import Prelude hiding ( exp )
The most notable import is the Torch module itself. There are also related helpers, such as Torch.Vision, to handle image data. The function initMnist has the type
initMnist :: String -> IO (MnistData, MnistData)
The function loads the MNIST train and test datasets, similar to loadMNIST from the previous posts.
It is also worth paying attention to the Pipes module. It is an alternative to the previously used Streamly, and it likewise allows building streaming components.
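As a tiny, self-contained illustration of the Pipes style (unrelated to MNIST and not needed for the rest of the example), a producer is connected to a consumer with (>->):

-- produce the numbers 1..5 and print each of them
runEffect $ each [1 .. 5 :: Int] >-> P.print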
We also import functions from Control.Monad, which are useful for IO operations.
Finally, we hide the Prelude exp function in favor of Torch's exp, which operates on tensors (multidimensional arrays) rather than floating-point scalars:
Torch.exp :: Tensor -> Tensor
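For instance, applied to a small tensor it exponentiates every element (output shown approximately):

exp (asTensor [0.0, 1.0, 2.0 :: Float])
-- Tensor Float [3] [ 1.0000, 2.7183, 7.3891]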
Defining Neural Network Architecture
First, we define a neural network data structure that contains the trainable parameters (the neural network weights). In the simplest case, it can be a multilayer perceptron (MLP).
data MLP = MLP
  { fc1 :: Linear,
    fc2 :: Linear,
    fc3 :: Linear
  }
  deriving (Generic, Show, Parameterized)
This MLP contains three linear layers; each Linear, provided by Torch.NN, holds a weight and a bias parameter. Next, we may define a data structure that specifies the number of neurons in each layer:
data MLPSpec = MLPSpec
  { i :: Int,
    h1 :: Int,
    h2 :: Int,
    o :: Int
  }
  deriving (Show, Eq)
Now we can define the neural network as a function, similarly to what we did on Day 5, using a "reversed" composition operator (~>).
(~>) :: (a -> b) -> (b -> c) -> a -> c
f ~> g = g . f
mlp :: MLP -> Tensor -> Tensor
mlp MLP {..} =
  -- Layer 1
  linear fc1
    ~> relu
    -- Layer 2
    ~> linear fc2
    ~> relu
    -- Layer 3
    ~> linear fc3
    ~> logSoftmax (Dim 1)
We finish with a (log) softmax layer over the tensor's dimension 1 (Dim 1). Derivatives of linear, relu, and logSoftmax are already handled by the Torch library.
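To make the last layer concrete, here is a small sketch of what logSoftmax does to a 1x3 batch of logits; exponentiating the result recovers class probabilities that sum to one (values shown approximately):

logits = asTensor ([[1.0, 2.0, 3.0]] :: [[Float]])
exp (logSoftmax (Dim 1) logits)
-- roughly [[0.0900, 0.2447, 0.6652]]; each row sums to 1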
Initial Weights
How do we generate initial random weights? As you may remember from Day 5, we could create a function such as this one:
randNetwork = do
  let [i, h1, h2, o] = [784, 64, 32, 10]
  fc1 <- randLinear (Sz2 i h1)
  fc2 <- randLinear (Sz2 h1 h2)
  fc3 <- randLinear (Sz2 h2 o)
  return $
    MLP { fc1 = fc1
        , fc2 = fc2
        , fc3 = fc3
        }
In our example we do almost the same, except that we benefit from applicative functors and the Randomizable typeclass.
instance Randomizable MLPSpec MLP where
  sample MLPSpec {..} =
    MLP
      <$> sample (LinearSpec i h1)
      <*> sample (LinearSpec h1 h2)
      <*> sample (LinearSpec h2 o)
We state above that MLP is an instance of the Randomizable typeclass, parametrized by MLPSpec. All we needed to define this instance was to implement the sample function. Later, to generate initial MLP weights, we can simply write
let spec = MLPSpec 784 64 32 10
net <- sample spec
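Because MLP derives Parameterized, the sampled network can be inspected right away, for example to sanity-check the parameter shapes and count the trainable weights. This is only a quick sketch; the shapes and total shown in the comments are what we would expect for this spec, with each Linear storing its weight as out x in plus a bias vector:

params = flattenParameters net
map (shape . toDependent) params
-- [[64,784],[64],[32,64],[32],[10,32],[10]]
sum (map (product . shape . toDependent) params)
-- 52650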
Train Loop
The core of the neural network training is trainLoop, which performs a single training "epoch". Let us first inspect its type signature.
trainLoop :: Optimizer o => MLP -> o -> ListT IO (Tensor, Tensor) -> IO MLP
This signifies that the function accepts an initial neural network configuration, an optimizer, and a dataset. The optimizer can be gradient descent (GD), Adam, or any other optimizer. The result of the function is a new MLP configuration, produced as an IO action. IO is necessary, for instance, if we want to print the loss after each iteration. Now, let's take a look at the implementation:
trainLoop model optimizer = P.foldM step begin done . enumerateData
First, we enumerate the dataset with enumerateData. Then, we iterate over (fold) the batches. The step function is analogous to a step of the gradient descent algorithm:
  where
    step :: MLP -> ((Tensor, Tensor), Int) -> IO MLP
    step model ((input, label), iter) = do
      let loss = nllLoss' label $ mlp model input
      -- Print loss every 50 batches
      when (iter `mod` 50 == 0) $ do
        putStrLn $ "Iteration: " ++ show iter ++ " | Loss: " ++ show loss
      (newParam, _) <- runStep model optimizer loss 1e-3
      return newParam
We calculate the negative log likelihood loss nllLoss' between the ground truth label and the output of our MLP. Note that model is the parameter, i.e. the weights of the MLP network. Then, we take advantage of the iteration number iter to print the loss every 50 iterations. Finally, we perform a gradient descent step using our optimizer via runStep :: ... => model -> optimizer -> Loss -> LearningRate -> IO (model, optimizer) and keep only the new model newParam. The learning rate here is 1e-3, but it can be changed if needed.
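If you want to see nllLoss' in isolation, here is a small sketch on a batch of one example; the target tensor holds the integer class index and the input holds log-probabilities (the printed value is approximate):

nllLoss' (asTensor ([2] :: [Int])) (logSoftmax (Dim 1) (asTensor ([[1.0, 2.0, 3.0]] :: [[Float]])))
-- Tensor Float [] 0.4076   (= -log 0.6652, the log-probability assigned to class 2)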
The done function is the (in this case trivial) finalization of the foldM iterations over the MLP model, and begin is the initial weights (we use pure to satisfy the m x type requirement).
    done = pure
    begin = pure model
Putting It All Together
The remaining part is simple. We load the data into batches, specify the number of neurons in our MLP, choose an optimizer, and initialize the random weights.
main = do
  (trainData, testData) <- initMnist "data"
  let trainMnist = V.MNIST {batchSize = 256, mnistData = trainData}
      testMnist = V.MNIST {batchSize = 1, mnistData = testData}
      spec = MLPSpec 784 64 32 10
      optimizer = GD
  net <- sample spec
Then, we train the network for 5 epochs:
  net' <- foldLoop net 5 $ \model _ ->
    runContT (streamFromMap (datasetOpts 2) trainMnist) $ trainLoop model optimizer . fst
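Since trainLoop only requires an Optimizer instance, GD can be swapped for another optimizer. Below is a hedged sketch using mkAdam from Torch.Optim (its arguments are the initial iteration counter, beta1, beta2, and the parameter list). Note that step above discards the updated optimizer state, which is harmless for stateless GD but means a stateful optimizer such as Adam would need its state threaded through the loop to be fully effective:

  -- hypothetical alternative: train the same network with Adam instead of GD
  let adam = mkAdam 0 0.9 0.999 (flattenParameters net)
  netAdam <- foldLoop net 5 $ \model _ ->
    runContT (streamFromMap (datasetOpts 2) trainMnist) $ trainLoop model adam . fst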
Finally, we may examine the model on test images
forM_ [0 .. 10] $ displayImages net' <=< getItem testMnist
For this purpose, we may use a function such as
displayImages :: MLP -> (Tensor, Tensor) -> IO ()
displayImages model (testImg, testLabel) = do
  V.dispImage testImg
  putStrLn $ "Model : " ++ (show . argmax (Dim 1) RemoveDim . exp $ mlp model testImg)
  putStrLn $ "Ground Truth : " ++ show testLabel
Running
Iteration: 0 | Loss: Tensor Float [] 12.3775
Iteration: 50 | Loss: Tensor Float [] 1.0952
Iteration: 100 | Loss: Tensor Float [] 0.5626
Iteration: 150 | Loss: Tensor Float [] 0.6660
Iteration: 200 | Loss: Tensor Float [] 0.4771
Iteration: 0 | Loss: Tensor Float [] 0.5012
Iteration: 50 | Loss: Tensor Float [] 0.4058
Iteration: 100 | Loss: Tensor Float [] 0.3095
Iteration: 150 | Loss: Tensor Float [] 0.4237
Iteration: 200 | Loss: Tensor Float [] 0.3433
Iteration: 0 | Loss: Tensor Float [] 0.3671
Iteration: 50 | Loss: Tensor Float [] 0.3206
Iteration: 100 | Loss: Tensor Float [] 0.2467
Iteration: 150 | Loss: Tensor Float [] 0.3420
Iteration: 200 | Loss: Tensor Float [] 0.2737
Iteration: 0 | Loss: Tensor Float [] 0.3054
Iteration: 50 | Loss: Tensor Float [] 0.2779
Iteration: 100 | Loss: Tensor Float [] 0.2161
Iteration: 150 | Loss: Tensor Float [] 0.2933
Iteration: 200 | Loss: Tensor Float [] 0.2289
Iteration: 0 | Loss: Tensor Float [] 0.2693
Iteration: 50 | Loss: Tensor Float [] 0.2530
Iteration: 100 | Loss: Tensor Float [] 0.1979
Iteration: 150 | Loss: Tensor Float [] 0.2616
Iteration: 200 | Loss: Tensor Float [] 0.1986
(Each prediction below is preceded by an ASCII rendering of the test digit printed by dispImage; the renderings are omitted here.)

Model : Tensor Int64 [1] [ 7]
Ground Truth : Tensor Int64 [1] [ 7]
Model : Tensor Int64 [1] [ 2]
Ground Truth : Tensor Int64 [1] [ 2]
Model : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
Model : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]
Model : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
Model : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
Model : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
Model : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
Model : Tensor Int64 [1] [ 6]
Ground Truth : Tensor Int64 [1] [ 5]
Model : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
Model : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]

Ten of the eleven test digits are classified correctly; the 5 is mistaken for a 6.
See the complete project on GitHub. For suggestions about the content, feel free to open a new issue.
Summary
Today we have learned the basics of the Hasktorch library. Most importantly, the principles from our previous days still apply; therefore, the transition to the new library was quite straightforward. With a few minor changes, this example could be run on a GPU accelerator.
Further Reading

Hasktorch tutorial: https://hasktorch.github.io/tutorial/02-tensors.html
Hasktorch examples: https://github.com/hasktorch/hasktorch/tree/master/examples
Hasktorch documentation: http://hasktorch.org/docs.html
Applicative functors: http://learnyouahaskell.com/functors-applicative-functors-and-monoids