# Day 7: Real World Deep Learning

So far we have explored neural networks almost in a vacuum. Although we have provided some illustrations for clarity, relying on an existing framework lets us benefit from the work of previous contributors. One such framework is Hasktorch. One practical reason to use it is that it builds on the mature Torch tensor library. Another is strong GPU acceleration, which almost any serious deep learning project needs. Finally, using standard interfaces instead of reinventing the wheel helps reduce boilerplate.

Fun fact: one of the Hasktorch contributors is Adam Paszke, the original author of PyTorch.

---

# Today's post is also based on

- Day 2: What Do Hidden Layers Do?
- Day 4: The Importance Of Batch Normalization
- Day 5: Convolutional Neural Networks Tutorial

The source code from this post is available on GitHub.

---

# The Basics
The easiest way to start with Hasktorch is via Docker:
```bash
  docker run --gpus all -it --rm -p 8888:8888 \
    -v $(pwd):/home/ubuntu/data \
    htorch/hasktorch-jupyter:latest-cu11
```
Now you can open `localhost:8888` in your browser to access JupyterLab notebooks. Note that you need to select the `Haskell` kernel when creating a new notebook.

If you have never used the Torch library before, you may also want to review this tutorial.

# MNIST Example
Let's take the familiar MNIST example and see how it can be implemented in Hasktorch.
```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception.Safe
  ( SomeException (..),
    try,
  )
import Control.Monad ( forM_, when, (<=<) )
import Control.Monad.Cont ( ContT (..) )
import GHC.Generics
import Pipes hiding ( (~>) )
import qualified Pipes.Prelude as P
import Torch
import Torch.Serialize
import Torch.Typed.Vision ( initMnist )
import qualified Torch.Vision as V
import Prelude hiding ( exp )
```
The most notable import is the `Torch` module itself. There are also related helpers such as `Torch.Vision` for handling image data. The function `initMnist` has the type
```haskell
initMnist :: String -> IO (MnistData, MnistData)
```
This function loads the MNIST training and test datasets from the given directory, similar to `loadMNIST` from previous posts.

The `Pipes` module also deserves attention. It is an alternative to the previously used `Streamly` and likewise lets us build streaming components.

We also import functions such as `forM_` and `when` from `Control.Monad`, which are handy for sequencing IO actions.

Finally, we hide the Prelude `exp` function in favor of Torch's `exp`, which operates on tensors (multidimensional arrays) rather than floating-point scalars:
```haskell
Torch.exp :: Tensor -> Tensor
```
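Both `Prelude` and `Torch` export a function named `exp`, so without the `hiding` clause every unqualified use of `exp` would be ambiguous. The same mechanism can be sketched in plain Haskell. In this sketch a hypothetical list-based `Vec` type stands in for `Tensor`, and the scalar function stays reachable through a qualified import. Load it in GHCi to try it:

```haskell
-- Hide the scalar exp so that an unqualified `exp` means our own version.
import Prelude hiding (exp)
-- Keep the scalar version reachable as Prelude.exp.
import qualified Prelude

-- | Hypothetical stand-in for a tensor: a flat vector of Doubles.
newtype Vec = Vec [Double]
  deriving (Show, Eq)

-- | Element-wise exponential, playing the role Torch.exp plays for tensors.
exp :: Vec -> Vec
exp (Vec xs) = Vec (map Prelude.exp xs)
```

In the real code there is nothing to define: hiding `exp` from `Prelude` is enough, because `import Torch` already brings the tensor version into scope.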
# Defining Neural Network Architecture
First, we define a neural network data structure that contains the trainable parameters (the network weights). In the simplest case, this can be a multilayer perceptron (MLP).
```haskell
data MLP = MLP
  { fc1 :: Linear,
    fc2 :: Linear,
    fc3 :: Linear
  }
  deriving (Generic, Show, Parameterized)
```
This MLP contains three linear layers. Next, we may define a data structure that specifies the number of neurons in each layer:
```haskell
data MLPSpec = MLPSpec
  { i :: Int,
    h1 :: Int,
    h2 :: Int,
    o :: Int
  }
  deriving (Show, Eq)
```
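A spec such as `MLPSpec 784 64 32 10` also fixes how many trainable parameters the network has. The following worked example assumes that each `Linear` layer holds a weight matrix plus a bias vector. It is a plain-Haskell sketch that repeats the `MLPSpec` definition so it can be loaded on its own. `linearParams` and `paramCount` are hypothetical helpers, not part of Hasktorch:

```haskell
{-# LANGUAGE RecordWildCards #-}

-- | Same layer sizes as MLPSpec in the post.
data MLPSpec = MLPSpec {i :: Int, h1 :: Int, h2 :: Int, o :: Int}
  deriving (Show, Eq)

-- | Parameters of one fully connected layer: an (nIn x nOut) weight
--   matrix plus one bias per output neuron.
linearParams :: Int -> Int -> Int
linearParams nIn nOut = nIn * nOut + nOut

-- | Total number of trainable parameters in the three-layer MLP.
paramCount :: MLPSpec -> Int
paramCount MLPSpec {..} =
  linearParams i h1 + linearParams h1 h2 + linearParams h2 o
```

For the spec used in this post, `paramCount (MLPSpec 784 64 32 10)` evaluates to 50240 + 2080 + 330 = 52650. Almost all of these parameters sit in the first layer.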
Now we can define the neural network as a function, similar to what we did on Day 5, using the "reversed" composition operator `(~>)`:
```haskell
(~>) :: (a -> b) -> (b -> c) -> a -> c
f ~> g = g . f

mlp :: MLP -> Tensor -> Tensor
mlp MLP {..} =
  -- Layer 1
  linear fc1
  ~> relu

  -- Layer 2
  ~> linear fc2
  ~> relu

  -- Layer 3
  ~> linear fc3
  ~> logSoftmax (Dim 1)
```
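For plain functions, `(~>)` behaves like `(>>>)` from `Control.Arrow`, so that operator could be used instead. The sketch below checks this on a toy pipeline written in the same style as `mlp`. `toyNet` and its "layers" are hypothetical ordinary functions, not Hasktorch layers. No fixity declaration is needed: `(~>)` defaults to `infixl 9`, and because function composition is associative, the chain means the same thing however it is grouped.

```haskell
import Control.Arrow ((>>>))

-- | "Reversed" composition from the post: apply f first, then g.
(~>) :: (a -> b) -> (b -> c) -> a -> c
f ~> g = g . f

-- | A toy pipeline in the same style as `mlp`, with plain functions as "layers".
toyNet :: Double -> Double
toyNet =
  (* 2)             -- "layer 1": scale the input
    ~> max 0        -- ReLU-like clamp at zero
    ~> subtract 1   -- "layer 2": shift the result
```

With this definition, `toyNet 3` gives `5.0`, and `((* 2) >>> max 0 >>> subtract 1) 3` gives the same value.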
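We finish with a log-softmax layer applied along dimension 1 (`Dim 1`), which is the class dimension of a batch-by-class tensor. It normalizes each row into log-probabilities without changing the tensor's shape. The derivatives of `linear`, `relu`, and `logSoftmax` are already handled by the Torch library's automatic differentiation.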
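To show what "along dimension 1" means numerically, here is a list-based sketch of a numerically stable log-softmax. It treats a batch as a hypothetical list of rows, one row of class scores per sample. It uses the identity $\log\mathrm{softmax}(x)_k = x_k - m - \log\sum_j e^{x_j - m}$ with $m = \max_j x_j$, which avoids overflow in `exp`. This illustrates the math only and is not how Torch implements it internally:

```haskell
-- | Numerically stable log-softmax of one non-empty row of scores:
--   subtract the maximum before exponentiating so exp cannot overflow.
logSoftmaxRow :: [Double] -> [Double]
logSoftmaxRow xs = map (\x -> x - m - logSumExp) xs
  where
    m = maximum xs
    logSumExp = log (sum [exp (x - m) | x <- xs])

-- | Apply it to every row of a (batch x classes) matrix given as a list of
--   rows, i.e. "along dimension 1". Every row keeps its length.
logSoftmaxDim1 :: [[Double]] -> [[Double]]
logSoftmaxDim1 = map logSoftmaxRow
```

Exponentiating any output row gives probabilities that sum to 1. Even a row like `[1000, 1000]` yields `[-log 2, -log 2]` instead of overflowing.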

# Initial Weights
How do we generate initial random weights? As you may remember from Day 5, we could create a function such as this one:
```haskell
randNetwork = do
  let [i, h1, h2, o] = [784, 64, 32, 10]
  fc1 <- randLinear (Sz2 i h1)
  fc2 <- randLinear (Sz2 h1 h2)
  fc3 <- randLinear (Sz2 h2 o)
  return $
     MLP {  fc1 = fc1
          , fc2 = fc2
          , fc3 = fc3
          }
```
In our example we do almost the same, except that we benefit from applicative functors and the `Randomizable` typeclass.
```haskell
instance Randomizable MLPSpec MLP where
  sample MLPSpec {..} =
    MLP
      <$> sample (LinearSpec i h1)
      <*> sample (LinearSpec h1 h2)
      <*> sample (LinearSpec h2 o)
```
Above, we declare `MLP` an instance of the `Randomizable` typeclass, parametrized by `MLPSpec`. All we need to define this instance is to implement the `sample` function. Later, to generate initial MLP weights, we can simply write
```haskell
let spec = MLPSpec 784 64 32 10
net <- sample spec
```
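In Hasktorch, `sample` runs in `IO` and draws from Torch's random number generator. The applicative pattern itself does not depend on that, so it can be reproduced with base Haskell alone. The sketch below is a hypothetical stand-in and not Hasktorch's code. It uses a pure state-passing `Gen` driven by a simple linear congruential generator (assuming a 64-bit `Int`), a `Randomizable` class whose method is renamed `sampleG`, and toy `Lin`/`LinSpec` types in place of `Linear`/`LinearSpec`:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Monad (replicateM)

-- | Hypothetical pure generator: a state-passing function over an Int seed.
newtype Gen a = Gen {runGen :: Int -> (a, Int)}

instance Functor Gen where
  fmap f (Gen g) = Gen $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Gen where
  pure a = Gen $ \s -> (a, s)
  Gen gf <*> Gen ga = Gen $ \s ->
    let (f, s1) = gf s   -- run the left generator first
        (a, s2) = ga s1  -- then thread its seed into the right one
     in (f a, s2)

-- | One pseudo-random Double in [-1, 1) from a linear congruential step
--   (modulus 2^31); fine for a demo, not for real weight initialization.
uniform :: Gen Double
uniform = Gen $ \s ->
  let s' = (1103515245 * s + 12345) `mod` 2147483648
   in (fromIntegral s' / 1073741824 - 1, s')

-- | Same shape as the post's Randomizable, but producing a Gen instead of IO.
class Randomizable spec f where
  sampleG :: spec -> Gen f

-- | Input and output sizes of a dense layer.
data LinSpec = LinSpec Int Int

-- | Hypothetical dense layer: one weight row per output neuron, plus biases.
data Lin = Lin {weights :: [[Double]], biases :: [Double]}
  deriving (Show, Eq)

instance Randomizable LinSpec Lin where
  sampleG (LinSpec nIn nOut) =
    Lin <$> replicateM nOut (replicateM nIn uniform) <*> replicateM nOut uniform

data MLPSpec = MLPSpec {i :: Int, h1 :: Int, h2 :: Int, o :: Int}

data MLP = MLP {fc1 :: Lin, fc2 :: Lin, fc3 :: Lin}
  deriving (Show, Eq)

-- | The same applicative pattern as the post's instance.
instance Randomizable MLPSpec MLP where
  sampleG MLPSpec {..} =
    MLP
      <$> sampleG (LinSpec i h1)
      <*> sampleG (LinSpec h1 h2)
      <*> sampleG (LinSpec h2 o)
```

For example, `fst (runGen (sampleG (MLPSpec 784 64 32 10)) 42) :: MLP` builds a network whose first layer has 64 rows of 784 weights. The same seed always gives the same network. `<$>` and `<*>` only thread the generator state from one layer to the next, which is why the instance needs no explicit seed handling.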
# Train Loop
The core of neural network training is `trainLoop`, which performs a single training epoch. Let us first inspect its type signature.
```haskell
trainLoop :: Optimizer o => MLP -> o -> ListT IO (Tensor, Tensor) -> IO MLP
```
This signature says that the function accepts an initial neural network, an optimizer, and a dataset. The optimizer can be gradient descent (GD), Adam, or another optimizer. The result is a new MLP, wrapped in `IO`. `IO` is necessary, for instance, if we want to print the loss during training. Now, let's take a look at the implementation:

```haskell
trainLoop model optimizer = P.foldM step begin done . enumerateData
```
First, we enumerate the dataset with `enumerateData`, which pairs each batch with its iteration number. Then we iterate over (fold) the batches. The `step` function corresponds to a single step of the gradient descent algorithm:
```haskell
  where
    step :: MLP -> ((Tensor, Tensor), Int) -> IO MLP
    step model ((input, label), iter) = do
      let loss = nllLoss' label $ mlp model input
      -- Print loss every 50 batches
      when (iter `mod` 50 == 0) $ do
        putStrLn $ "Iteration: " ++ show iter ++ " | Loss: " ++ show loss
      (newParam, _) <- runStep model optimizer loss 1e-3
      return newParam
```
We calculate the negative log-likelihood loss `nllLoss'` between the ground truth label and the output of our MLP. Note that `model` holds the parameters, i.e., the weights of the MLP network. Then we use the iteration number `iter` to print the loss every 50 iterations. Finally, we perform a gradient descent step with our optimizer via `runStep :: ... => model -> optimizer -> Loss -> LearningRate -> IO (model, optimizer)` and keep only the new model `newParam`. The learning rate here is `1e-3`, but it can be changed.
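The two ingredients of `step` can be sketched on plain lists. This sketch assumes mean reduction over the batch (PyTorch's default for NLL loss), log-probabilities given as one row per sample, and integer class labels. `nllLossMean` and `gdUpdate` are hypothetical helpers. The second one shows only the update rule of plain gradient descent, not how Hasktorch's `runStep` computes gradients:

```haskell
-- | Mean negative log-likelihood over a batch: for each sample, take the
--   log-probability of its true class, negate, and average.
nllLossMean :: [Int] -> [[Double]] -> Double
nllLossMean labels logProbs = negate (sum picked) / fromIntegral (length picked)
  where
    -- log-probability assigned to the correct class of each sample
    picked = zipWith (\y row -> row !! y) labels logProbs

-- | Plain gradient-descent update with learning rate lr:
--   every parameter p moves against its gradient g, p <- p - lr * g.
gdUpdate :: Double -> [Double] -> [Double] -> [Double]
gdUpdate lr = zipWith (\p g -> p - lr * g)
```

As a sanity check, a model that predicts the uniform distribution over 10 classes has loss `log 10`, about 2.30. With `lr = 1e-3`, a gradient of 10 moves a parameter by only 0.01.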

The `done` function finalizes the `foldM` iterations (trivially, in this case), and `begin` holds the initial weights. We wrap both with `pure` because `P.foldM` expects the initial value as a monadic action (type `m x`) and the finalizer as a function returning one.
```haskell
    done = pure
    begin = pure model
```
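`P.foldM` has the type `Monad m => (x -> a -> m x) -> m x -> (x -> m b) -> Producer a m () -> m b`. The sketch below is a hypothetical list-based analogue that takes the same `step`/`begin`/`done` arguments but folds over a plain list instead of a `Producer`. It then runs a toy "epoch" in which the model is a single number pulled toward each batch value by gradient descent on the squared error. The batches are enumerated with `zip`, mirroring the shape that `enumerateData` gives `step`:

```haskell
import Control.Monad (foldM, when)

-- | Hypothetical list-based analogue of Pipes.Prelude.foldM.
foldMList :: Monad m => (x -> a -> m x) -> m x -> (x -> m b) -> [a] -> m b
foldMList step begin done items = do
  x0 <- begin              -- run the initial action to get the accumulator
  x <- foldM step x0 items -- update it once per item
  done x                   -- finalize the accumulator

-- | Toy epoch: minimize (m - b)^2 for each batch value b, one step per batch.
toyEpoch :: Double -> [Double] -> IO Double
toyEpoch model batches = foldMList step (pure model) pure (zip batches [0 ..])
  where
    lr = 0.1
    step :: Double -> (Double, Int) -> IO Double
    step m (b, iter) = do
      let loss = (m - b) ^ (2 :: Int)
      -- Log every other batch, like the post logs every 50th.
      when (iter `mod` 2 == 0) $
        putStrLn ("Iteration: " ++ show iter ++ " | Loss: " ++ show loss)
      -- The gradient of (m - b)^2 with respect to m is 2 (m - b).
      pure (m - lr * 2 * (m - b))
```

Starting from `0` with batches `[1, 1, 1]`, the model moves to 0.2, 0.36 and finally 0.488. As in the post, `begin = pure model` and `done = pure`.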
# Putting It All Together
The remaining part is simple. We load the data into batches, specify the number of neurons in our MLP, choose an optimizer, and initialize the random weights.
```haskell
main = do
  (trainData, testData) <- initMnist "data"
  let trainMnist = V.MNIST {batchSize = 256, mnistData = trainData}
      testMnist = V.MNIST {batchSize = 1, mnistData = testData}
      spec = MLPSpec 784 64 32 10
      optimizer = GD
  net <- sample spec
```
Then, we train the network for 5 epochs:
```haskell
  net' <- foldLoop net 5 $ \model _ ->
      runContT (streamFromMap (datasetOpts 2) trainMnist) $ trainLoop model optimizer . fst
```
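Judging by its use here, `foldLoop` threads the network through a fixed number of iterations and hands the block an iteration index, which this post ignores with `_`. A helper with that behavior can be sketched on top of `foldM`. `foldLoopSketch` is a hypothetical name, and it passes 1-based epoch numbers. Whether Hasktorch's own helper does exactly the same does not matter for this post:

```haskell
import Control.Monad (foldM)

-- | Hypothetical foldLoop-style helper: run `block` `count` times,
--   threading the accumulator (e.g. the network) through each call
--   and passing the 1-based iteration number.
foldLoopSketch :: a -> Int -> (a -> Int -> IO a) -> IO a
foldLoopSketch x count block = foldM block x [1 .. count]
```

For instance, `foldLoopSketch 0 5 (\m epoch -> pure (m + epoch))` returns `15`. In the training code the accumulator is the network, and each call runs one full `trainLoop` epoch.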
Finally, we can examine the model on test images:
```haskell
  forM_ [0 .. 10] $ displayImages net' <=< getItem testMnist
```
For this purpose, we may use a function such as
```haskell
displayImages :: MLP -> (Tensor, Tensor) -> IO ()
displayImages model (testImg, testLabel) = do
  V.dispImage testImg
  putStrLn $ "Model        : " ++ (show . argmax (Dim 1) RemoveDim . exp $ mlp model testImg)
  putStrLn $ "Ground Truth : " ++ show testLabel
```
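`displayImages` applies `exp` before `argmax`, which turns log-probabilities into probabilities. Because `exp` is strictly increasing, this does not change which class scores highest, so the predicted label would be the same without it. A list-based sketch of a row-wise argmax makes this easy to check. `argmaxRow` and `argmaxDim1` are hypothetical helpers, not Hasktorch's `argmax`:

```haskell
-- | Index of the largest element of a non-empty row; the first index wins on ties.
argmaxRow :: [Double] -> Int
argmaxRow [] = error "argmaxRow: empty row"
argmaxRow (x : xs) = go 1 0 x xs
  where
    -- k: index of the current element; best, bestVal: best index and value so far
    go :: Int -> Int -> Double -> [Double] -> Int
    go _ best _ [] = best
    go k best bestVal (y : ys)
      | y > bestVal = go (k + 1) k y ys
      | otherwise = go (k + 1) best bestVal ys

-- | Row-wise argmax over a (batch x classes) matrix, i.e. along dimension 1.
argmaxDim1 :: [[Double]] -> [Int]
argmaxDim1 = map argmaxRow
```

For a row of log-probabilities such as `[log 0.1, log 0.7, log 0.2]`, both `argmaxRow` and `argmaxRow . map exp` return `1`.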
# Running
```
Iteration: 0 | Loss: Tensor Float []  12.3775   
Iteration: 50 | Loss: Tensor Float []  1.0952   
Iteration: 100 | Loss: Tensor Float []  0.5626   
Iteration: 150 | Loss: Tensor Float []  0.6660   
Iteration: 200 | Loss: Tensor Float []  0.4771   
Iteration: 0 | Loss: Tensor Float []  0.5012   
Iteration: 50 | Loss: Tensor Float []  0.4058   
Iteration: 100 | Loss: Tensor Float []  0.3095   
Iteration: 150 | Loss: Tensor Float []  0.4237   
Iteration: 200 | Loss: Tensor Float []  0.3433   
Iteration: 0 | Loss: Tensor Float []  0.3671   
Iteration: 50 | Loss: Tensor Float []  0.3206   
Iteration: 100 | Loss: Tensor Float []  0.2467   
Iteration: 150 | Loss: Tensor Float []  0.3420   
Iteration: 200 | Loss: Tensor Float []  0.2737   
Iteration: 0 | Loss: Tensor Float []  0.3054   
Iteration: 50 | Loss: Tensor Float []  0.2779   
Iteration: 100 | Loss: Tensor Float []  0.2161   
Iteration: 150 | Loss: Tensor Float []  0.2933   
Iteration: 200 | Loss: Tensor Float []  0.2289   
Iteration: 0 | Loss: Tensor Float []  0.2693   
Iteration: 50 | Loss: Tensor Float []  0.2530   
Iteration: 100 | Loss: Tensor Float []  0.1979   
Iteration: 150 | Loss: Tensor Float []  0.2616   
Iteration: 200 | Loss: Tensor Float []  0.1986   
              
              
              
              
   #%%*****   
      ::: %   
         %:   
        :%    
        #:    
       :%     
       %.     
      #=      
     :%.      
     =#       
Model        : Tensor Int64 [1] [ 7]
Ground Truth : Tensor Int64 [1] [ 7]
              
              
     %%%#     
    %#  %     
    .  #%     
      :%:     
      %+      
     *%       
     %=       
    %%        
    %%%%++%%%=
     ==%%=.   
              
              
Model        : Tensor Int64 [1] [ 2]
Ground Truth : Tensor Int64 [1] [ 2]
              
              
        .-    
        =     
        %     
       .#     
       =:     
       @      
       #      
      ++      
      %:      
      %       
              
              
Model        : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
              
              
       %.     
      *%-     
     %%%%#    
    :%%+:%-   
    %%   -%.  
    %    .@+  
    %    %%.  
    %   #%*   
    %%%%%%    
    :%%%-     
              
              
Model        : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]
              
              
              
     =    +   
     %    %   
    +.    %   
    %    %:   
    +    %    
    %--=*%    
     :: +%    
        =%    
        =%    
        *     
              
Model        : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
              
              
              
        %@    
        @:    
       =@     
       @%     
       @      
      :@      
      %#      
      @       
      @       
      +       
              
Model        : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
              
              
              
     %     %  
    %     %   
   +#    -+   
   +%*::*%    
    :%==%+    
        %     
       ++     
       %      
       %-+    
       *      
              
Model        : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
              
              
              
      +       
     %%+      
    .%*%%     
    -: *%     
    -#-%%.    
     %% =#    
         %    
         .%   
          #.  
           %  
              
Model        : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
              
              
         ..=. 
      .%%%%%% 
     ::%+:    
    %         
   %          
   %=         
   %%%%%%+    
     :%%%%    
      %%%%    
       %#     
              
              
Model        : Tensor Int64 [1] [ 6]
Ground Truth : Tensor Int64 [1] [ 5]
              
              
              
              
      +%%%#   
    +%*  .%%  
   :%.  .#%+  
    %@%%%%*   
       +%-    
      -%#     
      %%      
     %%       
     %=       
     @        
Model        : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
              
              
       ==:    
     %%**%%   
    .%    %:  
    *-    +#  
    %     :#  
    #     :#  
   -#     +#  
   -#    .%   
    #   +%:   
    #%%%%=    
              
              
Model        : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]
```
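The iteration numbers in the log follow from the batch size. The standard MNIST training set has 60,000 images, so `batchSize = 256` gives 234 full batches and a final partial batch of 96 images. Whether that partial batch is kept depends on the dataset implementation, which this post does not show. Either way, `step` prints at iterations 0, 50, 100, 150 and 200. Each epoch calls `trainLoop` anew, so the counter appears to restart, giving five such runs. The hypothetical helpers below only do the arithmetic:

```haskell
-- | Number of batches per epoch: full batches only, or full batches
--   plus one partial batch when the remainder is kept.
batchesPerEpoch :: Bool -> Int -> Int -> Int
batchesPerEpoch keepPartial nSamples batchSize
  | keepPartial = (nSamples + batchSize - 1) `div` batchSize
  | otherwise = nSamples `div` batchSize

-- | Iterations at which a step function that logs every 50th batch prints.
loggedIterations :: Int -> [Int]
loggedIterations nBatches = [k | k <- [0 .. nBatches - 1], k `mod` 50 == 0]
```

Both `batchesPerEpoch False 60000 256` (234) and `batchesPerEpoch True 60000 256` (235) lead to `loggedIterations` of `[0, 50, 100, 150, 200]`, which matches the five lines per epoch above.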
See the complete project on GitHub. For suggestions about the content, feel free to open a new issue.

# Summary
Today we have learned the basics of the Hasktorch library. Most importantly, the principles from the previous days still apply, which made the transition to the new library quite straightforward. With a few minor changes, this example could run on a GPU.

# Further Reading

Hasktorch:

- Hasktorch tutorial https://hasktorch.github.io/tutorial/02-tensors.html
- Hasktorch examples https://github.com/hasktorch/hasktorch/tree/master/examples
- Hasktorch documentation http://hasktorch.org/docs.html
- Applicative functors http://learnyouahaskell.com/functors-applicative-functors-and-monoids