Deep Learning


Mahmood Amintoosi

m.amintoosi @ hsu.ac.ir

Fall 1398 (2019)

Source book

Deep Learning with Python
GitHub: Jupyter Notebooks

What is deep learning?

Google Street-View (and ReCaptchas)

House Numbers

Better than Human

Machine learning vs. Classical programming

Machine learning: a new programming paradigm

Why is computer vision difficult?

How does a computer see the above picture?

Deep Learning

  • Neural Networks
  • Multiple layers
  • Fed with lots of Data


  • 1980+ : Lots of enthusiasm for NNs
  • 1995+ : Disillusionment = A.I. Winter (v2+)
  • 2005+ : Stepwise improvement : Depth
  • 2010+ : GPU revolution : Data

Who is involved

Google: Hinton (Toronto)
Facebook: LeCun (NYC)
Universities: Bengio (Montreal)
Baidu: Ng (Stanford)

Andrew Ng:

“AI is the new electricity.”

2011, Image Classification

ImageNet, Karpathy
The ImageNet challenge was difficult at the time: it consisted of classifying high-resolution color images into 1,000 different categories after training on 1.4 million images.

Deep Learning started to beat other approaches...

  • In 2011, Dan Ciresan from IDSIA began to win academic image-classification competitions with GPU-trained deep neural networks
  • In 2011, the top-five accuracy of the winning model, based on classical approaches to computer vision, was only 74.3%.
  • In 2012, a team led by Alex Krizhevsky and advised by Geoffrey Hinton was able to achieve a top-five accuracy of 83.6%—a significant breakthrough
  • By 2015, the winner reached an accuracy of 96.4%, and the classification task on ImageNet was considered to be a completely solved problem
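
The top-five accuracy quoted above counts a prediction as correct when the true class is among the five highest-scoring classes. A minimal NumPy sketch of that metric follows; the function name `top_k_accuracy` and the toy scores are assumptions for illustration, not part of the ImageNet tooling.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores."""
    # Indices of the k largest scores in each row (their order does not matter).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data (assumed): 4 samples, 10 classes, class 9 always scores highest.
scores = np.tile(np.arange(10.0), (4, 1))
labels = np.array([9, 5, 4, 0])

top5 = top_k_accuracy(scores, labels, k=5)  # labels 9 and 5 are in the top five
top1 = top_k_accuracy(scores, labels, k=1)  # only label 9 is the top prediction
print(top5, top1)
```

Top-five accuracy is always at least as high as top-one accuracy, which is why the two figures should not be compared directly.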

What makes deep learning different?

It completely automates what used to be the most crucial step in a machine-learning workflow:
feature engineering

Why deep learning? Why now?

In general, three technical forces are driving advances:

  1. Hardware
    • NVIDIA GPUs, Google TPUs
  2. Datasets and benchmarks
    • Flickr, YouTube videos and Wikipedia
  3. Algorithmic advances
    • Better activation functions
    • Better weight-initialization schemes
    • Better optimization schemes

Before we begin: the mathematical building blocks of neural networks

We will discuss:

  • A first example of a neural network
  • Tensors and tensor operations
  • How neural networks learn via backpropagation and gradient descent

We will use Python in examples

Python Data Science Handbook. Essential Tools for Working with Data by: Jake VanderPlas

A first look at a neural network

Digit Classification
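		import keras
		from keras.datasets import mnist
		# Load MNIST: 60,000 training and 10,000 test images of 28 x 28 pixels
		(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
		from keras import models
		from keras import layers

		network = models.Sequential()
		network.add(layers.Dense(512, activation='sigmoid', input_shape=(28 * 28,)))
		# softmax makes the 10 outputs a probability distribution over the digits
		network.add(layers.Dense(10, activation='softmax'))
		# The network must be compiled before it can be trained
		network.compile(optimizer='rmsprop',
		                loss='categorical_crossentropy',
		                metrics=['accuracy'])
		# Flatten each image to a vector of 784 values scaled to [0, 1]
		train_images = train_images.reshape((60000, 28 * 28))
		train_images = train_images.astype('float32') / 255

		test_images = test_images.reshape((10000, 28 * 28))
		test_images = test_images.astype('float32') / 255
		from keras.utils import to_categorical

		# One-hot encode the integer labels
		train_labels = to_categorical(train_labels)
		test_labels = to_categorical(test_labels)
		network.fit(train_images, train_labels, epochs=5, batch_size=128)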

Compilation step

  • An optimizer—The mechanism through which the network will update itself based on the data it sees and its loss function.
  • A loss function—How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
  • Metrics to monitor during training and testing—Here, we’ll only care about accuracy (the fraction of the images that were correctly classified)
Online Documentation: Keras
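
To make the loss function and the accuracy metric concrete, here is a minimal NumPy sketch of categorical cross-entropy and accuracy for one-hot labels. The helper names and the toy predictions are assumptions for illustration; Keras ships its own implementations of both.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the samples."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Toy data (assumed): 3 samples, 3 classes, one-hot labels.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The third sample is misclassified, so it lowers the accuracy and also contributes the largest term to the loss.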

Data representations for neural networks


Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis).

Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times. In the latter case, it’s technically more correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes), but the ambiguous notation 5D tensor is common regardless.
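
The distinction can be checked directly in NumPy, where `ndim` is the number of axes (the rank) and `shape` gives the number of entries along each axis. The values below are arbitrary and chosen only for illustration.

```python
import numpy as np

vector_5d = np.array([12, 3, 6, 14, 7])  # one axis with five entries
tensor_5d = np.zeros((2, 3, 4, 5, 6))    # five axes

print(vector_5d.ndim, vector_5d.shape)   # rank and shape of the 5D vector
print(tensor_5d.ndim, tensor_5d.shape)   # rank and shape of the 5D tensor
```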

2.2.6 Manipulating tensors in NumPy

my_slice = train_images[:, 14:, 14:]
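
The slice above keeps every image and selects the bottom-right 14 x 14 pixels of each one. A self-contained sketch follows, in which a synthetic array (an assumption) stands in for `train_images`:

```python
import numpy as np

# Synthetic stand-in for train_images: 3 images of 28 x 28 pixels.
images = np.arange(3 * 28 * 28).reshape(3, 28, 28)

bottom_right = images[:, 14:, 14:]   # rows 14..27 and columns 14..27 of every image
center = images[:, 7:-7, 7:-7]       # 14 x 14 patch centered in every image

print(bottom_right.shape, center.shape)
```

Negative indices count from the end of the axis, so `7:-7` drops seven pixels on each side.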

2.2.7 The notion of data batches

batch = train_images[128 * n:128 * (n + 1)]
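
The expression above selects the n-th batch of 128 samples along the first axis. A minimal sketch of iterating over all batches follows; the helper `iterate_batches` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def iterate_batches(data, batch_size=128):
    """Yield consecutive slices of data along the first (samples) axis."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Synthetic stand-in for train_images: 300 images of 28 x 28 pixels.
images = np.zeros((300, 28, 28), dtype="float32")

batch_sizes = [len(batch) for batch in iterate_batches(images)]
n = 1
second_batch = images[128 * n:128 * (n + 1)]  # same slice as in the text, with n = 1
print(batch_sizes, second_batch.shape)
```

The last batch is smaller whenever the number of samples is not a multiple of the batch size.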

2.2.8 Real-world examples of data tensors

  1. Vector data: 2D tensors of shape (samples, features)
  2. Timeseries data or sequence data: 3D tensors of shape (samples, timesteps, features)
  3. Images: 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width)
  4. Video: 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
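
These layouts can be sketched with empty NumPy arrays. The sizes below are arbitrary assumptions, kept small so that the arrays are cheap to allocate.

```python
import numpy as np

vector_data = np.zeros((10, 3))           # (samples, features)
timeseries = np.zeros((10, 20, 3))        # (samples, timesteps, features)
images = np.zeros((10, 32, 32, 3))        # (samples, height, width, channels)
video = np.zeros((2, 8, 32, 32, 3))       # (samples, frames, height, width, channels)

# Converting images from channels-last to channels-first layout.
images_channels_first = np.transpose(images, (0, 3, 1, 2))

for name, t in [("vector", vector_data), ("timeseries", timeseries),
                ("images", images), ("video", video)]:
    print(name, t.ndim, t.shape)
print(images_channels_first.shape)
```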

The gears of neural networks: tensor operations

  1. Element-wise operations
  2. Broadcasting
  3. Tensor dot
  4. Tensor reshaping

Tensor Operations

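import numpy as np
x = np.random.random((3, 2))
y = np.ones((2,)) / 2
z = np.maximum(x, y)  # element-wise maximum; y is broadcast across the rows of x
z = x + y             # element-wise addition with broadcasting
z = x * y             # element-wise multiplication with broadcasting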
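
The remaining two operations from the list, the tensor dot and reshaping, can be sketched in the same way. The values are fixed here (an assumption for illustration) so that the results are easy to follow.

```python
import numpy as np

x = np.arange(6.0).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
y = np.array([0.5, 0.5])

broadcast_sum = x + y              # y is added to every row of x
dot_xy = np.dot(x, y)              # matrix-vector product, shape (3,)
reshaped = x.reshape(2, 3)         # same entries in row order: [[0, 1, 2], [3, 4, 5]]
transposed = x.T                   # rows and columns swapped: [[0, 2, 4], [1, 3, 5]]

print(dot_xy)
print(reshaped)
print(transposed)
```

Reshaping and transposing both produce shape (2, 3) here, but they arrange the entries differently.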

A geometric interpretation of deep learning

The engine of neural networks: gradient-based optimization

  1. What’s a derivative?
  2. Derivative of a tensor operation: the gradient
  3. Stochastic gradient descent
  4. Chaining derivatives: the Backpropagation algorithm
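
The chain rule behind backpropagation says that the derivative of g(f(x)) is g'(f(x)) * f'(x). A minimal numerical check follows; the functions `f` and `g` and the helper `numerical_derivative` are assumptions chosen for illustration.

```python
def numerical_derivative(func, x, h=1e-5):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return 3 * x + 1

def g(y):
    return y ** 2

def composed(x):
    return g(f(x))

x0 = 2.0
# Chain rule: derivative of the outer function at f(x0), times derivative of the inner one.
chain_rule = numerical_derivative(g, f(x0)) * numerical_derivative(f, x0)
# Direct derivative of the composition, for comparison.
direct = numerical_derivative(composed, x0)
print(chain_rule, direct)
```

Both values agree with the analytic derivative 6 * (3 * x + 1), which equals 42 at x = 2.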

Intro to optimization in deep learning

  1. Intro to optimization in deep learning: Gradient Descent
  2. Intro to optimization in deep learning: Momentum, RMSProp and Adam
  3. Intro to optimization in deep learning: Busting the myth about batch normalization
  4. Adam — latest trends in deep learning optimization
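
As a rough illustration of the momentum idea named in the list above, the sketch below minimizes f(x) = x^2 with a velocity term that accumulates past gradients. The function name, the learning rate, and the momentum coefficient are assumptions for illustration, not values taken from the articles.

```python
def minimize_with_momentum(grad, x, learning_rate=0.1, momentum=0.9, steps=300):
    """Gradient descent where the update is a running, decayed sum of past gradients."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad(x)
        x = x + velocity
    return x

def grad_f(x):
    # Gradient of f(x) = x ** 2
    return 2 * x

x_final = minimize_with_momentum(grad_f, x=5.0)
print(x_final)
```

With these settings the iterate oscillates around the minimum at 0 while its amplitude shrinks at every step.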

Various Gradient Descent Algorithms

Stochastic Gradient Descent
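
A minimal sketch of mini-batch stochastic gradient descent, fitting a line y = w * x + b with a mean-squared-error loss. The synthetic, noise-free data and all hyperparameters are assumptions chosen so that the expected result (w = 2, b = 1) is known.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=256)
y = 2.0 * X + 1.0                  # noise-free targets: true w = 2, true b = 1

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32

for epoch in range(50):
    order = rng.permutation(len(X))            # reshuffle the samples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        error = w * xb + b - yb
        grad_w = 2 * np.mean(error * xb)       # d(MSE)/dw on this batch
        grad_b = 2 * np.mean(error)            # d(MSE)/db on this batch
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)
```

Each update uses the gradient of the loss on one random batch only, which is what makes the method stochastic.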

TensorFlow Operations

Auto Gradient in TF2
		import tensorflow as tf
		x = tf.constant(3.0)
		with tf.GradientTape(persistent=True) as g:
		  y = x * x
		  z = y * y
		dy_dx = g.gradient(y, x)  # 6.0
		dz_dx = g.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
		dz_dy = g.gradient(z, y)  # 18.0 (2*y at y = 9)
		del g  # Drop the reference to the tape
tf.Tensor(6.0, shape=(), dtype=float32)
tf.Tensor(108.0, shape=(), dtype=float32)
tf.Tensor(18.0, shape=(), dtype=float32)
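
To show the idea behind recording operations and replaying them backwards, here is a toy reverse-mode differentiator in plain Python that reproduces the three values above. It is only a sketch under simplifying assumptions (scalars and multiplication only); the class `Var` and the function `gradient` are hypothetical and do not reflect how TensorFlow is implemented.

```python
class Var:
    """A scalar that remembers which values it was computed from."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def gradient(target, source):
    """Derivative of target with respect to source via reverse accumulation."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal: parents are appended before the nodes that use them.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(target)
    grads = {id(target): 1.0}
    for node in reversed(order):           # from the output back to the inputs
        upstream = grads.get(id(node), 0.0)
        for parent, local in node.parents:
            grads[id(parent)] = grads.get(id(parent), 0.0) + upstream * local
    return grads.get(id(source), 0.0)

x = Var(3.0)
y = x * x
z = y * y
dy_dx = gradient(y, x)
dz_dx = gradient(z, x)
dz_dy = gradient(z, y)
print(dy_dx, dz_dx, dz_dy)
```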

Understanding convolutional neural networks

  1. Convolution arithmetic tutorial
  2. Machine Learning and AI - Bangalore Chapter
  3. Counting No. of Parameters in Deep Learning Models by Hand
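
A minimal NumPy sketch of the two operations a convnet stacks: a 3 x 3 convolution without padding and a 2 x 2 max pooling, on a single-channel image. The helper names and the averaging kernel are assumptions for illustration. As in deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.ones((3, 3)) / 9.0           # simple averaging filter

features = conv2d_valid(image, kernel)   # 28 - 3 + 1 = 26 along each side
pooled = max_pool2d(features)            # 26 // 2 = 13 along each side
print(features.shape, pooled.shape)
```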
5.1 - Introduction to convnets
MNIST Classification (Included with Keras)
Overall Model:
MNIST Classification, TensorFlow Code
import keras
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the (3, 3, 64) feature maps into a vector of 576 values for the Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
Number of Parameters
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
flatten_1 (Flatten)          (None, 576)               0         
dense_1 (Dense)              (None, 64)                36928     
dense_2 (Dense)              (None, 10)                650       
Total params: 93,322
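
The parameter counts in the table can be reproduced by hand: a Conv2D layer has (kernel height * kernel width * input channels + 1) * filters parameters, and a Dense layer has (inputs + 1) * units, where the +1 accounts for the bias. The helper names below are assumptions for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

counts = {
    "conv2d_1": conv2d_params(3, 3, 1, 32),
    "conv2d_2": conv2d_params(3, 3, 32, 64),
    "conv2d_3": conv2d_params(3, 3, 64, 64),
    "dense_1": dense_params(3 * 3 * 64, 64),   # input is the flattened (3, 3, 64) output
    "dense_2": dense_params(64, 10),
}
total = sum(counts.values())
print(counts, total)
```

Pooling and flattening layers have no trainable parameters, which is why they show 0 in the table.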
Overall Architecture of a sample CNN
More about architecture and number of parameters:
Source: Counting No. of Parameters in Deep Learning Models by Hand
Persian Digits Classification (Not included with Keras)
import keras
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the feature maps into a vector before the Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
5.2 - Using convnets with small datasets
Classify Dogs vs Cats (Not included with Keras)
Classify Dogs vs Cats (Building from scratch)
  • Download Images from Kaggle
  • Download Images from fastai 845MB, (need some manipulation)
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Flatten the feature maps into a vector before the Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
# A single sigmoid unit outputs the probability of one of the two classes
model.add(layers.Dense(1, activation='sigmoid'))
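
The spatial size of the feature maps in this network can be traced by hand: a 3 x 3 convolution without padding shrinks each side by 2, and a 2 x 2 max pooling halves it, rounding down. The sketch below assumes these default settings and that the last feature maps are flattened before the first Dense layer; the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output side length of an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output side length of non-overlapping max pooling."""
    return size // pool

size = 150
trace = [size]
for _ in range(4):                 # four Conv2D + MaxPooling2D pairs
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

flattened = size * size * 128      # the last convolution has 128 filters
print(trace, flattened)
```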
There are various architectures of CNNs available
LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, ZFNet
VGG16 Architecture
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. VGG16 was trained for weeks on NVIDIA Titan Black GPUs.
ML vs DL
Some Outputs
Our Outputs

Neural style transfer

Generative Adversarial Networks

Adversarial T-Shirts: Researchers Designed T-Shirts That Can Fool Object Detectors
Deepfake Detection Challenge