Neural Style Transfer
Neural Style Transfer (NST) is a technique that uses deep learning to generate a new image (G) that merges the content of a content image (C) with the style of a style image (S).
Here is the animation of the generated image getting updated through the iterations
This uses the concept of “transfer learning”, where a deep neural network trained for one application is reused as the base network for a new application. The rationale is that a model pre-trained on a very large dataset such as ImageNet has already learnt the high-level features (objects, patterns) and low-level features (edges, corners, shades) that the new task also relies on.
This repository implements the original NST paper by Gatys et al. (2015), using the pre-trained VGG-19 network.
How to select layer l for computing the cost?
Although G starts from a noise image (N), it is usually initialized as a weighted average of N and C so that it already resembles the content image to some degree. A noise ratio of 0.6 is a good starting point.
- When l is selected from the shallow layers, G resembles C too closely
- When l is selected from very deep layers, G hardly resembles C
- For “visually pleasing” results, l is chosen from the middle layers of the model, which gives the right blend of C in G
Here, the activations from hidden layer conv4_2 are selected for comparing C with G
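The weighted-average initialization of G mentioned above can be sketched as follows. This is a minimal NumPy illustration rather than the repository's code: the function name, the uniform noise range of [-20, 20] and the image layout (1, HEIGHT, WIDTH, 3) are assumptions.

```python
import numpy as np

def generate_noise_image(content_image, noise_ratio=0.6, seed=None):
    """Blend uniform noise with the content image (weighted average)."""
    rng = np.random.default_rng(seed)
    # Assumption: noise drawn from [-20, 20], suited to mean-centred pixel values.
    noise = rng.uniform(-20.0, 20.0, size=content_image.shape).astype(np.float32)
    # noise_ratio = 1.0 gives pure noise, noise_ratio = 0.0 gives the content image.
    return noise_ratio * noise + (1.0 - noise_ratio) * content_image

# Stand-in for a preprocessed 400 x 300 content image (constant grey).
content = np.full((1, 300, 400, 3), 100.0, dtype=np.float32)
generated = generate_noise_image(content, noise_ratio=0.6, seed=0)
```

A lower noise ratio starts G closer to C, which tends to preserve more of the content for the same number of iterations.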
NST Algorithm
- Create a TensorFlow interactive session
- Initializations
  - Select the content image
  - Select the style image
  - Initialize the generated image = noise + content image
- Load the pre-trained VGG-19 model
- Build the TensorFlow graph
  - Compute the content cost by passing C as input to the model
  - Compute the style cost from multiple layers by passing S as input to the model
  - Compute the total cost
  - Define the optimizer and the learning rate
- Initialize the TensorFlow graph and update the generated image at every epoch
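The key point of the last step is that the optimizer updates the pixels of G, not the weights of VGG-19. The toy sketch below illustrates only that idea, under loud assumptions: it uses plain gradient descent in NumPy instead of a TensorFlow optimizer, and it replaces the NST total cost with a simple squared pixel difference to C so that the gradient can be written by hand. The function name and parameters are hypothetical.

```python
import numpy as np

def optimize_pixels(content, generated, epochs=200, learning_rate=0.1, print_every=50):
    """Plain gradient descent on the pixels of the generated image."""
    g = generated.astype(np.float64)  # astype copies, so the input is left untouched
    costs = []
    for epoch in range(epochs):
        diff = g - content
        # Toy cost (assumption): a real run would use the NST total cost here.
        cost = 0.5 * np.sum(diff ** 2)
        costs.append(cost)
        # The gradient of the toy cost with respect to g is (g - content).
        g -= learning_rate * diff
        if epoch % print_every == 0:
            print(f"epoch {epoch}: cost = {cost:.4f}")
    return g, costs

rng = np.random.default_rng(0)
content = rng.uniform(0.0, 255.0, size=(4, 4, 3))
# Start from a noisy blend of the content image, as in the initialization step.
start = 0.6 * rng.uniform(-20.0, 20.0, size=content.shape) + 0.4 * content
result, costs = optimize_pixels(content, start)
```

In the actual algorithm the cost is computed from VGG-19 activations, so the gradient has to be obtained by backpropagating through the network down to the input image.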
Content Cost
The goal is for the generated image (G) to look like the content image (C). The activation output of one particular hidden layer is therefore chosen to represent the “content” of an image.
Steps to compute Content Cost:
- Input C to the VGG-19 model, forward-propagate and get the content activation on layer l, a(C)
- Input G to the VGG-19 model, forward-propagate and get the content activation on layer l, a(G)
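- Compute the content cost using

$$J_{content}(C, G) = \frac{1}{4 \, n_H \, n_W \, n_C} \sum_{\text{all entries}} \left(a(C) - a(G)\right)^2$$

where n_H is the height, n_W the width and n_C the number of channels of the chosen hidden layer l.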
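The content cost steps can be sketched in NumPy as below. This assumes the normalization 1 / (4 n_H n_W n_C) from the deeplearning.ai course and activations stored as (n_H, n_W, n_C) arrays; the function name is illustrative, and in the repository the activations would come from VGG-19 rather than from hand-made arrays.

```python
import numpy as np

def content_cost(a_C, a_G):
    """Content cost between two activations of shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a_C.shape
    # Sum of squared differences over every entry, then normalize.
    return np.sum((a_C - a_G) ** 2) / (4.0 * n_H * n_W * n_C)

# Tiny hand-made activations standing in for a(C) and a(G).
a_C = np.ones((2, 2, 3))
a_G = np.zeros((2, 2, 3))
cost = content_cost(a_C, a_G)  # 12 squared differences of 1, divided by 4 * 12
```

The cost is zero only when the two activations match exactly, which is what drives G towards the content of C.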
Style Cost
The style of an image is defined by how correlated the activations of different channels (feature maps) are within a layer. Comparing these correlations for two images measures how similar or different their styles are.
A “Gram matrix”, also called a style matrix, holds the correlations between the features of an activation and is given by G_A = A A^T, where A is the activation unrolled into a matrix of shape (n_C, n_H x n_W).
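As a small worked example, the Gram matrix of a single activation could be computed like this. The (n_H, n_W, n_C) layout and the function name are assumptions made for the sketch.

```python
import numpy as np

def gram_matrix(activation):
    """Map an activation of shape (n_H, n_W, n_C) to its (n_C, n_C) Gram matrix."""
    n_H, n_W, n_C = activation.shape
    # Unroll so that each row holds one channel: shape (n_C, n_H * n_W).
    A = activation.reshape(n_H * n_W, n_C).T
    return A @ A.T

# Two channels on a 2 x 2 grid: channel 0 is all ones, channel 1 is all twos.
activation = np.stack([np.ones((2, 2)), 2.0 * np.ones((2, 2))], axis=-1)
gram = gram_matrix(activation)
```

Entry (i, j) is the dot product of channel i with channel j, so the diagonal measures how active each channel is and the off-diagonal entries measure how strongly two channels fire together.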
Steps to compute the Style Cost:
- Get a(S) on layer l and compute Gram matrix for Style image Gram_S
- Get a(G) on layer l and compute Gram matrix for Generated image Gram_G
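- The style cost for layer l is defined as the normalized squared distance between the two Gram matrices:

$$J_{style}^{[l]}(S, G) = \frac{1}{4 \, n_C^2 \, (n_H \, n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left((Gram_S)_{ij} - (Gram_G)_{ij}\right)^2$$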
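The steps for one layer can be put together as in the sketch below. It assumes the normalization 1 / (4 n_C^2 (n_H n_W)^2) from the deeplearning.ai course and activations laid out as (n_H, n_W, n_C); the function name is illustrative.

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Style cost for one layer from activations of shape (n_H, n_W, n_C)."""
    n_H, n_W, n_C = a_S.shape
    # Unroll both activations to (n_C, n_H * n_W), one row per channel.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    gram_S = S @ S.T
    gram_G = G @ G.T
    # Squared distance between the Gram matrices, normalized by layer size.
    return np.sum((gram_S - gram_G) ** 2) / (4.0 * n_C ** 2 * (n_H * n_W) ** 2)

# Single-channel toy activations: Gram_S = [[4]], Gram_G = [[0]].
a_S = np.ones((2, 2, 1))
a_G = np.zeros((2, 2, 1))
cost = layer_style_cost(a_S, a_G)  # 16 / (4 * 1 * 16)
```

Because the Gram matrix sums over all spatial positions, this cost ignores where a feature occurs and only compares which features occur together.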
Aggregate style cost over multiple layers
Instead of comparing style at just one layer, using multiple layers captures style from the shallow layers (detailed features) as well as from the deeper layers (high-level features). Each layer’s contribution is weighted by a factor lambda:

$$J_{style}(S, G) = \sum_{l} \lambda^{[l]} \, J_{style}^{[l]}(S, G)$$
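The weighted aggregation over layers is a plain weighted sum, sketched below. The layer names, the weights and the per-layer cost values are placeholders chosen for illustration; they are not taken from this repository.

```python
def total_style_cost(layer_costs, layer_weights):
    """Weighted sum of per-layer style costs (both dicts are keyed by layer name)."""
    return sum(layer_weights[name] * cost for name, cost in layer_costs.items())

# Hypothetical per-layer costs and lambda weights.
layer_costs = {"conv1_1": 1.0, "conv2_1": 2.0}
layer_weights = {"conv1_1": 0.5, "conv2_1": 0.5}
j_style = total_style_cost(layer_costs, layer_weights)  # 0.5 * 1.0 + 0.5 * 2.0
```

Giving larger weights to deeper layers would emphasize large-scale style patterns, while larger weights on shallow layers would emphasize fine texture.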
Total Cost
The total cost function combines the content cost and the style cost, weighted by alpha and beta respectively:

$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$
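A short sketch of the combination, assuming the usual weighted-sum form with a content weight alpha and a style weight beta. The default values 10 and 40 are borrowed from the example command in this README and are not confirmed to be the program's defaults; the cost values are made up.

```python
def total_cost(j_content, j_style, alpha=10.0, beta=40.0):
    """Weighted sum of the content cost and the style cost."""
    return alpha * j_content + beta * j_style

# Made-up cost values, only to show how the weights shift the balance.
balanced = total_cost(1.0, 2.0)                        # 10 * 1 + 40 * 2
content_heavy = total_cost(1.0, 2.0, alpha=100.0, beta=1.0)  # 100 * 1 + 1 * 2
```

Only the ratio between alpha and beta changes where the minimum lies; scaling both by the same factor changes the effective step size instead.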
Execution instructions:
Download the pre-trained model VGG-19 from here and paste it under folder “pretrained-model”.
You can use -h to display the help section of the application (example: python nst_main.py -h)
Basic usage: python nst_main.py
Optional parameters
- To use your own images (required size: WIDTH = 400, HEIGHT = 300), copy the content and style images into the “images” folder and pass their filenames:
  python nst_main.py --content_image_filename louvre_small.jpg --style_image_filename style1.jpg
- To change the number of iterations:
  python nst_main.py --epochs 200
- To print the cost and save the generated image every N iterations (here N = 20):
  python nst_main.py --print_every 20
- To set the learning rate for the optimizer:
  python nst_main.py --learning_rate 2.0
- To set the weights for the content cost (alpha) and the style cost (beta):
  python nst_main.py --alpha 10 --beta 40
Intermediate output images are saved in the “output” folder.
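For readers who want to see how flags like these are typically wired up, here is a hypothetical argparse sketch. The flag names come from the commands above, but the default values are simply the ones shown in the examples and are an assumption; this is not the actual code of nst_main.py.

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the flags listed in this README."""
    parser = argparse.ArgumentParser(description="Neural Style Transfer (illustrative parser)")
    # All defaults below are assumptions copied from the README examples.
    parser.add_argument("--content_image_filename", default="louvre_small.jpg")
    parser.add_argument("--style_image_filename", default="style1.jpg")
    parser.add_argument("--epochs", type=int, default=200)
    parser.add_argument("--print_every", type=int, default=20)
    parser.add_argument("--learning_rate", type=float, default=2.0)
    parser.add_argument("--alpha", type=float, default=10.0)
    parser.add_argument("--beta", type=float, default=40.0)
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--epochs", "500", "--alpha", "5"])
```

The real defaults and the full option list can be checked with python nst_main.py -h.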
References:
The Neural Style Transfer algorithm is due to Gatys et al. (2015). Harish Narayanan and GitHub user “log0” also have highly readable write-ups from which we drew inspiration. The pre-trained network used in this implementation is a VGG network, which is due to Simonyan and Zisserman (2015). The pre-trained weights are from the work of the MatConvNet team.
- Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, (2015). A Neural Algorithm of Artistic Style
- Harish Narayanan, Convolutional neural networks for artistic style transfer.
- Log0, TensorFlow Implementation of “A Neural Algorithm of Artistic Style”.
- Karen Simonyan and Andrew Zisserman (2015). Very deep convolutional networks for large-scale image recognition
- MatConvNet.
This project was completed as part of “Convolutional Neural Networks” course by Coursera and deeplearning.ai (part of Deep Learning Specialization taught by Prof. Andrew Ng)