Transfer Learning - Part 7.0 !! DenseNet


In Part 6 of the Transfer Learning series we discussed MobileNets in depth, along with hands-on applications of these pre-trained neural networks using the Keras and PyTorch APIs. The datasets on which these pre-trained models are trained come from the ILSVRC competition, which is held annually; their repositories and documentation, along with implementations using the Keras and PyTorch APIs, were discussed in Part 6 of this series. In this article we will discuss DenseNet theoretically, and in articles 7.1 and 7.2 we will have practical implementations with the Keras and PyTorch APIs respectively. The link to the notebook accompanying this article is given below:

For the repository and documentation, please follow the two links mentioned below:



1. Introduction

DenseNet (Dense Convolutional Network) is one of the newer architectures in neural networks for visual object recognition. DenseNet is quite similar to ResNet, with some fundamental differences: ResNet uses an additive method (+) that merges the previous layer (identity) with the future layer, whereas DenseNet concatenates the output of the previous layer with the future layer. The DenseNet paper was published at CVPR 2017, where it received the Best Paper Award, and has since gathered over 2000 citations. It was jointly developed by Cornell University, Tsinghua University, and Facebook AI Research (FAIR).

2. Need For DenseNets?

DenseNet was developed specifically to counteract the drop in accuracy caused by vanishing gradients in very deep neural networks: because of the long distance between the input and output layers, the information can vanish before reaching its destination.

Fig. 1. DenseNet Architecture vs. ResNet Architecture.

This image shows a 5-layer dense block with a growth rate of k = 4 alongside the standard ResNet structure. In a typical network with L layers there are L connections, one between each pair of consecutive layers. In a DenseNet, however, every layer is connected to every subsequent layer, giving L(L+1)/2 direct connections. Because of this dense connectivity, fewer layers are needed than in other models, and networks with more than 100 layers can be trained very easily using this technique.
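As a quick check of the connection counts above, here is a small illustrative snippet (the function names are my own) comparing a plain L-layer chain with a dense block:

```python
def chain_connections(num_layers):
    # A plain feed-forward stack: one connection between consecutive layers.
    return num_layers

def dense_connections(num_layers):
    # Dense block: every layer connects to every subsequent layer,
    # giving L * (L + 1) / 2 direct connections.
    return num_layers * (num_layers + 1) // 2

print(chain_connections(5))  # 5
print(dense_connections(5))  # 15
```

For the 5-layer dense block of Fig. 1, dense connectivity yields 15 direct connections instead of 5.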

3. DenseNet Architecture

In this section we will discuss the DenseNet architecture and its variants.

3.1 Building Block of DenseNet

As we all know, in a ConvNet the input image goes through multiple convolution layers to extract high-level features, whereas in ResNet, identity mapping is proposed to improve gradient propagation, i.e. element-wise addition is employed.

In DenseNet, each layer obtains additional inputs from all preceding layers and passes on its own feature maps to all subsequent layers. Concatenation is used, so each layer receives "collective knowledge" from all preceding layers. Since each layer receives feature maps from all preceding layers, the network can be thinner and more compact, i.e. the number of channels can be fewer. The growth rate k is the number of additional channels contributed by each layer.

So, it has higher computational efficiency and memory efficiency. The following figure shows the concept of concatenation during forward propagation:

Fig. 2. Concatenation during Forward Propagation

As shown in Fig. 2, the output of the previous layer acts as an input to the next layer through a composite function. This composite function consists of a convolution layer, a pooling layer, batch normalization, and a non-linear activation layer. These connections mean that the network has L(L+1)/2 direct connections, where L is the number of layers in the architecture.
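This concatenation during forward propagation can be sketched in PyTorch as below. This is a minimal illustrative block, not the full reference implementation; the class name and the choice of k = 4 with 5 layers simply mirror Fig. 1:

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Illustrative dense block: each layer sees all earlier feature maps."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Composite function: BN -> ReLU -> 3x3 Conv producing k channels.
            channels_in = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels_in, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate all preceding feature maps along the channel axis.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

block = TinyDenseBlock(in_channels=16, growth_rate=4, num_layers=5)
y = block(torch.randn(1, 16, 8, 8))
print(y.shape)  # channels grow to 16 + 5 * 4 = 36
```

Each layer adds only k = 4 channels, yet receives every earlier feature map as input.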

Be it adding or concatenating, grouping layers in this way is only possible if the feature-map dimensions are the same. What if the dimensions are different? The DenseNet is therefore divided into DenseBlocks, where the number of filters differs between blocks but the dimensions within a block are the same. A Transition Layer between blocks applies batch normalization and downsampling; it's an essential step in CNNs.

Fig. 3. Densely Connected Convolutional Networks

The number of filters changes between the DenseBlocks, increasing the channel dimensions. The growth rate (k) generalizes to the l-th layer: it controls the amount of new information each layer adds to the network.
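The growth-rate relationship described above means the l-th layer receives k0 + k × (l − 1) input feature maps, where k0 is the number of channels entering the block. A small illustrative check (function name is my own):

```python
def input_channels(k0, k, layer_index):
    # Channels entering the l-th layer of a dense block (1-indexed):
    # the initial k0 channels plus k new channels from each earlier layer.
    return k0 + k * (layer_index - 1)

# With k0 = 16 input channels and growth rate k = 4, as in Fig. 1:
print([input_channels(16, 4, l) for l in range(1, 6)])  # [16, 20, 24, 28, 32]
```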

Fig. 4. Full Architecture of Densely Connected Convolutional Networks

The DenseNet has different versions, like DenseNet-121, DenseNet-169, DenseNet-201, etc. The numbers denote the number of layers in the neural network. The number 121 is computed as follows:

No. of layers = 5 + 2 × (6 + 12 + 24 + 16) = 121

5 — the initial convolution and pooling layer, the 3 transition layers, and the classification layer

2 — each dense layer consists of two convolutions (1×1 and 3×3 conv)

6, 12, 24, 16 — the number of dense layers in each of the four DenseBlocks
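The layer count above can be verified arithmetically; the block sizes 6, 12, 24, 16 are those of DenseNet-121:

```python
dense_block_sizes = [6, 12, 24, 16]       # dense layers per block in DenseNet-121

conv_layers = 2 * sum(dense_block_sizes)  # each dense layer = 1x1 conv + 3x3 conv
fixed_layers = 1 + 3 + 1                  # initial conv + 3 transitions + classifier

print(conv_layers + fixed_layers)  # 121
```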

3.2. Basic DenseNet Composition Layer

In each composition layer, pre-activation batch normalization (BN) and ReLU are applied, followed by a 3×3 convolution producing output feature maps with k channels, for example transforming x0, x1, x2, x3 into x4.

Fig. 5. DenseNet Composition Layer

3.3. DenseNet-B

Here, "B" denotes the bottleneck layers. To reduce the model complexity and size, BN-ReLU-1×1 Conv is applied before BN-ReLU-3×3 Conv.

Fig. 6. DenseNet-B
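A sketch of such a bottleneck layer in PyTorch; the 4×k intermediate width follows the paper's convention, and the function name is illustrative:

```python
import torch
import torch.nn as nn

def bottleneck_layer(in_channels, growth_rate):
    """BN-ReLU-1x1 Conv (bottleneck) followed by BN-ReLU-3x3 Conv."""
    inter_channels = 4 * growth_rate  # 4k channels after the 1x1 conv
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter_channels, growth_rate,
                  kernel_size=3, padding=1, bias=False),
    )

layer = bottleneck_layer(in_channels=64, growth_rate=32)
out = layer(torch.randn(1, 64, 8, 8))
print(out.shape)  # torch.Size([1, 32, 8, 8])
```

The 1×1 conv first shrinks the wide concatenated input to 4k channels, so the expensive 3×3 conv operates on fewer maps.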

3.4. Multiple Dense Blocks with Transition Layers

Here, 1×1 Conv followed by 2×2 average pooling are used as the transition layers between two contiguous dense blocks. Feature map sizes are the same within the dense block so that they can be concatenated together easily. At the end of the last dense block, a global average pooling is performed and then a softmax classifier is attached.

Fig. 7. Multiple Dense Blocks
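The transition layer between two dense blocks can be sketched as follows (PyTorch, illustrative names; the θ compression factor is explained in the next subsection):

```python
import torch
import torch.nn as nn

def transition_layer(in_channels, theta=0.5):
    """1x1 conv (compressing channels by theta) + 2x2 average pooling."""
    out_channels = int(theta * in_channels)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )

trans = transition_layer(in_channels=256, theta=0.5)
out = trans(torch.randn(1, 256, 16, 16))
print(out.shape)  # torch.Size([1, 128, 8, 8])
```

Both the channel count and the spatial resolution are halved, so the next dense block starts from a compact representation.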

3.5. DenseNet-BC (Further Compression)

If a dense block contains m feature maps, the transition layer generates θm output feature maps, where 0 < θ ≤ 1 is referred to as the compression factor. When θ = 1, the number of feature maps across transition layers remains unchanged. A DenseNet with θ < 1 is referred to as DenseNet-C, and θ = 0.5 is used in the experiments.

When both the bottleneck layers and the transition layers with θ < 1 are used, the model is referred to as DenseNet-BC. Finally, DenseNets with and without B/C, with different numbers of layers L and growth rates k, are trained.
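The effect of the compression factor on channel counts is simple arithmetic; a quick illustrative check (function name is my own):

```python
def compressed_channels(m, theta):
    # A transition layer outputs floor(theta * m) feature maps;
    # theta = 1 keeps all of them (no compression).
    assert 0 < theta <= 1
    return int(theta * m)

print(compressed_channels(512, 0.5))  # 256 (the DenseNet-BC setting)
print(compressed_channels(512, 1.0))  # 512 (no compression)
```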

4. Advantages of DenseNet

  • Parameter efficiency — Every layer adds only a limited number of parameters; for example, only about 12 kernels are learned per layer.
  • Implicit deep supervision — Improved flow of gradients through the network: feature maps in all layers have direct access to the loss function and its gradient.
  • More diversified features — Since each layer in DenseNet receives the feature maps of all preceding layers as input, it sees more diversified features and tends to learn richer patterns.
  • Maintains low-complexity features — In a standard ConvNet, the classifier uses only the most complex features. In DenseNet, the classifier uses features of all complexity levels, which tends to give smoother decision boundaries. This also explains why DenseNet performs well when training data is insufficient.

5. DenseNet Terminology

Growth rate (k) — This determines the number of feature maps output by each individual layer inside the dense blocks.

Dense connectivity — By dense connectivity, we mean that within a dense block, each layer receives the feature maps of all preceding layers as input, as seen in the figure.

Composite functions — The sequence of operations inside a layer is: batch normalization, followed by a ReLU activation, and then a convolution layer.

Transition layers — The transition layers aggregate the feature maps from a dense block and reduce their dimensions, so downsampling (2×2 average pooling) is applied here.

In this article we have discussed the DenseNet architecture theoretically; in the next articles, 7.1 and 7.2, we will have hands-on experience with the Keras and PyTorch APIs.

Stay Tuned !!! Happy Learning :)

Special Thanks:

As we say, "A car is useless if it doesn't have a good engine"; similarly, a student is lost without proper guidance and motivation. From the bottom of my heart, I would like to thank my Gurus and my idols, "Dr. P. Supraja" and "A. Helen Victoria", who guided me throughout this journey. As Gurus, they lit the best available path for me and motivated me whenever I encountered failure or a roadblock; without their support and motivation this would have been an impossible task for me.

Pytorch: Link

Keras: Link

ResNet: Link

Tensorflow: Link

If you have any query, feel free to contact me through any of the below-mentioned options:

YouTube : Link



Github Pages:


Contact Me:

Google Form: