Basic CNN Architecture: Explaining the 5 Layers of a Convolutional Neural Network

A CNN (Convolutional Neural Network) is a type of deep neural network that combines convolutional and subsampling (pooling) layers to learn features from large sets of data. It is commonly used for image recognition and classification tasks. The convolutional layers apply filters to the input data, and the subsampling layers reduce the size of the resulting feature maps. The architecture aims to learn features that can be used to classify or detect objects in the input. The five CNN layers are explained below.

5 Layers of a Convolutional Neural Network

1. Convolutional Layer:

This layer performs the convolution operation on the input data, which extracts various features from the data. 

Convolutional layers are among the most important components of a CNN. They extract features from the input data and form the basis for further processing and learning.

A convolutional layer consists of a set of filters (also known as kernels) applied to the input data in a sliding window fashion. Each filter extracts a specific set of features from the input data based on the weights associated with it. 

The number of filters used in the convolutional layer is one of the key hyperparameters of the architecture. It is chosen based on the type of data being processed and the desired accuracy of the model. Generally, more filters produce more feature maps, so the layer can capture a wider variety of patterns, at the cost of more parameters and more computation.

The convolution operation consists of multiplying each filter element-wise with the data inside the sliding window and summing the results into a single value. Repeating this at every window position yields one feature map per filter, so a single convolutional layer produces multiple feature maps. These feature maps are then used as input for the following layers, allowing the network to learn more complex features from the data.
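The multiply-and-sum step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel input, no padding, and a stride of 1; the function names and the two example filters are hypothetical and chosen only for this sketch. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide one filter over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Multiply the filter with the window element-wise, then sum.
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

def conv_layer(image, kernels):
    """Apply every filter in turn: one feature map per filter."""
    return np.stack([conv2d_single(image, k) for k in kernels])

image = np.arange(16, dtype=float).reshape(4, 4)
kernels = [
    np.array([[1.0, 0.0], [0.0, -1.0]]),  # hypothetical diagonal-difference filter
    np.ones((2, 2)) / 4.0,                # hypothetical averaging filter
]
feature_maps = conv_layer(image, kernels)
print(feature_maps.shape)  # (2, 3, 3)
```

Two filters applied to a 4x4 input give two 3x3 feature maps. Real frameworks compute the same result with heavily optimized routines rather than Python loops.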

Convolutional layers are the foundation of many deep learning architectures and are used in various applications, such as image recognition, natural language processing, and speech recognition. By extracting the most relevant features from the input data, convolutional layers enable the network to learn more complex patterns and make better predictions.

2. Pooling Layer:

This layer performs a downsampling operation on the feature maps, which reduces the amount of computation required and also helps to reduce overfitting.

The pooling layer is a vital component of a CNN architecture. It reduces the size of the input volume while retaining the most meaningful information. Pooling layers are usually placed after one or more convolutional layers and repeated throughout the network, so that later stages work on smaller feature maps and can focus on more abstract features of an image or other type of input. The pooling layer operates by sliding a window over the input volume and computing a summary statistic for the values within the window.

Common statistics include taking the maximum, average, or sum of the values within the window. This reduces the input volume’s size while preserving important information about the data.
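The sliding-window summary can be sketched as follows. This is an illustrative NumPy version, assuming a single 2D feature map and a 2x2 window with a stride of 2; the function name `pool2d` and its parameters are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Summarize each window of a 2D feature map with one statistic."""
    reducers = {"max": np.max, "average": np.mean, "sum": np.sum}
    reduce_fn = reducers[mode]
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = reduce_fn(feature_map[r:r + size, c:c + size])
    return pooled

feature_map = np.array([
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 6.0, 1.0, 1.0],
    [0.0, 2.0, 5.0, 7.0],
    [1.0, 1.0, 8.0, 3.0],
])
max_pooled = pool2d(feature_map, mode="max")      # 4x4 input -> 2x2 output
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

With these settings, each 2x2 block of the input collapses to a single value, so the feature map shrinks to a quarter of its original size.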

The pooling layer also introduces a degree of spatial (translation) invariance, meaning that the network’s output changes little when a feature shifts slightly within the image. This allows the network to learn more general features of the image rather than memorizing the exact location of each feature.
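A small experiment illustrates how far this invariance goes. The sketch below assumes non-overlapping 2x2 max pooling on a toy 4x4 input; the helper `max_pool_2x2` is hypothetical. A feature that moves within one pooling window leaves the output unchanged, while a larger shift moves it to a different output cell.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling for inputs with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4))
a[0, 0] = 1.0   # a single active "feature"
b = np.zeros((4, 4))
b[1, 1] = 1.0   # same feature, shifted by one pixel inside the same window
c = np.zeros((4, 4))
c[2, 2] = 1.0   # same feature, shifted into a different window

same_window = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
other_window = np.array_equal(max_pool_2x2(a), max_pool_2x2(c))
print(same_window, other_window)  # True False
```

The invariance is therefore local: it covers small shifts, and stacking several pooling stages widens the range of shifts that the network tolerates.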

3. Activation Layer:

This layer adds non-linearity to the model by applying a non-linear activation function such as ReLU or tanh.

An activation layer in a CNN is a layer that serves as a non-linear transformation on the output of the convolutional layer. It is a primary component of the network, allowing it to learn complex relationships between the input and output data.

The activation layer can be thought of as a function that takes the output of the convolutional layer and maps it to a different set of values. This enables the network to learn more complex patterns in the data and generalize better.

Common activation functions used in CNNs include ReLU (Rectified Linear Unit), sigmoid, and tanh. Each activation function serves a different purpose and can be used in different scenarios.

ReLU is the most commonly used activation function in convolutional networks. It is a non-linear transformation that outputs 0 for all negative values and the input itself for all positive values. It is cheap to compute and does not saturate for positive inputs, which helps deep networks train.

Sigmoid is another common activation function, which outputs values between 0 and 1 for any given input. This makes it useful when an output should be read as a probability, but it is more computationally expensive than ReLU and saturates for large positive or negative inputs, which can slow learning.

Tanh outputs values between -1 and 1 for any given input. Its output is centered on zero, but like sigmoid it saturates for large inputs, and it is used less often than ReLU in the hidden layers of modern CNNs.
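The three functions are short enough to write out directly. The following is a minimal NumPy sketch of each, applied element-wise to a small sample vector chosen only for illustration.

```python
import numpy as np

def relu(x):
    # 0 for negative inputs, the input itself otherwise.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any input into the open interval (-1, 1).
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(sigmoid(x))
print(tanh(x))
```

In a CNN, the chosen function is applied to every value of every feature map produced by the preceding layer.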

The activation layer is an essential component of the CNN: without it, a stack of convolutional and fully connected layers would collapse into a single linear transformation. Each activation function has different properties, so selecting a suitable one can improve the performance of the network.

4. Fully Connected Layer:

This layer connects each neuron in one layer to every neuron in the next layer, resulting in a fully-connected network.

A fully connected layer in a CNN is a layer of neurons connected to every neuron in the previous layer in the network. This is in contrast to convolutional layers, where neurons are only connected to a subset of neurons in the previous layer based on a specific pattern.

By connecting every neuron in one layer to every neuron in the next layer, the fully connected layer allows information from the previous layer to be shared across the entire network, thus providing the opportunity for a more comprehensive understanding of the data.
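A fully connected layer amounts to flattening its input and multiplying it by a weight matrix. The sketch below assumes a hypothetical input of 8 feature maps of size 4x4 and a layer of 10 neurons; the sizes and random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical output of the last pooling layer: 8 feature maps of size 4x4.
feature_maps = rng.standard_normal((8, 4, 4))

# Flatten to a vector so that every value can connect to every neuron.
flat = feature_maps.reshape(-1)  # shape (128,)

n_neurons = 10  # assumed layer width
weights = rng.standard_normal((n_neurons, flat.size)) * 0.01
biases = np.zeros(n_neurons)

# Each neuron computes a weighted sum over all 128 inputs plus a bias.
fc_output = weights @ flat + biases

print(fc_output.shape)              # (10,)
print(weights.size + biases.size)   # 1290
```

Because every input connects to every neuron, the parameter count grows with the product of the two sizes, which is one reason these layers are usually kept for the end of the network, where the feature maps are small.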

Fully connected layers are typically placed towards the end of a CNN, after the convolutional and pooling layers. There they combine the extracted features into patterns and correlations that span the whole input, which the locally connected convolutional layers cannot capture on their own.

Additionally, fully connected layers, combined with activation functions, produce a non-linear decision boundary that can be used for classification. In short, fully connected layers are an integral part of most CNNs and provide a powerful tool for identifying patterns and correlations in the data.

5. Output Layer:

This is the final layer of the network, which produces the output labels or values.

The output layer of a CNN is the final layer in the network and is responsible for producing the output. It is the layer that takes the features extracted from previous layers and combines them in a way that allows it to produce the desired output.

A single output neuron is typically used when the output is a single value, such as in regression or binary classification. A fully connected layer with one neuron per class is generally used when the output is a vector, such as a probability distribution over classes.

A softmax activation function is used when the output is a probability distribution over classes: it converts the raw scores of the final layer into positive values that sum to 1. The output layer thus applies the final linear or non-linear transformation needed to turn the extracted features into the required output.
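Softmax itself is a short computation. The following is a minimal NumPy sketch, assuming a vector of three raw class scores (logits) made up for illustration; subtracting the maximum before exponentiating is a common trick to avoid numerical overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
probs = softmax(logits)
predicted_class = int(np.argmax(probs))

print(probs)
print(predicted_class)  # 0
```

The class with the highest raw score also receives the highest probability, so the predicted label is simply the index of the largest value.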

Finally, regularization techniques such as dropout or batch normalization can be used to improve the network’s performance. These are usually applied to the hidden layers that feed the output layer rather than to the output layer itself.


The CNN architecture is a powerful tool for image and video processing tasks. It combines convolutional layers, pooling layers, and fully connected layers to extract features from images, videos, and other data sources, and can be used for tasks such as object recognition, image classification, and facial recognition. Overall, this type of architecture is highly effective when applied to suitable tasks and datasets.

What are the libraries in Python which can be used for a CNN?

Deep learning frameworks such as TensorFlow, Keras, PyTorch, Caffe, Theano, MXNet, and CNTK provide pre-built modules for building and training CNNs, although some of these are no longer actively developed. OpenCV, SciPy, and Scikit-learn are not CNN frameworks themselves, but they are often used alongside one for image loading, preprocessing, and evaluation.

How many dimensions are there in CNN layers?

The neurons in a CNN layer are arranged in 3 dimensions: width, height, and depth. Width and height are the spatial size of the input or feature map, and depth is the number of channels (for example, 3 for an RGB image, or the number of filters in a convolutional layer). Each convolutional layer transforms one such three-dimensional volume into another.
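The size of the output volume follows from the input size and the layer settings. The sketch below uses the standard output-size formula for a square filter; the function name and the example values (a 32x32 input, 16 filters) are assumptions chosen for illustration.

```python
def conv_output_shape(height, width, kernel, n_filters, stride=1, padding=0):
    """Return (height, width, depth) of a convolutional layer's output."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # The output depth equals the number of filters in the layer.
    return out_h, out_w, n_filters

print(conv_output_shape(32, 32, kernel=5, n_filters=16))             # (28, 28, 16)
print(conv_output_shape(32, 32, kernel=3, n_filters=16, padding=1))  # (32, 32, 16)
```

Without padding, the width and height shrink by the filter size minus one; with a padding of 1 and a 3x3 filter, they are preserved.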

Can I develop a CNN in R?

Yes, CNNs can be developed in both Python and R. R offers interfaces to deep learning frameworks, such as the keras and torch packages, which make it straightforward to build a convolutional neural network.
