
March 31, 2024

Deep Learning Architecture Examples


Edwin Lisowski

CSO & Co-Founder


As you know from our previous article about machine learning and deep learning, DL is an advanced technology based on neural networks that try to imitate the way the human cortex works. Today, we want to go deeper into this subject. Neural networks are by no means homogeneous. In fact, we can point to at least six types of neural networks and the deep learning architectures built on them.

In this article, we are going to show you the most popular and versatile types of deep learning architecture. Soon, abbreviations like RNN, CNN, or DSN will no longer be mysterious.


First of all, deep learning architecture consists of deep neural networks of varying topologies. The general principle is that neural networks are built from several layers that process data: an input layer (raw data), hidden layers (they process and combine input data), and an output layer (it produces the outcome: result, estimation, forecast, etc.). Thanks to the development of numerous layers of neural networks (each providing some function), deep learning is now more practical.

It’s a bit like a machine learning framework–it allows you to make more practical use of this technology, accelerates your work, and enables various endeavors without the need to build an ML algorithm entirely from scratch.

When it comes to deep learning, you have various types of neural networks, and deep learning architectures are based on these networks. Today, we can point to eight of the most common deep learning architectures:

  • RNN
  • LSTM
  • GRU
  • CNN
  • DBN
  • DSN
  • Transformer
  • GAN

Don’t worry if you don’t know these abbreviations; we are going to explain each one of them. Let’s start with the first one.

What Are the Different Deep Learning Architectures?

RNN: Recurrent Neural Networks (RNNs)

RNN is one of the fundamental network architectures from which other deep learning architectures are built. RNNs form a rich family of deep learning architectures. They can use their internal state (memory) to process variable-length sequences of inputs. In other words, RNNs have a memory: every piece of processed information is captured, stored, and used to calculate the final outcome. This makes them useful for tasks such as speech recognition[1]. Moreover, a recurrent network might have connections that feed back into prior layers (or even into the same layer). This feedback allows it to maintain a memory of past inputs and solve problems that unfold over time.
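To make the idea of memory concrete, here is a minimal sketch of a single recurrent unit in plain Python. The weights `w_x`, `w_h`, and `b` are hand-picked for illustration (a real network learns them), and the scalar state is a simplification of the vectors used in practice.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence of any length, carrying the state from step to step."""
    h = 0.0  # the memory starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay positive:
# the network still "remembers" that input.
states = run_rnn([1.0, 0.0, 0.0])
```

The same function handles sequences of any length, which is why this kind of network suits speech and text.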

RNNs are very useful in fields where the order of the presented information is key. They are commonly used in NLP (chatbots, among others), speech synthesis, and machine translation.

Two RNN variants are worth singling out:

  • Bidirectional RNN: They work two ways; the output layer can get information from past and future states simultaneously[2].
  • Deep RNN: Multiple layers are present. As a result, the DL model can extract more hierarchical information.
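Both variants can be sketched on top of one simple recurrent scan. This is an illustrative sketch only: the helper names and the fixed weights are our assumptions, and real implementations use separate learned weights for each direction and each layer.

```python
import math

def scan(sequence, w_x=0.5, w_h=0.8):
    """Run a simple recurrent unit over a sequence and return every state."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional_states(sequence):
    """Pair a left-to-right state with a right-to-left state for each position."""
    forward = scan(sequence)
    backward = scan(sequence[::-1])[::-1]  # re-align with the original order
    return list(zip(forward, backward))

def deep_states(sequence):
    """Deep RNN: the states of the first layer are the inputs of the second."""
    return scan(scan(sequence))

# The only non-zero input is at the end of the sequence.
pairs = bidirectional_states([0.0, 0.0, 1.0])
```

At the first position, the forward state is still zero, while the backward state already carries information about the input that arrives later.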


LSTM: Long Short-Term Memory

LSTM is also a type of RNN. Unlike standard feedforward neural networks, LSTM has feedback connections. This means that it can process not only single data points (such as images) but also entire sequences of data (such as audio or video files)[3].

LSTM is based on the concept of a memory cell. The memory cell can retain its value for a short or long time as a function of its inputs, which allows the cell to remember what’s essential and not just its last computed value.

A typical LSTM architecture is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and these three gates regulate the flow of information into and out of the cell.

  • The input gate controls when new information can flow into the memory.
  • The output gate controls when the information that is contained in the cell is used in the output.
  • The forget gate controls when a piece of information can be forgotten, allowing the cell to process new data.
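The interplay of the three gates can be sketched as a single scalar LSTM step. This is a simplified illustration, not a production implementation: the parameter dictionary `p` and its key names are hypothetical, and real cells work on vectors with learned weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step. `p` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))         # how much new information enters the cell
    f = sigmoid(pre("forget"))        # how much of the old cell value is kept
    o = sigmoid(pre("output"))        # how much of the cell is exposed as output
    g = math.tanh(pre("candidate"))   # the new value proposed for the cell
    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # output of this step
    return h, c

# Hand-picked weights: input gate closed, forget gate open ("keep everything").
keep_memory = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.7, p=keep_memory)
```

With these settings the cell value stays at roughly 0.7 despite the large input, which is how an LSTM can hold on to information over long intervals.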

Today, LSTMs are commonly used in such fields as text compression, handwriting recognition, speech recognition, gesture recognition, and image captioning[4].


GRU: Gated Recurrent Unit

This abbreviation stands for Gated Recurrent Unit. It’s a gated RNN closely related to LSTM. The major difference is that GRU has fewer parameters than LSTM, as it lacks an output gate[5]. GRUs have been shown to perform better on certain smaller and less frequent datasets.
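A scalar GRU step and a rough parameter count can be sketched as follows. We assume the common formulation with an update gate and a reset gate (the exact blending convention varies between sources) and one bias vector per weight set; the parameter dictionary `p` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One scalar GRU step. `p` maps a name to (w_x, w_h, bias)."""
    def pre(name, h):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b

    z = sigmoid(pre("update", h_prev))  # how much of the state to overwrite
    r = sigmoid(pre("reset", h_prev))   # how much old state feeds the candidate
    candidate = math.tanh(pre("candidate", r * h_prev))
    return (1.0 - z) * h_prev + z * candidate

def parameter_count(cell, input_size, hidden_size):
    """Weights plus biases, assuming one bias vector per weight set."""
    weight_sets = {"lstm": 4, "gru": 3}[cell]  # LSTM: 3 gates + candidate; GRU: 2 gates + candidate
    per_set = hidden_size * (input_size + hidden_size) + hidden_size
    return weight_sets * per_set

lstm_params = parameter_count("lstm", input_size=10, hidden_size=20)  # 2480
gru_params = parameter_count("gru", input_size=10, hidden_size=20)    # 1860
```

Under these assumptions a GRU layer needs three quarters of the parameters of an LSTM layer of the same size.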


CNN: Convolutional Neural Networks (CNNs)

This architecture is commonly used for image processing, image recognition, video analysis, and NLP.

A CNN can take in an input image, assign importance to various aspects/objects in the image, and differentiate one from another[6]. The name ‘convolutional’ derives from convolution, a mathematical operation that combines two functions. CNNs consist of an input and an output layer, as well as multiple hidden layers. The CNN’s hidden layers typically consist of a series of convolutional layers.

Here’s how CNNs work: First, the input is received by the network. Each input (for instance, an image) passes through a series of convolution layers with various filters. Intermediate layers (typically activation and pooling layers) control how the signal flows from one layer to the next. Next, the output is flattened and fed into the fully connected layer, where every neuron from the preceding layer is connected to every neuron in the subsequent layer. As a result, you can classify the output.
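The pipeline described above (convolution, activation and pooling, flattening, fully connected layer) can be sketched in plain Python. The filter and the dense weights below are hand-picked for illustration; in a real CNN they are learned, and a library such as PyTorch or TensorFlow would do the heavy lifting.

```python
def conv2d(image, kernel):
    """Valid (no padding) sliding-window filter, the operation DL libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    return [v for row in fmap for v in row]

def dense(vector, weights, bias):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

# A 5x5 image with a vertical edge between the second and third column.
image = [[0, 0, 1, 1, 1]] * 5
vertical_edge = [[-1, 1], [-1, 1]]  # hypothetical filter that reacts to such edges

features = max_pool(relu(conv2d(image, vertical_edge)))  # [[2, 0], [2, 0]]
scores = dense(flatten(features),
               weights=[[1, 1, 1, 1], [-1, -1, -1, -1]],  # "edge" vs "no edge"
               bias=[0, 0])                               # [4, -4]
```

The first score is the higher one, so this toy classifier reports that the image contains an edge.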


DBN: Deep Belief Network

DBN is a multilayer network (typically deep, including many hidden layers) in which each pair of connected layers is a Restricted Boltzmann Machine (RBM). Therefore, we can state that a DBN is a stack of RBMs. A DBN is composed of multiple layers of latent variables (“hidden units”), with connections between the layers but not between units within each layer[7]. DBNs use probabilities and unsupervised learning to produce outputs. Unlike other models, each layer in a DBN learns the entire input. In CNNs, the first layers only filter inputs for basic features, and the later layers recombine all the simple patterns found by the previous layers. DBNs work holistically and regulate each layer in order.
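The "stack of RBMs" idea can be illustrated with a short sketch of the upward pass only. The function names and weights are our assumptions, and training (typically layer-by-layer contrastive divergence) is deliberately left out.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rbm_hidden_probs(visible, weights, hidden_bias):
    """P(h_j = 1 | v) for one RBM; weights[i][j] links visible unit i to hidden unit j."""
    return [sigmoid(b + sum(v * weights[i][j] for i, v in enumerate(visible)))
            for j, b in enumerate(hidden_bias)]

def dbn_upward_pass(visible, rbms):
    """Each RBM's hidden layer acts as the visible layer of the next RBM."""
    layer = visible
    for weights, hidden_bias in rbms:
        layer = rbm_hidden_probs(layer, weights, hidden_bias)
    return layer

# Two stacked RBMs with hand-picked weights: 3 visible -> 2 hidden -> 1 hidden.
rbms = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0]),
    ([[2.0], [-2.0]], [0.0]),
]
top = dbn_upward_pass([1, 0, 1], rbms)
```

Note that units are connected only across layers, never within a layer, which is what the "restricted" in RBM refers to.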

DBNs can be used in image recognition and NLP, among others.

DSN: Deep Stacking Network

DSN is different from the other deep learning architectures described here. DSNs are also frequently called DCNs (Deep Convex Networks). A DSN/DCN is a deep network, but it’s actually a set of individual deep networks. Each network within a DSN has its own hidden layers that process data. This architecture was designed to address the training problem, which is quite complicated in traditional deep learning models. Thanks to many layers, DSNs treat training not as a single problem that has to be solved but as a set of individual problems.

According to a paper “An Evaluation of Deep Learning Miniature Concerning in Soft Computing”[8] published in 2015, “the central idea of the DSN design relates to the concept of stacking, as proposed originally, where simple modules of functions or classifiers are composed first and then they are stacked on top of each other in order to learn complex functions or classifiers.”

Typically, DSNs consist of three or more modules. Each module consists of an input layer, a hidden layer, and an output layer. These modules are stacked one on top of another, which means that the input of a given module is based on the output of prior modules/layers. This construction enables DSNs to learn more complex classification than would be possible with just one module.
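A rough sketch of the stacking idea follows. We assume here that each module receives the raw input concatenated with the outputs of all earlier modules, which is one common way to realize "based on the output of prior modules"; the weights and function names are purely illustrative.

```python
import math

def module_forward(inputs, hidden_weights, output_weights):
    """One module: input layer -> sigmoid hidden layer -> linear output layer."""
    assert all(len(row) == len(inputs) for row in hidden_weights)
    hidden = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
              for row in hidden_weights]
    return [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]

def dsn_forward(raw_input, modules):
    """Run the modules bottom-up and return the output of every module."""
    earlier_outputs, all_outputs = [], []
    for hidden_weights, output_weights in modules:
        module_input = list(raw_input) + earlier_outputs  # raw data plus earlier outputs
        prediction = module_forward(module_input, hidden_weights, output_weights)
        all_outputs.append(prediction)
        earlier_outputs = earlier_outputs + prediction
    return all_outputs

# Three modules; the input grows from 2 to 3 to 4 values as outputs are stacked.
modules = [
    ([[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]]),
    ([[0.5, -0.5, 1.0], [0.3, 0.8, -1.0]], [[1.0, -1.0]]),
    ([[0.5, -0.5, 1.0, 0.2], [0.3, 0.8, -1.0, 0.1]], [[1.0, -1.0]]),
]
outputs = dsn_forward([1.0, 2.0], modules)
final_prediction = outputs[-1]
```

Because every module has its own small input-hidden-output structure, each one can be trained as a separate, simpler problem.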

The six architectures described so far are among the most common ones in the modern deep learning world. The list above also includes two newer ones, the Transformer and GANs, which we cover next. Later in this article, we also mention autoencoders, often considered the most straightforward architecture.




Transformer

The Transformer is a powerful deep learning architecture that has significantly impacted the field of natural language processing (NLP). It was first introduced in a 2017 paper by Google researchers and has since become a cornerstone in various advanced language models. Unlike traditional models that rely on Recurrent Neural Networks (RNNs) for sequential information extraction, Transformers leverage self-attention mechanisms to understand context and relationships between different elements in a sequence.

Key points about the Transformer architecture include:

  • Self-Attention Mechanism
    Transformers apply self-attention to model relationships between all elements in a sequence, allowing them to capture dependencies regardless of position.
  • Encoder-Decoder Structure
    They consist of an encoder that processes the input sequence and a decoder that generates the output sequence, both utilizing self-attention.
  • Parallel Processing
    Transformers can process input sequences in parallel, enhancing computational efficiency.
  • Evolution of NLP
    Transformers have led to the development of advanced models like BERT, GPT, and LaMDA, which excel in tasks such as language understanding, generation, and translation.
  • Applications
    Transformers are widely used in tasks like machine translation, text generation, question-answering, and more, showcasing their versatility and effectiveness in handling sequential data.
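The self-attention mechanism at the heart of the list above can be sketched as scaled dot-product attention. As a simplifying assumption, the token vectors are used directly as queries, keys, and values; a real Transformer first passes them through learned linear projections and uses several attention heads.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Compare this position with every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        # The output is a weighted mix of all value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings for three tokens
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each position is handled independently of the others inside the loop, which is why the computation can run in parallel, and every token can attend to any other token regardless of distance.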

The Transformer’s ability to capture complex relationships in data, its parallel processing capabilities, and its impact on various NLP tasks make it a fundamental architecture in modern deep learning research, driving advancements in language understanding and generation.

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a powerful class of deep learning models used for generative tasks, where they automatically learn and generate new data instances that resemble the original dataset. GANs consist of two primary components:

  1. Generator: The generator network creates new data instances, such as images, based on random input. It aims to generate outputs that are realistic and indistinguishable from real data.
  2. Discriminator: The discriminator network acts as a classifier, distinguishing between real data instances from the original dataset and fake data generated by the generator. It assigns a probability score to each input, indicating the authenticity of the data.

Key points about GANs include:

  • Adversarial Training: GANs operate in a competitive manner where the generator and discriminator are trained simultaneously. The generator aims to produce realistic outputs to fool the discriminator, while the discriminator learns to differentiate between real and generated data.
  • Applications: GANs have diverse applications, including image generation, virtual reality, predictive imagery, text-based image generation, and more. They are particularly useful for tasks requiring the creation of new data based on existing patterns.
  • Conditional GANs: Conditional Generative Adversarial Networks (cGANs) are a type of GAN that generates outputs based on additional auxiliary information, enhancing the control and specificity of the generated data.
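The adversarial objective can be made concrete with the two loss functions. This sketch assumes the widely used binary cross-entropy formulation with the non-saturating generator loss; the probability lists stand in for the outputs of a real discriminator network.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Push D(real) toward 1 and D(fake) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# The discriminator's probability that each sample is real.
confident_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

weak_generator = generator_loss(d_fake=[0.1, 0.2])    # fakes are easily spotted
strong_generator = generator_loss(d_fake=[0.8, 0.9])  # fakes pass as real
```

In training, the two networks take turns: one gradient step lowers the discriminator loss, the next lowers the generator loss, and the competition drives the generated data closer to the real data.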

GANs have revolutionized generative modeling by enabling the creation of high-quality, realistic data that can be used in various domains such as image synthesis, content creation, and pattern recognition. Their ability to learn complex patterns and generate new data has made them a fundamental tool in the field of deep learning.

Read more: Deep Learning Applications

What Is the Architecture of the Deep Learning Model?

Deep learning models, including various architectures like recurrent neural networks (RNN), convolutional neural networks (CNN), and deep belief networks (DBN), are structured in a specific manner to enable learning from complex data and making predictions or classifications. The architecture of a deep learning model typically consists of several interconnected layers, each serving a specific purpose in processing and transforming input data to generate useful outputs.

Here’s an overview of the architecture components:

Input Layer

This is the initial layer of the deep learning model where raw data is fed into the network. The number of neurons in this layer corresponds to the dimensionality of the input data. Each neuron represents a feature or attribute of the input data.

Hidden Layers

These are intermediate layers between the input and output layers where the actual processing of data occurs. Each hidden layer comprises multiple neurons, and each neuron performs a weighted sum of inputs followed by the application of an activation function. The number of hidden layers and neurons in each layer can vary depending on the complexity of the problem and the architecture of the model.
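As a minimal sketch, the computation inside one hidden layer (a weighted sum per neuron followed by an activation function) looks like this. The weights and biases are hand-picked for illustration; training would normally determine them.

```python
def relu(z):
    return max(0.0, z)

def dense_layer(inputs, weights, biases, activation):
    """Each row of `weights` belongs to one neuron of the layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

hidden = dense_layer(
    inputs=[1.0, 2.0],
    weights=[[0.5, -1.0],   # neuron 1
             [1.0, 1.0]],   # neuron 2
    biases=[0.0, -1.0],
    activation=relu,
)
# hidden == [0.0, 2.0]
```

Stacking several such layers, each feeding the next, gives the "deep" part of deep learning.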

Output Layer

This is the final layer of the model where the predictions or classifications are generated. The number of neurons in the output layer depends on the nature of the task. For instance, in a binary classification task, there is typically a single neuron with a sigmoid activation function that outputs the probability of the positive class. In a multi-class classification task, there would be one neuron per class with a softmax activation function.
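The two output activations mentioned here can be sketched in a few lines. The input values (often called logits) are made up for the example.

```python
import math

def sigmoid(z):
    """Squash one raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn one raw score per class into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

binary_probability = sigmoid(0.0)              # 0.5: the model is undecided
class_probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = class_probabilities.index(max(class_probabilities))  # 0
```

The predicted class is simply the one with the highest probability.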

Connections and Weights

In a fully connected (dense) layer, each neuron is connected to every neuron in the subsequent layer. These connections have associated weights that are learned during the training process. The weights determine the strength of the connections between neurons and are adjusted iteratively to minimize the difference between the model’s predictions and the actual outputs.

Activation Functions

Activation functions introduce non-linearity into the model, enabling it to learn complex patterns and relationships in the data. Common activation functions include sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax.
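Why non-linearity matters can be shown with a tiny experiment: two stacked layers without an activation function still behave like a single linear function, while adding ReLU in between breaks that. The one-weight "layers" and their values are, of course, a simplification.

```python
def linear_stack(x, w1=2.0, w2=-3.0):
    """Two layers with no activation: equivalent to one layer with weight w1 * w2."""
    return w2 * (w1 * x)

def relu_stack(x, w1=2.0, w2=-3.0):
    """The same two layers with a ReLU activation in between."""
    return w2 * max(0.0, w1 * x)

# A linear function satisfies f(a) + f(b) == f(a + b).
linear_is_additive = linear_stack(1.0) + linear_stack(-1.0) == linear_stack(0.0)  # True
relu_is_additive = relu_stack(1.0) + relu_stack(-1.0) == relu_stack(0.0)          # False
```

Without an activation function, adding more layers would not add any expressive power.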

Loss Function

The loss function measures the difference between the model’s predictions and the actual targets. It serves as the objective function during training, guiding the optimization process to minimize prediction errors. The choice of loss function depends on the nature of the task, such as mean squared error for regression tasks and categorical cross-entropy for classification tasks.
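Both loss functions named above fit in a few lines. The example predictions and targets are invented for illustration.

```python
import math

def mean_squared_error(predictions, targets):
    """Average squared difference, typical for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def categorical_cross_entropy(probabilities, one_hot_target):
    """Negative log-probability assigned to the correct class."""
    return -sum(t * math.log(p)
                for p, t in zip(probabilities, one_hot_target) if t > 0)

regression_loss = mean_squared_error([1.0, 4.0], [1.0, 2.0])  # 2.0

confident = categorical_cross_entropy([0.1, 0.8, 0.1], [0, 1, 0])
unsure = categorical_cross_entropy([0.4, 0.3, 0.3], [0, 1, 0])
# confident < unsure: a higher probability on the correct class means a lower loss
```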

Optimization Algorithm

Optimization algorithms, such as stochastic gradient descent (SGD), Adam, or RMSprop, are used to update the weights of the model iteratively based on the gradients of the loss function with respect to the weights. These algorithms aim to find the optimal set of weights that minimize the loss function.
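Here is a minimal sketch of stochastic gradient descent on the simplest possible model, `y = w * x` with a single weight. The data, learning rate, and number of epochs are arbitrary choices for the example; Adam and RMSprop refine the same basic update with adaptive step sizes.

```python
import random

def sgd_fit(samples, learning_rate=0.05, epochs=50, seed=0):
    """Fit y = w * x by updating w after every single sample."""
    rng = random.Random(seed)
    samples = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)               # visit the samples in random order
        for x, y in samples:
            error = w * x - y
            gradient = 2.0 * error * x     # derivative of (w * x - y) ** 2 w.r.t. w
            w -= learning_rate * gradient  # step against the gradient
    return w

# The data follows y = 2 * x, so the weight should approach 2.
w = sgd_fit([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

A learning rate that is too high makes the updates overshoot, while one that is too low makes training slow, which is why it is usually tuned experimentally.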


Overall, the architecture of a deep learning model is designed to efficiently process and learn from complex data, enabling it to make accurate predictions or classifications on unseen examples. The effectiveness of the architecture depends on various factors, including the choice of layers, activation functions, optimization algorithms, and hyperparameters, which are often determined through experimentation and tuning.


Deep Learning Architecture – Autoencoders

Autoencoders are a specific type of feedforward neural network. The general idea is that the network is trained to reproduce its input at the output. What does that mean? Simply put, autoencoders condense the input into a lower-dimensional code, and the output is then rebuilt from that code. In this model, the code is a compact version of the input. One of the main tasks of autoencoders is to learn what constitutes regular data and then identify anomalies or aberrations.

Autoencoders comprise three components:

  • Encoder (condenses the input and produces the code)
  • Code
  • Decoder (rebuilds the input using the code)

Autoencoders are mainly used for dimensionality reduction and, naturally, anomaly detection (for instance, fraud). Simplicity is one of their greatest advantages. They are easy to build and train. However, there’s also the other side of the coin. You need high-quality, representative training data. If you don’t have it, the information that comes out of the autoencoder can be unclear or biased.
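The encoder, code, and decoder, and their use for anomaly detection, can be illustrated with a toy example. The weights here are fixed by hand rather than trained, under the assumption that "regular" data consists of two nearly equal numbers; the threshold is equally arbitrary.

```python
def encode(point):
    """Encoder: condense two numbers into a one-number code."""
    return (point[0] + point[1]) / 2.0

def decode(code):
    """Decoder: rebuild the two-number input from the code."""
    return [code, code]

def reconstruction_error(point):
    rebuilt = decode(encode(point))
    return sum((a - b) ** 2 for a, b in zip(point, rebuilt))

def is_anomaly(point, threshold=0.5):
    """Data that cannot be rebuilt well does not look like regular data."""
    return reconstruction_error(point) > threshold

regular = is_anomaly([1.0, 1.1])   # False: rebuilt almost perfectly
outlier = is_anomaly([1.0, 4.0])   # True: large reconstruction error
```

A trained autoencoder works the same way in spirit: it learns a code that reconstructs typical data well, so a high reconstruction error signals an unusual input.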


Deep Learning Architecture – conclusion

As you can see, although deep learning architectures are, generally speaking, based on the same idea, there are various ways to achieve a goal. That’s why it’s so important to choose deep learning architecture correctly. If you want to find out more about this tremendous technology, get in touch with us. With our help, your organization can benefit from deep learning architecture. Let us show you how!

This article is an updated version of the publication from Jul 21, 2020.


[1] Wikipedia. Recurrent neural network. Accessed Jul 21, 2020.

[2] Wikipedia. Bidirectional recurrent neural networks. Accessed Jul 21, 2020.

[3] Wikipedia. Long short-term memory. Accessed Jul 21, 2020.

[4] Samaya Madhavan. Deep learning architectures. Jan 25, 2021. Accessed Jul 21, 2020.

[5] Wikipedia. Gated recurrent unit. Accessed Jul 21, 2020.

[6] Sumit Saha. A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way. Dec 15, 2018. Accessed Jul 21, 2020.

[7] Wikipedia. Deep belief network. Accessed Jul 21, 2020.

[8] Dr. Yusuf Perwej. An Evaluation of Deep Learning Miniature Concerning in Soft Computing. Feb 2015. Accessed Jul 21, 2020.


