As you know from our previous article about machine learning and deep learning, DL is an advanced technology based on neural networks that loosely imitate the way the human brain works. Today, we want to go deeper into this subject. Neural networks are by no means homogeneous; in fact, we can indicate at least six types of neural networks and deep learning architectures built on them.
In this article, we are going to show you the most popular and versatile types of deep learning architecture. Soon, abbreviations like RNN, CNN, or DSN will no longer be mysterious.
First of all, we have to state that deep learning architecture consists of deep neural networks of varying topologies. The general principle is that neural networks are built from several layers that process data: an input layer (raw data), hidden layers (they process and combine the input data), and an output layer (it produces the outcome: a result, estimation, forecast, etc.). Thanks to the stacking of numerous layers (each providing some function), deep learning is now more practical than ever.
It’s a bit like a machine learning framework–it allows you to make more practical use of this technology, accelerates your work, and enables various endeavors without the need to build an ML algorithm entirely from scratch.
When it comes to deep learning, there are various types of neural networks, and deep learning architectures are built on them. Today, we can indicate six of the most common deep learning architectures: RNN, LSTM, GRU, CNN, DBN, and DSN.
Don’t worry if you don’t know these abbreviations; we are going to explain each one of them. Let’s start with the first one.
RNN is one of the fundamental network architectures from which other deep learning architectures are built, and it forms a rich family of variants. RNNs can use their internal state (memory) to process variable-length sequences of inputs. In other words, RNNs have a memory: every piece of processed information is captured, stored, and used to calculate the final outcome. This makes them useful for tasks such as speech recognition[1]. Moreover, a recurrent network may have connections that feed back into prior layers (or even into the same layer). This feedback allows RNNs to maintain a memory of past inputs and solve problems that unfold over time.
RNNs are very useful in fields where the order of the presented information is key. They are commonly used in NLP (for instance, chatbots), speech synthesis, and machine translation.
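To make the idea of an internal state more concrete, here is a minimal sketch in PyTorch; the layer sizes and sequence length are made up for illustration and are not from the article. The recurrent layer reads the sequence step by step while carrying its hidden state (its memory) forward.

```python
# A minimal sketch of an RNN keeping an internal state while reading a sequence.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

sequence = torch.randn(1, 20, 8)   # batch of 1, 20 time steps, 8 features each
hidden = torch.zeros(1, 1, 16)     # initial internal state (the "memory")

output, hidden = rnn(sequence, hidden)
print(output.shape)  # torch.Size([1, 20, 16]) - one hidden state per time step
print(hidden.shape)  # torch.Size([1, 1, 16])  - final state, summarizing the sequence
```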
Currently, we can indicate two widely used types of RNN: bidirectional RNNs[2], which process the sequence in both directions, and LSTMs, described next.
LSTM is also a type of RNN. Unlike standard feedforward neural networks, however, LSTM has feedback connections. This means that it can process not only single data points (such as images) but also entire sequences of data (such as audio or video files)[3].
LSTM is based on the concept of a memory cell. The memory cell can retain its value for a short or long time as a function of its inputs, which allows the cell to remember what's essential and not just its most recently computed value.
A typical LSTM architecture is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and these three gates regulate the flow of information into and out of the cell.
Today, LSTMs are commonly used in such fields as text compression, handwriting recognition, speech recognition, gesture recognition, and image captioning[4].
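For illustration, here is what an LSTM layer looks like in PyTorch; the input stands in for a batch of audio-like feature sequences, and all sizes are assumptions made for the example. The layer returns both a hidden state and the cell state whose contents are regulated by the input, output, and forget gates.

```python
# A minimal LSTM sketch: the layer maintains a hidden state and a cell state
# (the "memory cell"). Sizes are illustrative assumptions.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

audio_features = torch.randn(4, 100, 8)        # 4 sequences, 100 steps, 8 features
output, (hidden, cell) = lstm(audio_features)  # initial states default to zeros

print(output.shape)  # torch.Size([4, 100, 32]) - hidden state at every step
print(cell.shape)    # torch.Size([1, 4, 32])   - final cell state per sequence
```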
GRU
This abbreviation stands for Gated Recurrent Unit. GRUs are closely related to LSTMs; the major difference is that a GRU has fewer parameters than an LSTM, as it lacks an output gate[5]. GRUs have been shown to perform better on certain smaller and less frequent datasets.
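As a rough illustration of that parameter difference, the sketch below (layer sizes chosen arbitrarily, not taken from the article) compares the number of trainable parameters in an LSTM layer and a GRU layer of the same width; the GRU has three blocks of gate weights where the LSTM has four, so it ends up smaller.

```python
# Comparing parameter counts of same-sized LSTM and GRU layers in PyTorch.
import torch.nn as nn

def count_parameters(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

lstm = nn.LSTM(input_size=64, hidden_size=128)
gru = nn.GRU(input_size=64, hidden_size=128)

print("LSTM parameters:", count_parameters(lstm))  # four gate weight blocks
print("GRU parameters: ", count_parameters(gru))   # three gate weight blocks, so fewer
```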
CNN (Convolutional Neural Network) is an architecture commonly used for image processing, image recognition, video analysis, and NLP.
A CNN can take in an input image, assign importance to various aspects or objects in the image, and differentiate one from another[6]. The name 'convolutional' derives from a mathematical operation involving the convolution of different functions. CNNs consist of an input layer and an output layer, as well as multiple hidden layers, which typically include a series of convolutional layers.
Here's how CNNs work: first, the network receives the input. Each input (for instance, an image) passes through a series of convolution layers with various filters, and pooling layers control how the signal flows from one layer to the next by downsampling the resulting feature maps. Next, the output is flattened and fed into the fully connected layer, where every neuron from the preceding layer is connected to every neuron in the subsequent layer. As a result, the network can classify the output.
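The sketch below (assuming 28x28 grayscale images and 10 classes, which are not specified in the article) shows this pipeline in PyTorch: convolution layers with filters, pooling layers that downsample, a flattening step, and a fully connected classification layer.

```python
# A minimal CNN sketch: convolutions, pooling, flattening, fully connected output.
# Input size and class count are illustrative assumptions.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution layer with 16 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second convolution layer
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: 14x14 -> 7x7
    nn.Flatten(),                                 # flatten feature maps into a vector
    nn.Linear(32 * 7 * 7, 10),                    # fully connected layer -> class scores
)

images = torch.randn(8, 1, 28, 28)                # a batch of 8 fake grayscale images
print(cnn(images).shape)                          # torch.Size([8, 10])
```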
DBN is a multilayer network (typically deep, with many hidden layers) in which each pair of connected layers is a Restricted Boltzmann Machine (RBM). We can therefore say that a DBN is a stack of RBMs. A DBN is composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer[7]. DBNs use probabilities and unsupervised learning to produce outputs. Unlike other models, each layer in a DBN learns the entire input. In CNNs, the first layers only filter inputs for basic features, and later layers recombine the simple patterns found by the previous layers; DBNs, by contrast, work holistically and refine each layer in order.
DBNs can be used in image recognition and NLP, among other applications.
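As a rough, simplified sketch of the stack-of-RBMs idea (not a full DBN with generative fine-tuning), scikit-learn's BernoulliRBM can be stacked and trained layer by layer in an unsupervised way, with a simple classifier on top; the data and all hyperparameters below are made up for illustration.

```python
# A DBN-like stack: two RBMs trained layer by layer, plus a classifier on top.
# This approximates the idea only; a true DBN also involves generative fine-tuning.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy binary data: 200 samples, 64 features, with made-up labels.
X = (np.random.rand(200, 64) > 0.5).astype(float)
y = np.random.randint(0, 2, size=200)

dbn_like = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=10)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

dbn_like.fit(X, y)
print("training accuracy:", dbn_like.score(X, y))
```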
We saved DSN for last because this deep learning architecture is different from the others. DSNs are also frequently called DCNs (Deep Convex Networks). Although a DSN/DCN is a deep network, it is actually a set of individual deep networks, each with its own hidden layers that process data. This architecture was designed to address the training problem, which is quite complicated in traditional deep learning models: thanks to its modular structure, a DSN treats training not as a single problem that has to be solved but as a set of individual problems.
According to a paper “An Evaluation of Deep Learning Miniature Concerning in Soft Computing”[8] published in 2015, “the central idea of the DSN design relates to the concept of stacking, as proposed originally, where simple modules of functions or classifiers are composed first and then they are stacked on top of each other in order to learn complex functions or classifiers.”
Typically, DSNs consist of three or more modules. Each module consists of an input layer, a hidden layer, and an output layer. The modules are stacked one on top of another, which means that the input of a given module is based on the output of prior modules/layers. This construction enables DSNs to learn more complex classifications than would be possible with a single module.
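Here is a simplified sketch of that stacking idea; the number of modules and all layer sizes are assumptions made for illustration, not values from the cited paper. Each module is a small input-hidden-output network whose input is the raw input concatenated with the outputs of all previous modules.

```python
# A simplified stacking sketch: each module sees the raw input plus the outputs
# of the modules below it. Sizes and module count are illustrative assumptions.
import torch
import torch.nn as nn

class StackedModules(nn.Module):
    def __init__(self, input_dim=32, hidden_dim=64, output_dim=10, n_modules=3):
        super().__init__()
        self.modules_list = nn.ModuleList()
        for i in range(n_modules):
            in_dim = input_dim + i * output_dim    # raw input + prior modules' outputs
            self.modules_list.append(nn.Sequential(
                nn.Linear(in_dim, hidden_dim),     # input layer -> hidden layer
                nn.Tanh(),
                nn.Linear(hidden_dim, output_dim), # hidden layer -> output layer
            ))

    def forward(self, x):
        outputs = []
        for module in self.modules_list:
            module_input = torch.cat([x] + outputs, dim=1)
            outputs.append(module(module_input))
        return outputs[-1]                         # prediction of the topmost module

model = StackedModules()
print(model(torch.randn(5, 32)).shape)             # torch.Size([5, 10])
```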
These six architectures are the most common ones in modern deep learning. Before we close with autoencoders, often considered the most straightforward architecture of all, let's look at two more recent families that have reshaped the field: Transformers and GANs.
The Transformer is a powerful deep learning architecture that has significantly impacted the field of natural language processing (NLP). It was first introduced in a 2017 paper by Google researchers and has since become a cornerstone in various advanced language models. Unlike traditional models that rely on Recurrent Neural Networks (RNNs) for sequential information extraction, Transformers leverage self-attention mechanisms to understand context and relationships between different elements in a sequence.
The Transformer's ability to capture complex relationships in data, its parallel processing capabilities, and its impact on a wide range of NLP tasks make it a fundamental architecture in modern deep learning research, driving advancements in language understanding and generation.
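As a small illustration of that parallel, sequence-wide processing, the sketch below runs a batch of token embeddings through PyTorch's built-in Transformer encoder layers (all dimensions are arbitrary choices for the example); each output vector is computed with attention over the entire sequence at once rather than step by step.

```python
# Self-attention over a whole sequence with PyTorch's Transformer encoder.
# Embedding size, head count, and sequence length are illustrative assumptions.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

tokens = torch.randn(1, 12, 64)   # one sequence of 12 already-embedded tokens
contextualized = encoder(tokens)  # each output vector depends on the whole sequence
print(contextualized.shape)       # torch.Size([1, 12, 64])
```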
Generative Adversarial Networks (GANs) are a powerful class of deep learning models used for generative tasks, where they automatically learn to generate new data instances that resemble the original dataset. GANs consist of two primary components: a generator, which produces synthetic data samples, and a discriminator, which tries to distinguish generated samples from real ones.
The two networks are trained in competition with each other: the generator tries to fool the discriminator, while the discriminator learns to tell real data from generated data. This adversarial setup has revolutionized generative modeling by enabling the creation of high-quality, realistic data that can be used in various domains such as image synthesis, content creation, and pattern recognition. The ability of GANs to learn complex patterns and generate new data has made them a fundamental tool in the field of deep learning.
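A bare-bones sketch of those two components might look as follows (the latent and data dimensions are assumptions, and the training loop is omitted): the generator maps random noise to synthetic samples, and the discriminator scores samples as real or fake.

```python
# Generator and discriminator as two small networks. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh(),      # maps random noise to a fake sample
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),          # probability that the input is real
)

noise = torch.randn(8, latent_dim)
fake_samples = generator(noise)
realism_scores = discriminator(fake_samples)
print(realism_scores.shape)                   # torch.Size([8, 1])
```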
Deep learning models, including various architectures like recurrent neural networks (RNN), convolutional neural networks (CNN), and deep belief networks (DBN), are structured in a specific manner to enable learning from complex data and making predictions or classifications. The architecture of a deep learning model typically consists of several interconnected layers, each serving a specific purpose in processing and transforming input data to generate useful outputs.
Here’s an overview of the architecture components:
Input Layer
This is the initial layer of the deep learning model where raw data is fed into the network. The number of neurons in this layer corresponds to the dimensionality of the input data. Each neuron represents a feature or attribute of the input data.
Hidden Layers
These are intermediate layers between the input and output layers where the actual processing of data occurs. Each hidden layer comprises multiple neurons, and each neuron performs a weighted sum of inputs followed by the application of an activation function. The number of hidden layers and neurons in each layer can vary depending on the complexity of the problem and the architecture of the model.
Output Layer
This is the final layer of the model where the predictions or classifications are generated. The number of neurons in the output layer depends on the nature of the task. For instance, in a binary classification task, there is typically a single neuron with a sigmoid activation function to produce a probability. In a multi-class classification task, there is one neuron per class with a softmax activation function.
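To tie the three layer types together, here is a minimal sketch of a small classifier in PyTorch; 20 input features and 3 output classes are assumptions made for the example.

```python
# Input layer -> hidden layers -> output layer, with softmax for multi-class output.
# Sizes are illustrative assumptions.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),   # input layer feeding the first hidden layer (20 features)
    nn.ReLU(),
    nn.Linear(64, 32),   # second hidden layer
    nn.ReLU(),
    nn.Linear(32, 3),    # output layer: one neuron per class
    nn.Softmax(dim=1),   # softmax turns the scores into class probabilities
)
print(model)
```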
Connections and Weights
Each neuron in a layer is connected to every neuron in the subsequent layer, forming a fully connected or dense network. These connections have associated weights that are learned during the training process. The weights determine the strength of the connections between neurons and are adjusted iteratively to minimize the difference between the model’s predictions and the actual outputs.
Activation Functions
Activation functions introduce non-linearity into the model, enabling it to learn complex patterns and relationships in the data. Common activation functions include sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax.
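The sketch below illustrates what a single layer of neurons computes: a weighted sum of the previous layer's outputs plus a bias, passed through an activation function. All numbers are made up for illustration.

```python
# One layer of two neurons: weighted sum plus bias, then a non-linear activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])           # outputs of the previous layer
W = np.array([[0.2, -0.4, 0.1],
              [0.7,  0.3, -0.5]])         # one row of weights per neuron in this layer
b = np.array([0.1, -0.2])                 # one bias per neuron

z = W @ x + b                             # weighted sums (pre-activations)
print("sigmoid:", sigmoid(z))             # squashes each value into (0, 1)
print("relu:   ", relu(z))                # zeroes out negative values
```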
Loss Function
The loss function measures the difference between the model’s predictions and the actual targets. It serves as the objective function during training, guiding the optimization process to minimize prediction errors. The choice of loss function depends on the nature of the task, such as mean squared error for regression tasks and categorical cross-entropy for classification tasks.
Optimization Algorithm
Optimization algorithms, such as stochastic gradient descent (SGD), Adam, or RMSprop, are used to update the weights of the model iteratively based on the gradients of the loss function with respect to the weights. These algorithms aim to find the optimal set of weights that minimize the loss function.
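Putting the loss function and the optimizer together, a single training loop might look like the following sketch (the task, network size, and hyperparameters are assumptions, not recommendations): Adam repeatedly updates the weights using the gradients of the cross-entropy loss.

```python
# A compact training loop: forward pass, loss, gradients, weight update.
# Data, sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()                       # classification loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randn(32, 20)                          # a fake batch of 32 examples
targets = torch.randint(0, 3, (32,))                  # fake class labels

for step in range(100):
    optimizer.zero_grad()
    predictions = model(inputs)
    loss = loss_fn(predictions, targets)              # how far off the model is
    loss.backward()                                   # gradients of the loss w.r.t. weights
    optimizer.step()                                  # iterative weight update

print("final loss:", loss.item())
```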
Overall, the architecture of a deep learning model is designed to efficiently process and learn from complex data, enabling it to make accurate predictions or classifications on unseen examples. The effectiveness of the architecture depends on various factors, including the choice of layers, activation functions, optimization algorithms, and hyperparameters, which are often determined through experimentation and tuning.
Autoencoders are a specific type of feedforward neural network. The general idea is that the input and the output are pretty much the same. What does that mean? Simply put, autoencoders condense the input into a lower-dimensional code and produce the output from it, so the code is a compact version of the input. One of autoencoders' main tasks is to determine what constitutes regular data and then flag anomalies or aberrations.
Autoencoders comprise three components: an encoder, which compresses the input; the code, which is the compact representation produced by the encoder; and a decoder, which reconstructs the input from the code.
Autoencoders are mainly used for dimensionality reduction and, naturally, anomaly detection (for instance, detecting fraud). Simplicity is one of their greatest advantages: they are easy to build and train. However, there is also the other side of the coin: you need high-quality, representative training data. Without it, the output of the autoencoder can be unclear or biased.
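A minimal sketch of the three components might look as follows (all sizes are assumptions): the encoder compresses the input into a short code, the decoder reconstructs the input from that code, and a large reconstruction error can be used to flag an anomaly.

```python
# Encoder -> code -> decoder; reconstruction error as an anomaly signal.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=64, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(), nn.Linear(32, input_dim))

    def forward(self, x):
        code = self.encoder(x)            # compact version of the input
        return self.decoder(code)         # attempted reconstruction

model = Autoencoder()
x = torch.randn(1, 64)
reconstruction_error = nn.functional.mse_loss(model(x), x)
print("reconstruction error:", reconstruction_error.item())  # large values suggest an anomaly
```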
As you can see, although deep learning architectures are, generally speaking, based on the same idea, there are various ways to achieve a goal. That’s why it’s so important to choose deep learning architecture correctly. If you want to find out more about this tremendous technology, get in touch with us. With our help, your organization can benefit from deep learning architecture. Let us show you how!
This article is an updated version of the publication from July 21, 2020.
[1] Wikipedia. Recurrent neural network. URL: https://en.wikipedia.org/wiki/Recurrent_neural_network. Accessed Jul 21, 2020.
[2] Wikipedia. Bidirectional recurrent neural networks. URL: https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks. Accessed Jul 21, 2020.
[3] Wikipedia. Long short-term memory. URL: https://en.wikipedia.org/wiki/Long_short-term_memory. Accessed Jul 21, 2020.
[4] Samaya Madhavan. Deep learning architectures. Jan 25, 2021. URL: https://developer.ibm.com/technologies/artificial-intelligence/articles/cc-machine-learning-deep-learning-architectures/. Accessed Jul 21, 2020.
[5] Wikipedia. Gated recurrent unit. URL: https://en.wikipedia.org/wiki/Gated_recurrent_unit. Accessed Jul 21, 2020.
[6] Sumit Saha. A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way. Dec 15, 2018. URL: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53. Accessed Jul 21, 2020.
[7] Wikipedia. Deep belief network. URL: https://en.wikipedia.org/wiki/Deep_belief_network. Accessed Jul 21, 2020.
[8] Dr. Yusuf Perwej. An Evaluation of Deep Learning Miniature Concerning in Soft Computing. Feb 2015. URL: https://www.researchgate.net/figure/A-Deep-Stacking-Network-Architecture_fig1_272885058. Accessed Jul 21, 2020.