As you know from our previous article about machine learning and deep learning, DL is an advanced technology based on neural networks that try to imitate the way the human cortex works. Today, we want to go deeper into this subject. Neural networks are by no means homogeneous. In fact, we can identify at least six types of neural networks and the deep learning architectures that are built on them. In this article, we are going to show you the most popular and versatile types of deep learning architecture. Soon, abbreviations like RNN, CNN, or DSN will no longer be mysterious.
First of all, deep learning consists of deep neural networks of varying topologies. The general principle is that neural networks are built from several layers that process data: an input layer (raw data), hidden layers (they process and combine input data), and an output layer (it produces the outcome: result, estimation, forecast, etc.). Thanks to the development of numerous types of layers (each providing some function), deep learning is now more practical.
It’s a bit like a machine learning framework: it allows you to make more practical use of this technology, accelerates your work, and enables various endeavors without the need to build an ML algorithm entirely from scratch.
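To make the input/hidden/output layering concrete, here is a minimal NumPy sketch of a forward pass. The layer sizes, the ReLU activation, and the `forward` helper are our own illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def forward(x, layers):
    """Pass a batch x through a list of (weights, bias) layers."""
    activation = x
    for index, (W, b) in enumerate(layers):
        z = activation @ W + b
        is_output_layer = index == len(layers) - 1
        # Hidden layers apply ReLU; the output layer is left linear (an assumption).
        activation = z if is_output_layer else np.maximum(z, 0.0)
    return activation

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]  # input layer, two hidden layers, output layer (hypothetical sizes)
layers = [
    (rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

batch = rng.normal(size=(3, 4))   # three samples of raw input data
outputs = forward(batch, layers)  # one forecast per sample
print(outputs.shape)
```

The weights here are random, so the outputs are meaningless until the network is trained; the point is only how data flows from layer to layer.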
When it comes to deep learning, you have various types of neural networks, and deep learning architectures are based on these networks. Today, we can identify six of the most common deep learning architectures:
- RNN
- LSTM
- GRU
- CNN
- DBN
- DSN
Don’t worry if you don’t know these abbreviations; we are going to explain each one of them. Let’s start with the first one.
RNN: Recurrent Neural Networks
RNN is one of the fundamental network architectures from which other deep learning architectures are built. RNNs comprise a rich family of architectures. They can use their internal state (memory) to process variable-length sequences of inputs. In other words, RNNs have a memory: every piece of processed information is captured, stored, and used to calculate the final outcome. This makes them useful for tasks such as speech recognition. Moreover, a recurrent network may have connections that feed back into prior layers (or even into the same layer). This feedback allows it to maintain a memory of past inputs and solve problems that unfold over time.
RNNs are very useful in fields where the order of the presented information is key. They are commonly used in NLP (chatbots, among others), speech synthesis, and machine translation.
Two common variants of RNN are:
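The memory described above can be sketched as a hidden state that is updated at every time step. This is a minimal illustration assuming a plain tanh recurrence; `rnn_forward` and the weight names are hypothetical.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over a sequence of any length."""
    h = np.zeros(W_hh.shape[0])  # internal state (memory), empty at the start
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.5, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

short_seq = rng.normal(size=(4, input_size))
long_seq = rng.normal(size=(9, input_size))
short_states = rnn_forward(short_seq, W_xh, W_hh, b_h)
long_states = rnn_forward(long_seq, W_xh, W_hh, b_h)
print(short_states.shape, long_states.shape)
```

The same weights handle both the 4-step and the 9-step sequence, which is what makes the architecture suitable for variable-length inputs.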
- Bidirectional RNN: They work two ways; the output layer can get information from past and future states simultaneously.
- Deep RNN: Multiple layers are present. As a result, the DL model can extract more hierarchical information.
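Both variants can be sketched with the same simple recurrence. In this illustrative example (all helper names are our own), the bidirectional layer runs one pass forward and one pass over the reversed sequence, while the deep RNN feeds the states of one layer into the next.

```python
import numpy as np

def rnn_pass(inputs, W_xh, W_hh):
    """Simple tanh recurrence returning the state at every time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_xh + h @ W_hh)
        states.append(h)
    return np.stack(states)

def bidirectional_layer(inputs, fwd_params, bwd_params):
    """Concatenate states from a forward pass and a backward pass."""
    forward_states = rnn_pass(inputs, *fwd_params)
    # Process the reversed sequence, then flip back so time steps line up.
    backward_states = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward_states, backward_states], axis=1)

def make_params(rng, input_size, hidden_size):
    return (rng.normal(scale=0.5, size=(input_size, hidden_size)),
            rng.normal(scale=0.5, size=(hidden_size, hidden_size)))

rng = np.random.default_rng(1)
seq = rng.normal(size=(6, 3))

# Bidirectional RNN: each time step sees past and future context.
fwd = make_params(rng, 3, 4)
bwd = make_params(rng, 3, 4)
bi_out = bidirectional_layer(seq, fwd, bwd)

# Deep RNN: the second layer consumes the states of the first.
layer1 = make_params(rng, 3, 4)
layer2 = make_params(rng, 4, 4)
deep_out = rnn_pass(rnn_pass(seq, *layer1), *layer2)
print(bi_out.shape, deep_out.shape)
```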
LSTM: Long Short-Term Memory
LSTM is also a type of RNN. Unlike a standard feedforward network, it has feedback connections, which means that it can process not only single data points (such as images) but also entire sequences of data (such as audio or video files).
LSTM derives from neural network architectures and is based on the concept of a memory cell. The memory cell can retain its value for a short or long time as a function of its inputs, which allows the cell to remember what’s essential and not just its last computed value.
A typical LSTM architecture is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and these three gates regulate the flow of information into and out of the cell.
- The input gate controls when new information can flow into the memory.
- The output gate controls when the information that is contained in the cell is used in the output.
- The forget gate controls when a piece of information can be forgotten, allowing the cell to process new data.
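The interplay of the cell and the three gates can be sketched as a single time step. This follows the commonly used LSTM formulation; the weight layout (one matrix holding all four blocks) and the function names are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to four stacked pre-activations."""
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x_t]) @ W + b
    input_gate = sigmoid(z[0 * n:1 * n])   # lets new information into the cell
    forget_gate = sigmoid(z[1 * n:2 * n])  # decides what the cell forgets
    output_gate = sigmoid(z[2 * n:3 * n])  # decides what the cell exposes
    candidate = np.tanh(z[3 * n:4 * n])    # proposed new cell content
    c_t = forget_gate * c_prev + input_gate * candidate
    h_t = output_gate * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.5, size=(hidden_size + input_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell value passes through a step almost unchanged, which is how the cell can hold a value over long intervals.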
Today, LSTMs are commonly used in such fields as text compression, handwriting recognition, speech recognition, gesture recognition, and image captioning.
GRU: Gated Recurrent Unit
This abbreviation stands for Gated Recurrent Unit. It’s a gated RNN closely related to LSTM. The major difference is that GRU has fewer parameters than LSTM, as it lacks an output gate. GRUs have been reported to perform better on certain smaller and less frequent datasets.
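A small sketch can make the parameter difference tangible. It assumes the common GRU formulation with an update gate and a reset gate, and counts one weight block with one bias per gate or candidate; real libraries may count slightly differently, and some swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU step: two gates and a candidate state, no separate cell."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(hx @ W_z + b_z)  # update gate
    r = sigmoid(hx @ W_r + b_r)  # reset gate
    candidate = np.tanh(np.concatenate([r * h_prev, x_t]) @ W_h + b_h)
    return (1.0 - z) * h_prev + z * candidate

def param_count(input_size, hidden_size, n_blocks):
    """Weights plus biases for n_blocks gate/candidate blocks."""
    return n_blocks * (hidden_size * (hidden_size + input_size) + hidden_size)

input_size, hidden_size = 10, 20
lstm_params = param_count(input_size, hidden_size, n_blocks=4)  # 3 gates + candidate
gru_params = param_count(input_size, hidden_size, n_blocks=3)   # 2 gates + candidate
print(lstm_params, gru_params)  # 2480 1860

rng = np.random.default_rng(0)
shape = (hidden_size + input_size, hidden_size)
W_z, W_r, W_h = (rng.normal(scale=0.3, size=shape) for _ in range(3))
b_z, b_r, b_h = (np.zeros(hidden_size) for _ in range(3))
h_next = gru_step(rng.normal(size=input_size), np.zeros(hidden_size),
                  W_z, W_r, W_h, b_z, b_r, b_h)
```

Under these assumptions, the GRU needs three quarters of the parameters of an LSTM with the same input and hidden sizes.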
CNN: Convolutional Neural Networks
This architecture is commonly used for image processing, image recognition, video analysis, and NLP.
A CNN can take in an input image, assign importance to various aspects/objects in the image, and differentiate one from the others. The name ‘convolutional’ derives from the mathematical operation of convolution, which combines two functions. CNNs consist of an input and an output layer, as well as multiple hidden layers. The CNN’s hidden layers typically consist of a series of convolutional layers.
Here’s how CNNs work: First, the network receives the input. Each input (for instance, an image) passes through a series of convolutional layers with various filters. The control layer controls how the signal flows from one layer to the next. Next, the output is flattened and fed into the fully connected layer, in which every neuron from the preceding layer is connected to every neuron in the subsequent layer. As a result, the network can classify the input.
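The steps above can be sketched in plain NumPy: convolve, flatten, then classify with a fully connected layer. The ReLU activation, the max-pooling step, and the softmax output are common additions that we assume here; they are not described in the text above, and all sizes are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' sliding-window filter (cross-correlation, as used in DL libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))            # a tiny grayscale "image"
filters = rng.normal(size=(3, 3, 3))  # three 3x3 filters

# Convolution -> ReLU -> pooling, once per filter.
feature_maps = [max_pool(np.maximum(conv2d(image, k), 0.0)) for k in filters]

# Flatten and feed into a fully connected layer with four output classes.
flat = np.concatenate([m.ravel() for m in feature_maps])
W_fc = rng.normal(scale=0.1, size=(flat.size, 4))
class_probs = softmax(flat @ W_fc)
print(flat.size, class_probs.shape)
```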
DBN: Deep Belief Network
DBN is a multilayer network (typically deep, including many hidden layers) in which each pair of connected layers is a Restricted Boltzmann Machine (RBM). Therefore, we can state that DBN is a stack of RBMs. DBN is composed of multiple layers of latent variables (“hidden units”), with connections between the layers but not between units within each layer. DBNs use probabilities and unsupervised learning to produce outputs. Unlike other models, each layer in a DBN learns the entire input. In CNNs, the first layers only filter inputs for basic features, and the later layers recombine all the simple patterns found by the previous layers. DBNs work holistically and regulate each layer in order.
DBNs can be used in image recognition and NLP, among other fields.
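The "stack of RBMs" idea can be sketched as greedy, layer-by-layer unsupervised training. This is a simplified illustration assuming binary inputs and one step of contrastive divergence (CD-1) per update; the `RBM` class and `train_dbn` helper are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Restricted Boltzmann Machine: no connections within a layer."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence step on a batch of visible vectors."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)  # reconstruction of the input
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

def train_dbn(data, layer_sizes, rng, epochs=50):
    """Train one RBM per layer; each learns from the layer below it."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_update(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)  # input for the next RBM
    return rbms, layer_input

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(20, 6)).astype(float)  # toy binary data
rbms, features = train_dbn(data, layer_sizes=[4, 3], rng=rng)
print(len(rbms), features.shape)
```

In practice, a DBN trained this way is usually fine-tuned afterwards for the task at hand; that step is omitted here.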
DSN: Deep Stacking Network
We saved DSN for last because this deep learning architecture is different from the others. DSNs are also frequently called DCNs (Deep Convex Networks). A DSN/DCN is a deep network, but it’s actually a set of individual deep networks. Each network within a DSN has its own hidden layers that process data. This architecture was designed to ease training, which is quite complicated for traditional deep learning models. Thanks to their many layers, DSNs treat training not as a single problem that has to be solved but as a set of individual problems.
According to a paper “An Evaluation of Deep Learning Miniature Concerning in Soft Computing” published in 2015, “the central idea of the DSN design relates to the concept of stacking, as proposed originally, where simple modules of functions or classifiers are composed first and then they are stacked on top of each other in order to learn complex functions or classifiers.”
Typically, DSNs consist of three or more modules. Each module consists of an input layer, a hidden layer, and an output layer. These modules are stacked one on top of another, which means that the input of a given module is based on the output of prior modules/layers. This construction enables DSNs to learn more complex classification than would be possible with just one module.
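The stacking of modules can be sketched as follows. This is a simplified illustration, not the original DSN training procedure: we assume fixed random hidden weights, output weights found by ridge-regularized least squares (a convex problem), and each module receiving the raw input plus the previous module's output. All names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dsn(X, Y, n_modules=3, n_hidden=16, ridge=1e-3, seed=0):
    """Train modules one by one; each is its own small learning problem."""
    rng = np.random.default_rng(seed)
    modules, module_input = [], X
    for _ in range(n_modules):
        # Hidden layer: random weights kept fixed (a simplifying assumption).
        W = rng.normal(size=(module_input.shape[1], n_hidden))
        H = sigmoid(module_input @ W)
        # Output layer: closed-form ridge regression.
        U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ Y)
        modules.append((W, U))
        # The next module sees the raw input plus this module's output.
        module_input = np.hstack([X, H @ U])
    return modules

def predict_dsn(X, modules):
    module_input = X
    for W, U in modules:
        prediction = sigmoid(module_input @ W) @ U
        module_input = np.hstack([X, prediction])
    return prediction

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
labels = rng.integers(0, 3, size=40)
Y = np.eye(3)[labels]  # one-hot targets for three classes

modules = train_dsn(X, Y)
scores = predict_dsn(X, modules)
print(len(modules), scores.shape)
```

Because each module is fitted separately, training never has to backpropagate through the whole stack, which is the property the text above describes.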
These six architectures are the most common ones in the modern deep learning world. At this point, we should also mention one more architecture, which is considered the most straightforward. Let’s talk for a second about autoencoders.
Autoencoders are a specific type of feedforward neural network. The general idea is that the input and the output are pretty much the same. What does that mean? Simply put, autoencoders condense the input into a lower-dimensional code and then produce the output from that code. In this model, the code is a compact version of the input. One of the main tasks of autoencoders is to learn what constitutes regular data and then identify anomalies or aberrations.
Autoencoders comprise three components:
- Encoder (condenses the input and produces the code)
- Code (the compact, lower-dimensional version of the input)
- Decoder (rebuilds the input using the code)
Autoencoders are mainly used for dimensionality reduction and, naturally, anomaly detection (for instance, fraud). Simplicity is one of their greatest advantages. They are easy to build and train. However, there’s also the other side of the coin. You need high-quality, representative training data. If you don’t have it, the information that comes out of the autoencoder can be unclear or biased.
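Here is a minimal sketch of the encoder/code/decoder idea applied to anomaly detection. It assumes a purely linear autoencoder trained with plain gradient descent on synthetic data; the function names, learning rate, and data are our own illustrative choices.

```python
import numpy as np

def train_autoencoder(X, code_size, lr=0.005, steps=2000, seed=0):
    """Fit a linear encoder and decoder by minimizing reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_size))
    W_dec = rng.normal(scale=0.1, size=(code_size, d))
    for _ in range(steps):
        code = X @ W_enc           # encoder: condense the input
        error = code @ W_dec - X   # decoder output minus the original input
        grad_dec = 2.0 / n * code.T @ error
        grad_enc = 2.0 / n * X.T @ (error @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec

def reconstruction_error(X, W_enc, W_dec):
    """Squared reconstruction error per sample."""
    return np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1)

rng = np.random.default_rng(0)
# "Regular" data: 6-dimensional points that really depend on 2 hidden factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

W_enc, W_dec = train_autoencoder(X, code_size=2)
normal_errors = reconstruction_error(X, W_enc, W_dec)

# A point that does not follow the regular pattern.
anomaly = 5.0 * rng.normal(size=(1, 6))
anomaly_error = reconstruction_error(anomaly, W_enc, W_dec)[0]
print(np.median(normal_errors), anomaly_error)
```

Samples the autoencoder reconstructs poorly are candidates for anomalies. The threshold for "poorly" has to be chosen from the data; none is assumed here.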
Deep Learning Architecture – Conclusion
As you can see, although deep learning architectures are, generally speaking, based on the same idea, there are various ways to achieve a goal. That’s why it’s so important to choose the right deep learning architecture. If you want to find out more about this tremendous technology, get in touch with us. With our help, your organization can benefit from deep learning. Let us show you how!