Neural networks for dummies
Are you new to neural networks and unsure how they work? Then let’s dive in and put on our scientist hats for a moment.
We live in a modern age shaped by the internet and technology. It’s no secret that much of this progress is owed to machine learning, especially neural networks. From virtual assistants to language translation, visual recognition to recommendation systems, and fraud detection to self-driving cars, deep learning has turned the “unthinkable” into the thinkable.
While many people have a rough idea of how artificial intelligence and machine learning work, and of how they help machines learn and act as humans do, the same cannot be said for deep learning. Even in the tech community, many people are not yet familiar with the basics of deep learning and how it differs from these broader fields.
So, today’s article aims to fill that gap. If you’re new to deep learning and unsure how it works or how it achieves what it does, this article is for you. We’ll walk through how it works, starting from the basics.
So, let’s start without any further ado.
What Exactly Is Deep Learning?
Deep learning is technically a subset of machine learning, which is a subset of artificial intelligence. Confused? Let’s break it down further.
Artificial Intelligence is the broad concept of creating machines that behave intelligently, like humans, i.e., smart machines. Machine Learning refers to a set of algorithms that learn from data, improving their decisions over time. Deep Learning is the part of machine learning that uses a multi-layered structure, trained on large amounts of data, to learn.
The multi-layered structure we mentioned above is known as an Artificial Neural Network, and it is loosely inspired by the networks of neurons in our brains. Our brain makes decisions based on experience and the patterns it has identified. As young children, we can’t recognize many patterns or make complex decisions, but as we grow and interact with our surroundings, we learn from them and become able to perform complex tasks. Deep learning works in a roughly similar way.
For example, if a child is given a ball for the first time, they won’t know what it is unless they are told. The next time, though, they can identify it from past experience. Neural networks learn in a loosely analogous way: they are given a large amount of data as input and learn from it, gradually going from inaccurate to increasingly accurate results.
Now, before diving into the specifics of how neural networks achieve this, let’s go through a few prerequisites for understanding how deep learning works.
Neural Networks Basics – Step By Step
First, neurons are the most basic units in the neural network. Let’s take the example of the human brain again. It contains numerous neurons interconnected with one another.
Here, a neuron takes inputs from other neurons through thread-like branches known as dendrites. It then sums those inputs and fires if the result exceeds a certain threshold – otherwise, it doesn’t.
Similarly, a neuron in an artificial neural network is connected to other neurons: it receives their outputs as its inputs, combines them, and passes the result through an activation function to calculate its own output. The key difference is that instead of only firing or not firing (outputting 1 or 0), an artificial neuron can produce a wide range of numerical outputs, depending on its activation function.
Each input is multiplied by its own value, known as a weight. The weight determines how much effect that input has on the output: the larger the weight’s magnitude, the stronger the input’s influence. We then add a constant value known as the bias to the sum of these weighted inputs. The bias shifts the result, which helps the neuron better fit the given data.
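In symbols (notation introduced here for illustration), a neuron with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias $b$, and activation function $f$ first computes the weighted sum $z$ and then its output $y$:

$$z = \sum_{i=1}^{n} w_i x_i + b, \qquad y = f(z)$$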
After adding the weighted inputs and the bias, the result is passed through an Activation Function to calculate the output. There are many different types of activation functions that are used depending on the problem at hand. But let’s consider the step activation function for now, which returns 1 if its input is higher than a certain threshold; otherwise, 0.
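As a rough sketch, here is the step function described above alongside two other widely used activation functions, sigmoid and ReLU, written in Python. The default threshold of 0.5 is an assumption chosen to match the example that follows; it is not a standard value.

```python
import math

def step(z, threshold=0.5):
    """Step activation: 1 if the input exceeds the threshold, otherwise 0."""
    return 1 if z > threshold else 0

def sigmoid(z):
    """Smoothly squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

print(step(0.67), sigmoid(0.0), relu(-2.0))  # 1 0.5 0.0
```

Unlike the step function, sigmoid and ReLU produce graded outputs, which is what lets an artificial neuron output more than just 0 or 1.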
Let’s take an example to clarify these concepts. Consider the following configuration depicted in the figure below, where inputs are 0.5 and 0.6, and the weights are 0.5 and 0.7, respectively. Let’s keep the threshold at 0.5 for the step activation function and the bias at 0.
First, we compute the weighted sum of the inputs and add the bias:
$$z = (0.5 \times 0.5) + (0.6 \times 0.7) + 0 = 0.25 + 0.42 = 0.67$$
Then we pass the result to the step activation function. Since the sum, 0.67, is greater than the threshold (0.5), the output is 1, i.e., the neuron fires. This simple configuration, a single neuron with a step activation function, is known as a perceptron.
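The same calculation can be reproduced in a few lines of Python. This is a minimal sketch; the function names are illustrative, and the inputs, weights, bias, and threshold are the ones from the example above.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add them up, then add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def perceptron_output(inputs, weights, bias=0.0, threshold=0.5):
    """Apply a step activation with the given threshold to the weighted sum."""
    return 1 if weighted_sum(inputs, weights, bias) > threshold else 0

inputs, weights = [0.5, 0.6], [0.5, 0.7]
print(round(weighted_sum(inputs, weights), 2))  # 0.67
print(perceptron_output(inputs, weights))       # 1
```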
But how do we use this configuration to solve a problem, such as deciding whether an image contains a ball? For that, we need to find weights that make the output 1 when the image contains a ball and 0 otherwise. But how do we find these optimal weights? We start by initializing the weights with random values. Then we feed the model a large amount of data, calculate the output for each example, and update the weights based on the difference between the expected and the actual output, i.e., the error.
This process is repeated until the weights produce sufficiently accurate results.
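The training loop described above corresponds to the classic perceptron learning rule, $w_i \leftarrow w_i + \eta\,(t - y)\,x_i$, where $\eta$ is a learning rate, $t$ the expected output, and $y$ the actual output. The sketch below is illustrative rather than a production implementation: it starts from zero weights instead of random ones so the result is reproducible, uses a threshold of 0 with a learnable bias (equivalent to learning the threshold), and trains on the logical AND function as a hypothetical toy dataset instead of images.

```python
def predict(weights, bias, inputs):
    """Step activation with threshold 0 (the bias plays the role of the threshold)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def train_perceptron(samples, learning_rate=0.1, epochs=20):
    """Classic perceptron learning rule; starts from zero weights for reproducibility."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # expected minus actual
            # Nudge each weight in proportion to its input and the error
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical toy dataset: the logical AND of two binary inputs
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

When the error is 0, nothing changes; weights only move when the perceptron gets an example wrong.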
A single perceptron can only correctly classify data that is linearly separable, i.e., data whose classes can be divided by a straight line (or, in higher dimensions, a flat plane). It fails when the classes cannot be separated this way. Since most real-world data is not linearly separable, we need something more powerful than the perceptron, and that’s where deeper networks, more specifically multi-layer neural networks, come into the picture.
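To make "linearly separable" concrete, the sketch below uses two toy truth tables: AND (output 1 only when both inputs are 1) and XOR (output 1 when exactly one input is 1). It brute-forces a small, arbitrarily chosen grid of integer weights and biases for a single perceptron. A grid search alone proves nothing in general, but the result reflects a known fact: a line separates the AND cases, while no choice of weights and bias, on this grid or anywhere else, lets a single perceptron compute XOR.

```python
from itertools import product

# Truth tables for two logical functions of two binary inputs
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def find_perceptron(truth_table, values=range(-3, 4)):
    """Search a small grid of integer weights and biases for a single perceptron
    (step activation, threshold 0) that reproduces the truth table."""
    for w1, w2, b in product(values, repeat=3):
        outputs = {x: (1 if w1 * x[0] + w2 * x[1] + b > 0 else 0) for x in truth_table}
        if outputs == truth_table:
            return w1, w2, b
    return None

print(find_perceptron(AND) is not None)  # True
print(find_perceptron(XOR))              # None
```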
Multi-Layer Neural Networks – The Game Changer
Instead of a single neuron, we now have multiple neurons arranged in interconnected layers: an input layer that receives the input, any number of hidden layers, and an output layer. In a typical fully connected network, the outputs of each layer are passed as inputs to every neuron in the next layer, thus creating a network.
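As a minimal sketch of how an extra layer helps, the network below computes XOR (output 1 when exactly one of two binary inputs is 1), which a single perceptron cannot represent. The weights are hand-picked for illustration, not learned: one hidden neuron behaves like OR, the other like AND, and the output neuron fires for "OR but not AND".

```python
def step(z):
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: every neuron sees all outputs of the previous layer."""
    return [step(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

# Hand-picked (hypothetical, not learned) weights for a 2-2-1 network
HIDDEN_W = [[1, 1], [1, 1]]
HIDDEN_B = [-0.5, -1.5]  # first hidden neuron acts like OR, second like AND
OUTPUT_W = [[1, -1]]
OUTPUT_B = [-0.5]        # output fires for "OR but not AND"

def xor_network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

print([xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

In practice, the weights of such a network are not chosen by hand but learned from data.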
In the perceptron, the inputs were connected directly to the output, so we could update the weights using the error at the output. With hidden layers, this is no longer enough: we only observe the error at the output layer, yet the weights in every hidden layer also contribute to it. Using the output error directly would only tell us how to adjust the weights feeding into the output layer, not those in the earlier hidden layers. We need a way to determine how much each hidden-layer weight contributed to the error.
Backpropagation – The Backbone
Backpropagation lets us train the weights throughout a multi-layer network based on the calculated cost (the error). It computes the cost at the output layer and propagates it backward through the hidden layers, using the chain rule of calculus to work out how much each weight contributed to the cost, i.e., the gradient of the cost with respect to each weight. Gradient descent, an optimization technique, then uses these gradients to adjust the weights in small steps in the direction that reduces the cost, moving toward weights where the cost is at a minimum.
Backpropagation and gradient descent are vast topics and are out of the scope of this article. If you want to dive deeper, check out this link.
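A full treatment is beyond this article, but a minimal sketch can still make the idea concrete. Below, a tiny hypothetical network (2 inputs, 2 sigmoid hidden neurons, 1 sigmoid output) uses a squared-error cost $C = \tfrac{1}{2}(y - t)^2$. The `backprop` function applies the chain rule layer by layer, from the output back to the hidden layer, and `gradient_descent_step` moves each weight against its gradient: $w \leftarrow w - \eta\, \partial C / \partial w$. The parameter values, learning rate, and function names are illustrative assumptions; real frameworks compute these gradients automatically and work with vectors and matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass through a 2-2-1 network; returns hidden activations and output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(len(b1))]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def cost(params, x, target):
    """Squared-error cost C = 0.5 * (y - t)^2."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backprop(params, x, target):
    """Gradients of the cost with respect to every weight and bias (chain rule)."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    # Output layer: dC/dy = (y - t), and sigmoid'(z) = y * (1 - y)
    delta_out = (y - target) * y * (1.0 - y)
    grad_W2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Hidden layer: send delta_out backward through each outgoing weight W2[j]
    delta_hidden = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2

def gradient_descent_step(params, grads, learning_rate):
    """Move every parameter a small step against its gradient."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    new_W1 = [[w - learning_rate * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    new_b1 = [b - learning_rate * g for b, g in zip(b1, gb1)]
    new_W2 = [w - learning_rate * g for w, g in zip(W2, gW2)]
    new_b2 = b2 - learning_rate * gb2
    return new_W1, new_b1, new_W2, new_b2

# Hypothetical starting parameters and a single training example
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.05)
x, target = [1.0, 0.5], 1.0

for _ in range(3):
    print(round(cost(params, x, target), 4))  # the cost shrinks as the weights are updated
    params = gradient_descent_step(params, backprop(params, x, target), learning_rate=0.5)
```

A common way to sanity-check a hand-written backpropagation is to compare its gradients with numerical estimates from nudging each weight slightly.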
Some Common Types of Neural Networks
In this section, let’s go through some of the most common types of neural networks used in the industry today. This will give you a better idea of how such a technology could solve real-world problems.
Convolutional Neural Networks (CNNs)
Convolutional neural networks (CNNs) are a type of neural network commonly used in computer vision. Unlike a basic multi-layer perceptron, which expects a flat list of input values, a CNN can take a raw image, a grid of pixel values, directly as input. The core building block of a CNN is the kernel, or filter: a small grid of learned weights that slides across the input to extract the features needed for a specific task, such as edges or textures. CNNs are widely used in face authentication, optical character recognition (OCR), object detection, and more.
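The sliding-filter idea can be sketched in plain Python. This is a minimal, illustrative version (no padding, stride 1, a single channel); the image and the vertical-edge filter are hypothetical values, whereas in a real CNN the filter values are learned during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image ('valid' mode, stride 1, no padding).
    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped), which CNNs conventionally call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

# A tiny hypothetical grayscale image: dark on the left, bright on the right
image = [[0, 0, 1, 1]] * 4
# A hand-made vertical-edge filter; in a real CNN these values are learned
kernel = [[-1, 1],
          [-1, 1]]

for row in conv2d(image, kernel):
    print(row)  # [0, 2, 0] on every row: the filter responds only at the edge
```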
Recurrent Neural Networks (RNNs)
In a recurrent neural network, the output at the current step depends not only on the current input but also on the previous steps. This is achieved through a loop: the network keeps a hidden state that is updated at each step and fed back in at the next one, acting as an internal memory of earlier inputs. This structure makes RNNs suitable for sequential data, such as text, speech, or time series.
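Here is a minimal sketch of that loop, simplified to a single scalar hidden state (real RNNs use vectors and weight matrices). The weights are hypothetical. Feeding two sequences that differ only in their first element shows that the final state still "remembers" the earlier input.

```python
import math

def run_rnn(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    Returns the hidden state after each step."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state mixes current input and previous state
        states.append(h)
    return states

# Hypothetical weights; a trained RNN would learn these values
print(run_rnn([1.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)[-1])  # nonzero: the first input still matters
print(run_rnn([0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)[-1])  # 0.0
```

Setting the recurrent weight `w_h` to 0 would cut the loop, and the two final states would become identical.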
Long Short-Term Memory Networks (LSTMs)
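LSTM networks are an extension of RNNs designed to retain information in memory for longer. Standard RNNs can also carry information forward, but they struggle with long sequences because of the vanishing gradient problem: during training, the gradients that carry the error back through many time steps shrink toward zero, so the network has difficulty learning long-range dependencies. LSTMs mitigate this issue with a separate cell state controlled by gates (typically forget, input, and output gates) that decide which information to discard, which to keep, and for how long.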
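The gating idea can be sketched as a single LSTM time step using the commonly cited gate equations, simplified here to scalar values instead of the vectors and weight matrices a real LSTM uses. The parameter values and dictionary keys are hypothetical, chosen only to show how the gates control what the cell state keeps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.
    params maps each gate name to a (w_x, w_h, b) tuple (hypothetical names)."""
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # fraction of the new candidate to write
    g = math.tanh(pre_activation("candidate"))  # candidate content for the cell state
    o = sigmoid(pre_activation("output"))       # fraction of the cell state to expose
    c = f * c_prev + i * g                      # updated long-term memory
    h = o * math.tanh(c)                        # new hidden state / output
    return h, c

# Hypothetical settings: forget gate wide open, input gate nearly shut,
# so the cell state is carried forward almost unchanged
keep = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "candidate": (1.0, 0.0, 0.0), "output": (0.0, 0.0, 0.0)}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.7, params=keep)
print(round(c, 3))  # 0.7
```

Flipping the two gate biases would do the opposite: discard the old value and store the new candidate instead.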
Generative Adversarial Networks (GANs)
Generative Adversarial Networks, more commonly known as GANs, are a type of deep learning model that generates new data resembling the training data. A GAN uses two models: a generator and a discriminator. The generator creates fake data, and the discriminator tries to tell real data apart from generated data. The generator learns from the discriminator’s predictions, and as training progresses, it ideally starts producing data realistic enough that the discriminator can no longer reliably distinguish it from real data. GANs have many exciting applications, such as face aging, deepfakes, text-to-image translation, and turning photos into emojis.
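The adversarial setup can be summarized by the two loss functions commonly used to train GANs, both forms of binary cross-entropy. The sketch below only evaluates these losses for a single real and a single generated sample, given probabilities a discriminator might output; the probabilities are made-up illustrative values, and a real training loop would compute them with neural networks and update both models by gradient descent.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.
    d_real: its predicted probability that a real sample is real.
    d_fake: its predicted probability that a generated sample is real.
    The loss is low when d_real is near 1 and d_fake is near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """'Non-saturating' generator loss: low when the discriminator
    believes the generated sample is real (d_fake near 1)."""
    return -math.log(d_fake)

# Discriminator is confident and correct: good for D, bad for G
print(round(discriminator_loss(0.9, 0.1), 3), round(generator_loss(0.1), 3))  # 0.211 2.303
# Discriminator is fooled by the generator: bad for D, good for G
print(round(discriminator_loss(0.9, 0.9), 3), round(generator_loss(0.9), 3))  # 2.408 0.105
```

The two losses pull in opposite directions, which is what makes the training "adversarial".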
Neural networks have become a significant part of our lives. They power many modern technologies and play a crucial role in helping computers make intelligent decisions with minimal human intervention.
In this article, we covered what deep learning is and how it relates to machine learning and artificial intelligence. We then discussed the neuron, the basic building block of a neural network, how it works, and the limitations of the single-neuron perceptron. We also saw how a multi-layer neural network overcomes those limitations and can tackle more complex real-world problems. Finally, we looked at some common types of neural networks and their use cases.
Here at Talendor we have experienced Data Analysts and Machine Learning experts ready to start working remotely on your project. Sounds interesting? Please check here and hire a Machine Learning specialist today.