Introduction to Deep Learning in PyTorch
If you are getting started with data science and deep learning, then this article is for you.
Prerequisites
You just need to know the following:
- Basic programming with Python (variables, data types, loops, functions, etc.)
- Some high school mathematics (vectors, matrices, derivatives, and probability)
Summary:
1- Installation
2- NumPy basics
3- PyTorch basics
4- First model
5- Conclusions
6- What should I learn now?
7- References
1. Installation
# Linux / Binder
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
# Windows
# !pip install numpy torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
# MacOS
# !pip install numpy torch torchvision torchaudio
2. NumPy basics
From the W3Schools introduction to NumPy:
NumPy is a Python library used for working with arrays.
It also has functions for working in domain of linear algebra, fourier transform, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
NumPy stands for Numerical Python.
Below are some basics:
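Here is a minimal sketch of a few everyday NumPy operations. The array values and variable names are my own picks for illustration.

```python
import numpy as np

# Create a 1-D array (vector) and a 2-D array (matrix) from Python lists
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(vector.shape)  # (3,)
print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2

# Element-wise arithmetic applies the operation to every element
doubled = vector * 2        # [2 4 6]
summed = vector + vector    # [2 4 6]

# Matrix-vector product: shape (2, 3) @ (3,) gives shape (2,)
product = matrix @ vector   # [14 32]

# Aggregation over all elements
mean_value = matrix.mean()  # 3.5
```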
For more information, see:
https://numpy.org/doc/stable/user/quickstart.html
https://www.w3schools.com/python/numpy/numpy_intro.asp
3. PyTorch basics
PyTorch is a Python library for processing tensors. A tensor can be a number, vector, matrix, or any n-dimensional array.
A tensor has several properties, such as its size (shape), its number of dimensions, and its data type.
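As a small sketch, here is how those properties can be inspected. The tensors themselves are arbitrary examples.

```python
import torch

scalar = torch.tensor(4.0)                     # a single number: 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])         # 1 dimension
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])  # 2 dimensions, integer values

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # shape = size along each dimension, ndim = number of dimensions, dtype = element type
    print(name, t.shape, t.ndim, t.dtype)
```

Note that tensors created from Python floats get the dtype torch.float32, while tensors created from Python integers get torch.int64.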
Now let’s do some basic operations: addition, subtraction, division, and multiplication.
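A quick sketch with two example tensors (the values are arbitrary). All four operators work element by element:

```python
import torch

a = torch.tensor([4.0, 6.0, 8.0])
b = torch.tensor([2.0, 3.0, 4.0])

added = a + b        # tensor([ 6.,  9., 12.])
subtracted = a - b   # tensor([2., 3., 4.])
multiplied = a * b   # tensor([ 8., 18., 32.])
divided = a / b      # tensor([2., 2., 2.])
```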
And what if we want to calculate the average?
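One way to do it, sketched with an example tensor. Keep in mind that mean() expects a floating-point tensor, so the values are written as floats.

```python
import torch

values = torch.tensor([1.0, 2.0, 3.0, 4.0])

average = values.mean()                  # tensor(2.5000)
manual = values.sum() / values.numel()   # same result: sum divided by element count
```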
Let’s do some math that is more complex than averages and addition.
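Here is a minimal sketch of letting PyTorch compute derivatives for us. The function z = x² + 3y and the input values are assumptions chosen for illustration.

```python
import torch

# requires_grad=True tells PyTorch to track operations on these tensors
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)

z = x ** 2 + 3 * y   # z = 9 + 12 = 21

# Compute dz/dx and dz/dy and store them in x.grad and y.grad
z.backward()

print(x.grad)  # tensor(6.)  because dz/dx = 2x
print(y.grad)  # tensor(3.)  because dz/dy = 3
```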
Here is the math behind z.backward():
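As a worked illustration, assume a hypothetical function z = x² + 3y evaluated at x = 3 and y = 4 (both the function and the values are assumptions picked for this sketch). Calling z.backward() fills x.grad and y.grad with the partial derivatives of z:

```latex
\begin{aligned}
z &= x^2 + 3y \\
\frac{\partial z}{\partial x} &= 2x = 2 \cdot 3 = 6 \\
\frac{\partial z}{\partial y} &= 3
\end{aligned}
```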
For more complex equations, we need to use the chain rule.
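A small sketch of the chain rule in action, assuming a hypothetical composite function z = (2x + 1)² split into an inner part u and an outer part z:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)

u = 2 * x + 1   # inner function: u = 3
z = u ** 2      # outer function: z = 9

# PyTorch applies the chain rule: dz/dx = dz/du * du/dx = 2u * 2
z.backward()

print(x.grad)   # tensor(12.)
```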
And here is the math behind this equation:
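To make this concrete, assume a hypothetical composite function z = (2x + 1)² evaluated at x = 1 (an example of my choosing). Writing the inner part as u, the chain rule gives:

```latex
\begin{aligned}
u &= 2x + 1, \qquad z = u^2 \\
\frac{dz}{dx} &= \frac{dz}{du} \cdot \frac{du}{dx} = 2u \cdot 2 = 4(2x + 1) \\
\left.\frac{dz}{dx}\right|_{x=1} &= 4 \cdot 3 = 12
\end{aligned}
```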
In PyTorch, .backward() is mostly used to calculate the gradients of a model's loss, so it is more common to take derivatives with respect to matrices than with respect to single numbers. That calculation involves the Jacobian matrix, but I will not go into those details here. If you want to know more about the math, I recommend searching for it.
4. First model
Now let’s start our first model.
First: We need to work with some data, so let’s create a simple dataset.
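A minimal sketch of such a dataset, assuming a hypothetical relationship y = 2x + 1 between inputs and targets (the numbers are made up for illustration):

```python
import torch

# Each row is one sample with a single input value: shape (5, 1)
inputs = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Hypothetical targets following y = 2x + 1: shape (5, 1)
targets = 2 * inputs + 1

print(targets.squeeze())  # tensor([ 3.,  5.,  7.,  9., 11.])
```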
Second: Before creating the model from scratch, let’s understand what a Deep Learning model is.
A model is basically a mathematical function that applies some operations to an input and returns a value.
A Deep Learning model has parameters and layers. Deep Learning gets its name from the fact that a model can have multiple layers.
Parameters are variables used in the mathematical operations that transform the input. Based on the computed loss, the model updates its parameters to return a better value. A model has two types of parameters: weights and biases. When a model is initialized, its weights and biases are random values (the exact range depends on how they are initialized).
A linear layer is one of the basic building blocks of a deep learning model.
The equation of a linear layer is the input multiplied by the layer's transposed weight matrix, plus a bias: y = x · W^T + b.
Now let’s create our model.
The weight matrix has size target_values × input_values, and the bias vector has one element per target value.
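Here is a sketch of a single linear layer written by hand. The sizes (one input value, one target value), the seed, and the use of torch.randn for initialization are assumptions for this example.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random values are reproducible

n_inputs, n_targets = 1, 1

# torch.randn draws from a standard normal distribution (an assumed choice here)
weights = torch.randn(n_targets, n_inputs, requires_grad=True)  # shape (targets, inputs)
biases = torch.randn(n_targets, requires_grad=True)             # shape (targets,)

def model(x):
    # Linear layer: input times the transposed weight matrix, plus the bias
    return x @ weights.t() + biases

sample = torch.tensor([[1.0], [2.0], [3.0]])  # 3 samples, 1 input value each
predictions = model(sample)                   # shape (3, 1)
```

The predictions are meaningless at this point, because the parameters are still random.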
Third: Now that we have the data and the model, let’s define the optimizer and the loss function.
The loss function is used to calculate the difference between the model predictions and the actual target.
In our case, we will use the mean squared error (MSE) to calculate the difference.
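The mean squared error can be sketched by hand like this. The function name mse and the sample values are my own choices for illustration.

```python
import torch

def mse(predictions, targets):
    # Average of the squared differences between predictions and targets
    diff = predictions - targets
    return torch.sum(diff * diff) / diff.numel()

predictions = torch.tensor([[2.0], [4.0], [6.0]])
targets = torch.tensor([[3.0], [5.0], [7.0]])

loss = mse(predictions, targets)
print(loss)  # tensor(1.)  every prediction is off by 1, and 1 squared is 1
```

PyTorch also ships this loss as torch.nn.functional.mse_loss, which should give the same value.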
But what will we do with the model's loss? We will calculate the gradient of the loss with respect to every parameter of the model and use it to optimize the model.
With the gradients we can adjust the parameters to optimize the model. Imagine the graph of the loss as a function of a parameter: the derivative of the loss with respect to that parameter tells us how steeply the loss changes at that point. If the derivative is not 0, the parameter can be adjusted to make the loss smaller. Let’s take an example and analyze a loss graph.
If a gradient element is positive:
- increasing the weight element’s value slightly will increase the loss
- decreasing the weight element’s value slightly will decrease the loss
If a gradient element is negative:
- increasing the weight element’s value slightly will decrease the loss
- decreasing the weight element’s value slightly will increase the loss
So in our case, we will use this method and subtract a small fraction of each gradient from its parameter to optimize the model (descending along the gradient).
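A single update step can be sketched with one weight. The values of w, x, y and the learning rate are assumptions picked to keep the arithmetic easy to follow.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)    # the parameter we want to improve
x, y = torch.tensor(2.0), torch.tensor(6.0)  # one input and its target
learning_rate = 0.1

loss = (w * x - y) ** 2   # (2 - 6)^2 = 16
loss.backward()           # w.grad = 2 * (w*x - y) * x = -16

# The gradient is negative, so subtracting it increases w and lowers the loss
with torch.no_grad():
    w -= learning_rate * w.grad   # 1 - 0.1 * (-16) = 2.6

new_loss = (w * x - y) ** 2       # (5.2 - 6)^2 = 0.64, smaller than before
```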
Fourth: So finally, let's train our model.
Training is divided into 5 steps:
- Make predictions
- Calculate the loss
- Calculate the gradients
- Adjust parameters
- Reset grads
Training is basically these 5 steps repeated a number of times.
Now, when we check the loss, we can see that the model is almost perfect.
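The five steps above can be sketched as a complete loop. The dataset (y = 2x + 1), the seed, the learning rate, and the number of epochs are all assumptions for this example, not tuned values.

```python
import torch

torch.manual_seed(0)

# Hypothetical dataset: targets follow y = 2x + 1
inputs = torch.tensor([[1.0], [2.0], [3.0], [4.0], [5.0]])
targets = 2 * inputs + 1

# One linear layer with randomly initialized parameters
weights = torch.randn(1, 1, requires_grad=True)
biases = torch.randn(1, requires_grad=True)

def model(x):
    return x @ weights.t() + biases

def mse(predictions, targets):
    diff = predictions - targets
    return torch.sum(diff * diff) / diff.numel()

learning_rate = 0.05  # assumed value
epochs = 1000         # assumed value

initial_loss = mse(model(inputs), targets).item()

for epoch in range(epochs):
    predictions = model(inputs)        # 1. make predictions
    loss = mse(predictions, targets)   # 2. calculate the loss
    loss.backward()                    # 3. calculate the gradients
    with torch.no_grad():
        weights -= learning_rate * weights.grad  # 4. adjust parameters
        biases -= learning_rate * biases.grad
        weights.grad.zero_()           # 5. reset grads
        biases.grad.zero_()

final_loss = mse(model(inputs), targets).item()
print(initial_loss, final_loss)
```

The gradients have to be reset in step 5 because backward() adds to the existing .grad values instead of replacing them.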
5. Conclusions
The outputs are almost exactly the same as the targets. This is because our dataset is very simple: it is just a basic linear prediction.
Let’s see our data through a graph:
This is the line that our Deep Learning model should discover. And in our case, our model discovered exactly this line.
With real-world data, it is practically impossible to create a perfect deep learning model with 100% accuracy, because the data will rarely be linear like in our example.
In machine learning, you need to be very careful when selecting hyperparameters like the learning rate (how much of the gradient is subtracted from the parameters).
But now, let’s evaluate our model with unseen data to see if it is really working.
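A sketch of what that evaluation can look like. To keep it short, the parameters below are hypothetical stand-ins for the values a trained model would hold, assuming the data follows y = 2x + 1.

```python
import torch

# Hypothetical parameters standing in for the values learned during training
weights = torch.tensor([[2.0]])
biases = torch.tensor([1.0])

def model(x):
    return x @ weights.t() + biases

# Inputs the model never saw while training, with their expected targets
unseen_inputs = torch.tensor([[10.0], [20.0]])
expected = 2 * unseen_inputs + 1

# Gradients are not needed when we only want predictions
with torch.no_grad():
    predictions = model(unseen_inputs)

max_error = (predictions - expected).abs().max()
print(predictions.squeeze(), max_error)
```

A small max_error on inputs outside the training range suggests the model learned the underlying line and did not just memorize the training samples.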
Now that our model is working very well, feel free to create your own dataset and change the learning rate and the number of training epochs.
Here is the full code:
6. What should I learn now?
In this article I did not go deep into .backward(), so I recommend watching this video:
I didn’t make this video, but it is one of the best I have found on the internet.
I also recommend checking out the full playlist.
After watching this video, I recommend learning how to do the same thing we did here using PyTorch classes, and then learning about multi-layer models, convolutional layers, autoencoders, GANs (Generative Adversarial Networks), and VAEs (Variational Autoencoders). You will then have a good understanding of Deep Learning and will be able to decide which path to take next within the field.
In the near future I will release another article about Deep Learning, covering multi-layer models in more detail.
7. References
Thank you very much for reading this article. I hope you learned the basics!