# Lecture 8 - PyTorch

This is the final lecture. We will start with a brief introduction to deep learning libraries, then look at the basics of using PyTorch to implement some simple deep learning models.



## Deep Learning Libraries

There are many deep learning libraries available. The most common ones for Python are

- TensorFlow, Keras
- PyTorch

Working with TensorFlow requires going into a lot of detail about the construction of the computation graph, whereas Keras is a higher-level interface to TensorFlow. TensorFlow is very popular in industry and well suited to production code.

PyTorch can be used as a low-level interface, but it is much more user-friendly than TensorFlow, and it also provides a higher-level interface. PyTorch is more popular in the research community; at the time of this lecture, version 1.0 had only recently been released.


## Main features that any deep learning library should provide

No matter what library or language you use, the main features provided by a deep learning library are

1. Use of the GPU to speed up computation
2. Automatic differentiation
3. Useful library functions for common architectures and optimization algorithms

### PyTorch
We will look at all of the above in PyTorch.
The best way to think about PyTorch is that it is NumPy + GPU + autograd.

You can install it with

```conda install pytorch```

In [None]:
import torch

The PyTorch equivalent of a NumPy array is called a tensor. Tensors are just multidimensional arrays.

In [None]:
torch.tensor([2,3,4,5])

In [None]:
torch.zeros((5,5))

In [None]:
x = torch.ones((5,5))
x

In [None]:
2*x + 5

In [None]:
torch.rand(5,5)

In [None]:
x = torch.rand(25)
x

In [None]:
x=x.reshape(-1,5)
x

In [None]:
x.shape

In [None]:
print(torch.arange(10))
print(torch.eye(5))
print(torch.linspace(0,1,10))

Some functions have different names. For example, matrix multiplication is `mm` in PyTorch but `dot` in NumPy:

In [None]:
A = torch.rand(5,5)
x = torch.ones(5,1)
A.mm(x)

In [None]:
import numpy as np
A = np.random.rand(5,5)
x = np.ones((5,1))
A.dot(x)

You can convert a (CPU) tensor to a NumPy array that shares its memory with the PyTorch tensor, so a change made through one is visible in the other.

In [None]:
x = torch.ones(5,5)
x

In [None]:
xn = x.numpy()
xn

In [None]:
xn[4,2]=10
xn

In [None]:
x
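The conversion also works in the other direction. The following is a minimal sketch, assuming a CPU NumPy array; `torch.from_numpy` returns a tensor that views the same memory, and `.clone()` is one way to get an independent copy when sharing is not wanted.

```python
import numpy as np
import torch

arr = np.zeros((2, 3))          # float64 NumPy array
t = torch.from_numpy(arr)       # tensor backed by the same memory (CPU only)

t[0, 1] = 7.0                   # modify through the tensor ...
print(arr)                      # ... and the NumPy array changes too

# .clone() makes an independent copy, so edits no longer propagate.
t_copy = t.clone()
t_copy[1, 2] = -1.0
print(arr[1, 2])                # unchanged by the edit to the copy
```

Note that the tensor keeps the array's `float64` dtype, whereas tensors created by `torch.ones` or `torch.rand` default to `float32`.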

### Using the GPU

The GPU (Graphics Processing Unit) is a separate processing unit that is specialized to handle the bulk computations required for rendering high-quality graphics. It consists mainly of a large number of processor cores that are individually slow compared to a CPU core, but because of their sheer number (around 2000 or more) they can churn through parallel computations very quickly.

In [None]:
import torch
torch.cuda.is_available()

Installing the GPU drivers and the CUDA toolkit can be quite messy, so if you just want to experiment with GPUs and deep learning libraries, you can use [Google Colaboratory](https://colab.research.google.com/).

In [None]:
gpu = torch.device("cuda")
cpu = torch.device("cpu")
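The `torch.device("cuda")` handle above is only usable on a machine with a CUDA-capable GPU. A common pattern, sketched here under the assumption that you want the same notebook to run on both kinds of machine, is to pick the device once and pass it around; the variable name `device` is just a convention.

```python
import torch

# Fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

A = torch.rand(100, 100, device=device)   # created directly on `device`
B = torch.rand(100, 100).to(device)       # created on the CPU, then moved

C = A.mm(B)                               # both operands share a device
print(C.device, C.shape)
```

Creating a tensor directly on the target device avoids the extra CPU allocation and copy that `.to(device)` implies.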

In [None]:
A = torch.rand(100,100)
B = torch.rand(100,100)

In [None]:
A.mm(B)

In [None]:
A_gpu = A.to(gpu)
B_gpu = B.to(gpu)

In [None]:
A_gpu.mm(B_gpu)

In [None]:
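# This raises a RuntimeError: A lives on the CPU and B_gpu on the GPU,
# and both operands of an operation must be on the same device.
A.mm(B_gpu)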

In [None]:
C_gpu = A_gpu.mm(B_gpu)
C = C_gpu.to(cpu)
C

### GPU - CPU memory transfer

In [None]:
big_mat = torch.rand(20000,20000);

In [None]:
big_mat_gpu = big_mat.to(gpu)

In [None]:
big_mat= big_mat_gpu.to(cpu)

In [None]:
del big_mat_gpu
torch.cuda.empty_cache()

In [None]:
del big_mat

## Speedup from GPU

In [None]:
%%timeit
A = torch.rand(3000,3000)
B = torch.rand(3000,3000)
C = torch.zeros(3000,3000)
C.copy_(B)
for i in range(5):
    C=torch.mm(A,C)

In [None]:
%%timeit
A = torch.rand(3000,3000, device = gpu)
B = torch.rand(3000,3000, device = gpu)
C = torch.zeros(3000,3000, device = gpu)
C.copy_(B)
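for i in range(5):
    C=torch.mm(A,C)
# GPU kernels run asynchronously; wait for them to finish so the timing is fair
torch.cuda.synchronize()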
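Because CUDA kernels are launched asynchronously, a timer that stops before the GPU has finished can report a time that is too short. The following is a rough sketch of a manual timer that accounts for this; `time_matmul` is a hypothetical helper written for illustration, and the matrix size and repeat count are arbitrary choices.

```python
import time
import torch

def time_matmul(device, n=500, repeats=5):
    """Hypothetical helper: average seconds for one n x n matrix product."""
    A = torch.rand(n, n, device=device)
    B = torch.rand(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()      # finish setup before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        C = A.mm(B)
    if device.type == "cuda":
        torch.cuda.synchronize()      # wait for queued GPU kernels to finish
    return (time.perf_counter() - start) / repeats

print("cpu :", time_matmul(torch.device("cpu")))
if torch.cuda.is_available():
    print("cuda:", time_matmul(torch.device("cuda")))
```

The speedup you observe depends heavily on the matrix size: for small matrices the kernel launch overhead can outweigh the benefit of the GPU.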

## Automatic Differentiation

PyTorch uses dynamic computation graphs to compute the gradients of the parameters.

In [None]:
x = torch.tensor([2.0])
m = torch.tensor([5.0], requires_grad = True)
c = torch.tensor([2.0], requires_grad = True)

In [None]:
y = m*x + c
y

Define a loss that measures how far the output `y` is from the target value 13.

In [None]:
loss = torch.norm( y - 13)

In [None]:
m.grad  # None: no gradient has been computed yet

Calling `.backward()` on a scalar tensor such as `loss` makes PyTorch compute the gradient of that tensor with respect to every tensor that was used to compute it and that has the `requires_grad` flag set to `True`. Each computed gradient is stored in the `.grad` attribute of the corresponding tensor.

In [None]:
loss.backward()

In [None]:
m.grad

In [None]:
c.grad
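These gradients can be checked by hand. With `x = 2`, `m = 5` and `c = 2` we get `y = 12` and `loss = |y - 13| = 1`, so by the chain rule the derivative of the loss with respect to `y` is `(y - 13) / |y - 13| = -1`, which gives `-1 * x = -2` for `m` and `-1` for `c`. The sketch below repeats the computation from the cells above and compares the result against these hand-derived values.

```python
import torch

x = torch.tensor([2.0])
m = torch.tensor([5.0], requires_grad=True)
c = torch.tensor([2.0], requires_grad=True)

y = m * x + c                   # 5*2 + 2 = 12
loss = torch.norm(y - 13)       # |12 - 13| = 1
loss.backward()

# Chain rule by hand: d(loss)/dy = (y - 13) / |y - 13| = -1
expected_m = torch.tensor([-2.0])   # d(loss)/dm = -1 * x
expected_c = torch.tensor([-1.0])   # d(loss)/dc = -1

print(m.grad, torch.allclose(m.grad, expected_m))
print(c.grad, torch.allclose(c.grad, expected_c))
```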

In [None]:
with torch.no_grad():
    m -= 0.01 * m.grad
    c -= 0.01 * c.grad

In [None]:
m,c

In [None]:
m.grad, c.grad

In [None]:
m.grad.zero_()
c.grad.zero_()

m.grad, c.grad
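The gradients are reset because `backward()` adds to whatever is already stored in `.grad` rather than overwriting it. A small sketch of that behaviour, using a made-up one-parameter example:

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()      # gradient of 2*w with respect to w is 2

(2 * w).sum().backward()    # no zeroing in between
second = w.grad.clone()     # 2 + 2 = 4: the new gradient was added

w.grad.zero_()              # reset in place before the next step
print(first, second, w.grad)
```

Forgetting to zero the gradients is a common source of training loops that diverge for no obvious reason.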

In [None]:
y = m*x + c

In [None]:
y

In [None]:
loss = torch.norm( y - 13)

In [None]:
loss.backward()
m.grad, c.grad

### Making it more compact

In [None]:
def model_fn(x,m,c):
    return m*x + c

In [None]:
def loss_fn(y,yt):
    return torch.norm(y-yt)

In [None]:
m = torch.tensor([5.0], requires_grad = True)
c = torch.tensor([2.0], requires_grad = True)

In [None]:
x = torch.tensor([2.0])
yt = torch.tensor([13.0])

In [None]:
y = model_fn(x,m,c)
loss = loss_fn(y,yt)
loss.backward()
with torch.no_grad():
    m -= 0.05 * m.grad
    c -= 0.05 * c.grad
m.grad.zero_()
c.grad.zero_()

print( f" m = {m}\n c = {c}\n y = {y}\n loss = {loss}")
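Running the cell above repeatedly performs gradient descent by hand. The same thing can be sketched as an explicit loop. Note one assumption: this sketch swaps `torch.norm` for a squared error (the hypothetical `squared_error` function), chosen because its gradient shrinks near the minimum, so a fixed step size settles instead of oscillating around the target.

```python
import torch

def model_fn(x, m, c):
    return m * x + c

def squared_error(y, yt):
    # Assumption: squared error replaces torch.norm from the cell above.
    return ((y - yt) ** 2).sum()

m = torch.tensor([5.0], requires_grad=True)
c = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([2.0])
yt = torch.tensor([13.0])

lr = 0.05
losses = []
for step in range(20):
    loss = squared_error(model_fn(x, m, c), yt)
    loss.backward()
    with torch.no_grad():       # parameter updates must not be tracked
        m -= lr * m.grad
        c -= lr * c.grad
    m.grad.zero_()
    c.grad.zero_()
    losses.append(loss.item())

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```

With these values the error is halved on every step, so the loss drops to roughly floating point precision within 20 iterations.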

### Slightly more complicated problem

In [None]:
import matplotlib.pyplot as plt

In [None]:
def model_fn(x,m,c):
    return m@x + c

In [None]:
def loss_fn(y,yt):
    return torch.norm(y-yt)

In [None]:
m = torch.rand((5,5), requires_grad = True)
c = torch.ones((5,1), requires_grad = True)

In [None]:
x = torch.randn(5,100)
yt = torch.randn(1,100)
losses = []

In [None]:
y = model_fn(x,m,c)
loss = loss_fn(y,yt)
loss.backward()
with torch.no_grad():
    m -= 0.05 * m.grad
    c -= 0.05 * c.grad
m.grad.zero_()
c.grad.zero_()

losses+=[loss.item()]
print( f"loss = {loss}")
plt.plot(losses);

## Using Library functions

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(5, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 5),
)

In [None]:
list(model.parameters())
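The raw parameter list can be hard to read. As a sketch, `named_parameters()` pairs each tensor with a name, which makes it easier to see which layer owns what; the model here is the same as in the cell above.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(5, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 5),
)

# Names are "<layer index>.<weight|bias>"; ReLU has no parameters.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

n_params = sum(p.numel() for p in model.parameters())
print("total parameters:", n_params)   # 2 * (5*5 + 5) = 60
```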

In [None]:
loss_fn = torch.nn.MSELoss(reduction='sum')

In [None]:
x = torch.randn(100,5)
yt = torch.randn(100,5)  # target has the same shape as the model output, (100, 5)
losses = []

In [None]:
y = model(x)
loss = loss_fn(y,yt)
loss.backward()
with torch.no_grad():
    for param in model.parameters():
        param -= 0.01 * param.grad
        
model.zero_grad()

losses+=[loss.item()]
print( f"loss = {loss}")
plt.plot(losses);

Using the `torch.optim` package, the manual parameter update can be replaced by an optimizer:

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)

In [None]:
y = model(x)
loss = loss_fn(y,yt)
loss.backward()

optimizer.step()
optimizer.zero_grad()

losses+=[loss.item()]
print( f"loss = {loss}")
plt.plot(losses);
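Instead of re-running the cell by hand, the forward pass, backward pass and optimizer step can be put in a loop. The following is a minimal sketch, not the notebook's exact setup: it assumes synthetic targets generated by a made-up rule (the sum of the inputs) so that there is something learnable, and it uses a single output per sample.

```python
import torch

torch.manual_seed(0)                       # reproducible sketch

model = torch.nn.Sequential(
    torch.nn.Linear(5, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 1),                 # one output per sample
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)

# Assumption: synthetic targets from a made-up rule, so the model has
# something to fit (the notebook uses pure noise instead).
x = torch.randn(100, 5)
yt = x.sum(dim=1, keepdim=True)            # shape (100, 1)

losses = []
for step in range(200):
    optimizer.zero_grad()                  # clear gradients from the last step
    loss = loss_fn(model(x), yt)
    loss.backward()                        # compute new gradients
    optimizer.step()                       # update the parameters
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Calling `zero_grad()` at the start of the iteration rather than at the end is equivalent; what matters is that it happens once between consecutive `backward()` calls.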

## MNIST Example

In [None]:
from torchvision.datasets import MNIST

In [None]:
data = MNIST(".",download=True)

In [None]:
img,y = data[2]

In [None]:
img

In [None]:
y

In [None]:
data.data[2].shape

In [None]:
data.targets[2]

### MNIST Training

In [None]:
model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)

In [None]:
loss_fn = torch.nn.CrossEntropyLoss()
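`CrossEntropyLoss` expects the raw, unnormalized scores (logits) of the model together with integer class labels, which is why the network above ends in a plain `Linear` layer with no softmax. The sketch below, with made-up logits for a three-class problem, checks that the loss equals the mean negative log-probability assigned to the true class.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores, shape (batch, classes)
labels = torch.tensor([0, 2])              # integer class indices, shape (batch,)

loss = torch.nn.CrossEntropyLoss()(logits, labels)

# Same quantity by hand: mean negative log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```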

In [None]:
# Sample training indices from everything except the last 1000 images,
# which are held out for evaluation below.
sample = np.random.choice(len(data.data) - 1000, 1000)
x = data.data[sample].reshape(1000,-1).float()/255
yt = data.targets[sample]

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)
losses = []

In [None]:
for i in range(100):
    y = model(x)
    loss = loss_fn(y,yt)
    loss.backward()

    optimizer.step()
    optimizer.zero_grad()

    losses+=[loss.item()]
    print( f"loss = {loss}")
plt.plot(losses);
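The loop above trains on one fixed batch of 1000 images. In practice the data is usually split into shuffled mini-batches. The following is a rough sketch of that pattern using `TensorDataset` and `DataLoader`; to stay self-contained it assumes random stand-in data with MNIST-like shapes rather than the real images, so the loss values themselves are not meaningful.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Assumption: random stand-ins with MNIST-like shapes (784 features, 10 classes).
x = torch.rand(1000, 784)
yt = torch.randint(0, 10, (1000,))

model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)

loader = DataLoader(TensorDataset(x, yt), batch_size=100, shuffle=True)

epoch_losses = []
for epoch in range(3):
    running = 0.0
    for xb, yb in loader:                  # 10 batches of 100 samples
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item()
    epoch_losses.append(running / len(loader))

print(epoch_losses)
```

Each pass over the loader is one epoch; with `shuffle=True` the batches are drawn in a different order every time.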

In [None]:
x_test = data.data[-1000:].reshape(1000,-1).float()/255
y_test = data.targets[-1000:]

In [None]:
with torch.no_grad():
    y_pred = model(x_test)

In [None]:
print("Accuracy = ", (y_pred.argmax(dim=1) == y_test).sum().float().item()/1000.0)
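The accuracy computation can be wrapped in a small function that does not hard-code the number of samples. This is a sketch with a hypothetical helper named `accuracy`, demonstrated on four made-up rows of logits.

```python
import torch

def accuracy(logits, labels):
    """Hypothetical helper: fraction of rows whose arg-max equals the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()

logits = torch.tensor([[0.1, 2.0, 0.3],    # predicts class 1
                       [1.5, 0.2, 0.1],    # predicts class 0
                       [0.0, 0.1, 0.9],    # predicts class 2
                       [0.3, 0.2, 0.1]])   # predicts class 0
labels = torch.tensor([1, 0, 2, 2])        # the last row is wrong

print(accuracy(logits, labels))            # 3 of 4 correct
```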

## Course Conclusion

By now you should have a sufficient introduction to the various ways one can use Python for scientific computing. The best way to learn more is to start using Python for whatever project you are working on. Only practice will make you comfortable with it.

### Reminders
- Assignment 2 is due Feb 17, 5PM
- There will be office hours after class today
- There will be office hours next Thursday as well, from 10AM to 12PM
- If you cannot make it during the above times, you can email me to schedule office hours by appointment
