
20211002

 

Activation Function

A function used to increase the expressive power of an AI model.

Linear: can only separate data with straight boundaries.

Non-linear: can separate data with curved decision boundaries, making complex relationships in the data easier to capture.

  1. Sigmoid function
  2. Softmax function
  3. ReLU function

Sources:

텐서플로 첫걸음

인공지능을 위한 수학


Activation Function

Takes a neuron's output y as input and computes f(y).

linear system vs. non-linear system

 

Sigmoid: output between 0 and 1

tanh (Hyperbolic Tangent): restricts the output to the range [−1.0, 1.0]

Step function: outputs either 0 or 1

ReLU (Rectified Linear Unit): maps inputs below 0 to 0 and passes inputs above 0 through unchanged

LeakyReLU: for negative input, ReLU returns 0, but LeakyReLU returns ax (for a small slope a)

Softmax: used with one-hot encoded targets => all outputs sum to 1
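
A minimal sketch of these activations using PyTorch's built-in functions (the 0.01 slope for LeakyReLU is an assumed choice):

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(torch.sigmoid(x))           # squashed into (0, 1)
print(torch.tanh(x))              # squashed into (-1, 1)
print((x > 0).float())            # step function: 0 or 1
print(F.relu(x))                  # negatives -> 0, positives unchanged
print(F.leaky_relu(x, 0.01))      # negatives -> 0.01 * x
print(F.softmax(x, dim=0).sum())  # softmax outputs sum to 1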

 


The notes below are a summary of the Udemy course Pytorch: Deep Learning and Artificial Intelligence.

 

22. Artificial Neural Networks Section introduction

CNNs

RNNs

 

Artificial Neural Networks: ANNs

neural networks

 

http://alexlenail.me/NN-SVG/index.html

activation functions:

These are what give neural networks their expressive power.

Multiclass classification: distinguishing among multiple classes

Data: images, text, and sound

 

23. Forward Propagation

neural networks -> predictions

 

E.g.: input is a face

one neuron looks for the presence of an eye

one neuron looks for the presence of a nose

Each neuron looks at a different feature.

 

input layer, hidden layers, output layer

a chain of neurons

uniform structure

 

y =  wx+b

y = ax+b

 

sigmoid

 

Compute the output of the preceding layer, then pass it forward to the next layer and continue the computation.
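
A single neuron as a minimal sketch: compute y = wx + b, apply the activation, and the result becomes the input to the next layer (the sizes here are arbitrary assumptions):

import torch

x = torch.randn(3)      # assumed 3 inputs to the neuron
w = torch.randn(3)      # weights
b = torch.tensor(0.5)   # bias

z = w @ x + b           # y = wx + b
out = torch.sigmoid(z)  # apply the activation, pass forward
print(out)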

 

regression

dense layer -> dense layer -> dense layer -> linear regression

classification

dense layer -> dense layer -> dense layer -> logistic regression
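
A small sketch of these two chains in PyTorch (the layer sizes are arbitrary assumptions):

import torch
import torch.nn as nn

# regression: dense layers ending in a plain linear output
reg_model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),                # linear regression head
)

# binary classification: the same trunk with a sigmoid on the end
clf_model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),  # logistic regression head
)

x = torch.randn(4, 10)  # batch of 4 samples with 10 features
print(reg_model(x).shape, clf_model(x).shape)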

 

Hierarchies

solve complicated problems

 

24. The Geometric Picture

geometric picture

feature engineering

linear regression

y_hat = ax^2 + b

gradient descent
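
A sketch of the feature-engineering idea: a plain linear model fits y_hat = ax^2 + b once we hand-craft the x^2 feature (the data below is made up for illustration):

import numpy as np

# made-up noisy quadratic data
x = np.linspace(-3, 3, 100)
y = 2 * x**2 + 1 + 0.1 * np.random.randn(100)

# feature engineering: regress y on x^2 (plus a bias column)
X = np.column_stack([x**2, np.ones_like(x)])
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b)  # should come out close to 2 and 1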

 

25. Activation Functions

sigmoid: output between 0 and 1

 f(a) = 1 / (1 + exp(-a))

used for binary classification

 

standardization

 

tanh: output between -1 and 1

 

vanishing gradient problem

where the activation is nearly flat, the gradient is close to zero, so learning barely progresses

 

dead end

default: ReLU

  doesn't have a "vanishing" gradient...

  (though the gradient in the left half is already vanished!)
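
A quick autograd sketch of this: for large |x| the sigmoid's gradient is nearly zero, while ReLU's gradient is 0 on the negative half and exactly 1 on the positive half:

import torch

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # ~0 at -10 and +10: the gradient has vanished

x.grad = None
torch.relu(x).sum().backward()
print(x.grad)  # 0 for x <= 0, 1 for x > 0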

 

BRU activation

 higher accuracy

 

softplus: f(x) = log(1 + exp(x)), a smooth approximation to ReLU

 

biological plausibility

 

26. Multiclass Classification

softmax function

Softmax is technically an activation function, but unlike sigmoid/tanh/ReLU, it is not used as a hidden-layer activation.

pytorch softmax function

nn.Sequential(
  nn.Linear(D, M),
  nn.ReLU(),
  nn.Linear(M, K),
  nn.Softmax(dim=1)  # dim=1: softmax across the class dimension
)

 

CrossEntropyLoss()

 

model = nn.Linear(D,K)

criterion = nn.CrossEntropyLoss()
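
A minimal sketch of why the final nn.Softmax() can be dropped: nn.CrossEntropyLoss applies log-softmax internally, so the model should output raw logits (D, K, and the batch size below are assumed values):

import torch
import torch.nn as nn

D, K = 20, 5                       # assumed feature and class counts
model = nn.Linear(D, K)            # outputs raw logits, no softmax
criterion = nn.CrossEntropyLoss()  # log-softmax + NLL in one step

x = torch.randn(8, D)              # assumed batch of 8 samples
targets = torch.randint(0, K, (8,))
loss = criterion(model(x), targets)
print(loss.item())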

 

activation function by task:

task                        activation function
Regression                  None / Identity
Binary classification       Sigmoid
Multiclass classification   Softmax

 

The Model Type Doesn't Matter

Linear Regression: Dense

ANN Regression: Dense + Dense

Binary Logistic Regression: Dense + Sigmoid

ANN Binary Classification: Dense + Dense + Sigmoid

Multiclass Logistic Regression: Dense + Softmax

ANN Multiclass Classification: Dense + Dense + Softmax

 

The same pattern applies to CNNs and RNNs: the type of task determines only the final activation function.

 

softmax is more general:

multiclass classification with K = 2 is equivalent to binary classification
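
A quick numerical check that softmax with K = 2 reduces to the sigmoid (a small sketch):

import torch

z = 1.5
# sigmoid of a single logit z
print(torch.sigmoid(torch.tensor(z)))
# softmax over the pair [0, z]: the second entry equals sigmoid(z)
print(torch.softmax(torch.tensor([0.0, z]), dim=0)[1])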

27. How to Represent Images

We need to understand how images are fed into a model as data.

height/width

an image is a matrix; each entry is indexed by a row and a column of the image

 

colors?

RGB: red/green/blue

black = 0

white = 255

 

Images as input to neural networks

0 ... 255

feature vector

 

3 dimensions: height, width, color

 

quantization:

color is light, measured by light intensity

it was figured out that 8 bits (1 byte) per channel is enough

2^8 = 256 levels => 0 ~ 255

=> How much space does a 500 x 500 image take?

500 x 500 x 3 x 8 = 6 million bits

JPEG allows us to compress images
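
The storage arithmetic above as a quick sketch:

height, width, channels, bits_per_channel = 500, 500, 3, 8
total_bits = height * width * channels * bits_per_channel
print(total_bits)       # 6,000,000 bits
print(total_bits // 8)  # 750,000 bytes (~750 KB uncompressed)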

 

Hex Colors

each byte (8 bits) is written as two hex digits, e.g. #FF0000 is pure red

 

Grayscale Images: no color

2-D array (height, width)

black = 0, white = 255, values in between are shades of gray

plt.imshow() applies a color map by default

plt.imshow(img, cmap='gray') renders the image in grayscale

 

Images as input to neural networks

0...1 => a more convenient range for network inputs

 

Another exception 

VGG

images are centered around 0, but the range is still 256
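
A sketch of that preprocessing idea: subtract a per-channel mean so pixel values center around 0 while the overall range stays about 256 wide (the mean values below are assumptions for illustration):

import numpy as np

mean = np.array([123.68, 116.78, 103.94])  # assumed per-channel means
image = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)
centered = image - mean  # centered near 0, range still ~256 wide
print(centered.min(), centered.max())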

 

Images as input to neural networks

N = #samples, D = #features

input X of shape NxD

A single image is H x W x C

a batch of images is N x H x W x C

 

Image to Feature Vector

reshape() or view()

N x D array
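
A sketch of flattening a batch of images into an N x D matrix with view() (MNIST-like shapes assumed):

import torch

N, H, W = 32, 28, 28          # assumed batch of MNIST-sized images
images = torch.rand(N, H, W)

flat = images.view(N, H * W)  # N x D with D = 784
# equivalently: images.view(-1, 784) or images.reshape(-1, 784)
print(flat.shape)             # torch.Size([32, 784])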

28. Code Preparation (ANN)

1. load in the data

       MNIST dataset -> handwritten digits

2. build the model

3. train the model

4. evaluate the model

5. make the predictions

 

pytorch load MNIST

step 1. load in the data -> pytorch library

grayscale => 28x28

train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True,
  download = True)
x_train = train_dataset.data
y_train = train_dataset.targets

x_train.shape = N x 28 x 28
y_train.shape = N
N = 60,000

 

 

test_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = False,
  download = True)
x_test = test_dataset.data
y_test = test_dataset.targets

x_test.shape = Ntest x 28 x 28
y_test.shape = Ntest
Ntest = 10,000

 

 

transforming the data

# reshape each input image into a flat vector
inputs = inputs.view(-1, 784)

 

step 2. model

model = nn.Sequential(
  nn.Linear(784, 128),
  nn.ReLU(),
  nn.Linear(128, 10)
)

10 -> the number of output classes

 

step 3. train the model

batch gradient descent

for epoch in range(epochs):
  # split the data into batches of batch_size and train on each
  for x_batch, y_batch in batches(X, Y, batch_size=128):
    train(x_batch, y_batch)

 

Batch Gradient Descent in PyTorch

train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)

for epoch in range(epochs):
  for inputs, targets in train_loader:
    optimizer.zero_grad()

 

random sampling (shuffle=True reshuffles the data each epoch)

 

step 4/5

n_correct = 0
n_total = 0
for inputs, targets in train_loader:
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total += targets.shape[0]

acc = n_correct / n_total

 

 

 

29. ANN for Image Classification

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True, 
  transform = transforms.ToTensor(),
  download = True)
train_dataset.data        # raw uint8 image tensor

train_dataset.data.max()  # 255: pixel values are not yet scaled

train_dataset.data.shape  # torch.Size([60000, 28, 28])

train_dataset.targets     # integer labels 0-9

 

Since the data has already been downloaded, running this again will not download it again.

train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True, 
  transform = transforms.ToTensor(),
  download = True)

model = nn.Sequential(
 nn.Linear(784, 128),
 nn.ReLU(),
 nn.Linear(128, 10)
)
# no need for final softmax!

Check whether a GPU is available and use it if so.

This affects training speed.

device = torch.device("cuda:0" if torch.cuda.is_available() else 'cpu')
print(device)
model.to(device)

loss and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
batch_size = 128

# the test dataset was never created above, so load it here
test_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = False,
  transform = transforms.ToTensor(),
  download = True)

train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)

 

 

# a loader with batch_size 1, just to inspect the tensor shapes
tmp_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = 1, shuffle = True)
tmp_loader

for x, y in tmp_loader:
    print(x)
    print(x.shape)  # torch.Size([1, 1, 28, 28]): batch x channel x height x width
    print(y.shape)  # torch.Size([1])
    break

# ToTensor() scales pixel values into [0, 1], so the max here is 1.0
train_dataset.transform(train_dataset.data.numpy()).max()
epochs= 10

train_losses = np.zeros(epochs)
test_losses = np.zeros(epochs)

for epoch in range(epochs):
  train_loss = []
  for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)

    inputs = inputs.view(-1, 784)  # flatten each 28x28 image to 784 features
    optimizer.zero_grad()

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    loss.backward()
    optimizer.step()

    train_loss.append(loss.item())

  train_loss = np.mean(train_loss)

  test_loss = []
  for inputs, targets in test_loader:
    inputs, targets = inputs.to(device), targets.to(device)

    inputs = inputs.view(-1, 784)

    outputs = model(inputs)
    loss = criterion(outputs, targets)  # no backward/step: evaluation only

    test_loss.append(loss.item())

  test_loss = np.mean(test_loss)

  train_losses[epoch] = train_loss
  test_losses[epoch] = test_loss
  print(f'Epoch {epoch+1}/{epochs}, Train loss: {train_loss:.4f}, Test loss: {test_loss:.4f}')

 

plt.plot(train_losses, label ='train loss')
plt.plot(test_losses, label = 'test loss')
plt.legend()
plt.show()

n_correct = 0.
n_total = 0.
for inputs, targets in train_loader:
  inputs, targets = inputs.to(device), targets.to(device)
  inputs = inputs.view(-1, 784)
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total+= targets.shape[0]
train_acc = n_correct/ n_total
  
n_correct = 0.
n_total = 0.
for inputs, targets in test_loader:
  inputs, targets = inputs.to(device), targets.to(device)
  inputs = inputs.view(-1, 784)
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total+= targets.shape[0]
test_acc = n_correct/ n_total
print(f"Train acc: {train_acc:.4f} , Test acc:{test_acc:.4f}")
from sklearn.metrics import confusion_matrix
import numpy as np
import itertools
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
  if normalize:
    cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print("Normalized confusion matrix")
  else:
    print("confusion_matrix: without Normalized")
  print(cm)
  plt.imshow(cm, interpolation='nearest', cmap=cmap)
  plt.title(title)
  plt.colorbar()
  tick_marks = np.arange(len(classes))
  plt.xticks(tick_marks, classes, rotation = 45)
  plt.yticks(tick_marks, classes)

  fmt ='.2f' if normalize else 'd'
  thresh = cm.max()/2
  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, format(cm[i,j], fmt), horizontalalignment="center" , color="white" if cm[i, j]> thresh else 'black')
  
  plt.tight_layout()
  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  plt.show()

 

x_test = test_dataset.data.numpy()
y_test = test_dataset.targets.numpy()
p_test = np.array([])
for inputs, targets in test_loader:
  inputs = inputs.to(device)

  inputs = inputs.view(-1, 784)
  
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  p_test = np.concatenate((p_test, predictions.cpu().numpy()))

cm = confusion_matrix(y_test, p_test)
plot_confusion_matrix(cm, list(range(10)))

 

Show an example where the prediction differs from the true label:

misclassified_idx = np.where(p_test != y_test)[0]
i = np.random.choice(misclassified_idx)
plt.imshow(x_test[i], cmap ='gray')
plt.title("True label: %s Predicted: %s" % (y_test[i], int(p_test[i])))

 

 

30. ANN for Regression

pytorch regression

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
N = 1000
X = np.random.random((N,2)) * 6 -3
y = np.cos(2 * X[:,0]) + np.cos(3*X[:,1])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0] , X[:,1] , y)

In a notebook, unlike a plain script, there is no need to call plt.show().

 

Building the model

#build the model

model = nn.Sequential(
    nn.Linear(2, 128),
    nn.ReLU(),
    nn.Linear(128, 1)
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
def full_gd(model, criterion, optimizer, X_train, y_train, epochs= 1000):
  train_losses = np.zeros(epochs)

  for epoch in range(epochs):
    optimizer.zero_grad()

    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    loss.backward()
    optimizer.step()

    train_losses[epoch] = loss.item()

    if (epoch + 1) % 50 == 0:
      print(f'Epoch {epoch+1}/{epochs}, Train loss: {loss.item():.4f}')

  return train_losses

X_train = torch.from_numpy(X.astype(np.float32))
y_train = torch.from_numpy(y.astype(np.float32).reshape(-1,1))
train_losses = full_gd(model, criterion, optimizer, X_train, y_train)
plt.plot(train_losses)

fig = plt.figure()
ax = fig.add_subplot(111, projection = "3d")
ax.scatter(X[:,0] , X[:,1] , y)

with torch.no_grad():
  line = np.linspace(-3, 3, 50)
  XX, yy = np.meshgrid(line, line)
  Xgrid = np.vstack((XX.flatten(), yy.flatten())).T
  Xgrid_torch = torch.from_numpy(Xgrid.astype(np.float32))
  yhat = model(Xgrid_torch).numpy().flatten()
  ax.plot_trisurf(Xgrid[:, 0] , Xgrid[:, 1] , yhat, linewidth = 0.2 , antialiased = True)
  plt.show()

 

The plot below extends the grid to a larger range (-5 to 5) to show the fitted surface beyond the training data.

fig = plt.figure()
ax = fig.add_subplot(111, projection = "3d")
ax.scatter(X[:,0] , X[:,1] , y)

with torch.no_grad():
  line = np.linspace(-5, 5, 50)
  XX, yy = np.meshgrid(line, line)
  Xgrid = np.vstack((XX.flatten(), yy.flatten())).T
  Xgrid_torch = torch.from_numpy(Xgrid.astype(np.float32))
  yhat = model(Xgrid_torch).numpy().flatten()
  ax.plot_trisurf(Xgrid[:, 0] , Xgrid[:, 1] , yhat, linewidth = 0.2 , antialiased = True)
  plt.show()
