
20211002

 

Activation Function

A function used to increase the expressive power of an AI model.

Linear: can only separate data with straight boundaries.

Non-linear: can separate data with curved decision boundaries, making complex relationships in the data easier to capture.

  1. Sigmoid function
  2. Softmax function
  3. ReLU function

Sources:

텐서플로 첫걸음

인공지능을 위한 수학


Activation Function

Takes a neuron's output y as input and computes f(y).

linear system vs. non-linear system

 

Sigmoid: output between 0 and 1

tanh (Hyperbolic Tangent): restricts the output to the range [−1.0, 1.0]

Step function: outputs either 0 or 1

ReLU (Rectified Linear Unit): maps inputs below 0 to 0 and passes inputs above 0 through unchanged

LeakyReLU: for negative input, ReLU returns 0, but LeakyReLU returns ax (for a small slope a)

Softmax: used with one-hot encoded targets => all outputs sum to 1
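
A minimal sketch of these activations using PyTorch's built-in functions (the 0.01 slope for LeakyReLU is an assumed choice):

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(torch.sigmoid(x))           # squashed into (0, 1)
print(torch.tanh(x))              # squashed into (-1, 1)
print((x > 0).float())            # step function: 0 or 1
print(F.relu(x))                  # negatives -> 0, positives unchanged
print(F.leaky_relu(x, 0.01))      # negatives -> 0.01 * x
print(F.softmax(x, dim=0).sum())  # softmax outputs sum to 1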

 


The notes below are a summary of the Udemy course Pytorch: Deep Learning and Artificial Intelligence.

 

22. Artificial Neural Networks Section introduction

CNNs

RNNs

 

Artificial Neural Networks: ANNs

neural networks

 

http://alexlenail.me/NN-SVG/index.html

activation functions:

These are what give neural networks their expressive power.

Multiclass classification: distinguishing among multiple classes

Data: images, text, and sound

 

23. Forward Propagation

neural networks -> predictions

 

E.g.: input is a face

one neuron looks for the presence of an eye

one neuron looks for the presence of a nose

Each neuron looks at a different feature.

 

input layer, hidden layers, output layer

a chain of neurons

uniform structure

 

y =  wx+b

y = ax+b

 

sigmoid

 

Compute the output of the preceding layer, then pass it forward to the next layer and continue the computation.
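
A single neuron as a minimal sketch: compute y = wx + b, apply the activation, and the result becomes the input to the next layer (the sizes here are arbitrary assumptions):

import torch

x = torch.randn(3)      # assumed 3 inputs to the neuron
w = torch.randn(3)      # weights
b = torch.tensor(0.5)   # bias

z = w @ x + b           # y = wx + b
out = torch.sigmoid(z)  # apply the activation, pass forward
print(out)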

 

regression

dense layer -> dense layer -> dense layer -> linear regression

classification

dense layer -> dense layer -> dense layer -> logistic regression
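
A small sketch of these two chains in PyTorch (the layer sizes are arbitrary assumptions):

import torch
import torch.nn as nn

# regression: dense layers ending in a plain linear output
reg_model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),                # linear regression head
)

# binary classification: the same trunk with a sigmoid on the end
clf_model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),  # logistic regression head
)

x = torch.randn(4, 10)  # batch of 4 samples with 10 features
print(reg_model(x).shape, clf_model(x).shape)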

 

Hierarchies

solve complicated problems

 

24. The Geometric Picture

geometric picture

feature engineering

linear regression

y_hat = ax^2 + b

gradient descent
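
A sketch of the feature-engineering idea: a plain linear model fits y_hat = ax^2 + b once we hand-craft the x^2 feature (the data below is made up for illustration):

import numpy as np

# made-up noisy quadratic data
x = np.linspace(-3, 3, 100)
y = 2 * x**2 + 1 + 0.1 * np.random.randn(100)

# feature engineering: regress y on x^2 (plus a bias column)
X = np.column_stack([x**2, np.ones_like(x)])
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b)  # should come out close to 2 and 1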

 

25. Activation Functions

sigmoid: output between 0 and 1

 f(a) = 1 / (1 + exp(-a))

used for binary classification

 

standardization

 

tanh: output between -1 and 1

 

vanishing gradient problem

where the activation is nearly flat, the gradient is close to zero, so learning barely progresses

 

dead end

default: ReLU

  doesn't have a "vanishing" gradient...

  (though the gradient in the left half is already vanished!)
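
A quick autograd sketch of this: for large |x| the sigmoid's gradient is nearly zero, while ReLU's gradient is 0 on the negative half and exactly 1 on the positive half:

import torch

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # ~0 at -10 and +10: the gradient has vanished

x.grad = None
torch.relu(x).sum().backward()
print(x.grad)  # 0 for x <= 0, 1 for x > 0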

 

BRU activation

 higher accuracy

 

softplus: f(x) = log(1 + exp(x)), a smooth approximation to ReLU

 

biological plausibility

 

26. Multiclass Classification

softmax function

Softmax is technically an activation function, but unlike sigmoid/tanh/ReLU, it is not used as a hidden-layer activation.

pytorch softmax function

nn.Sequential(
  nn.Linear(D, M),
  nn.ReLU(),
  nn.Linear(M, K),
  nn.Softmax(dim=1)  # dim=1: softmax across the class dimension
)

 

CrossEntropyLoss()

 

model = nn.Linear(D,K)

criterion = nn.CrossEntropyLoss()
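
A minimal sketch of why the final nn.Softmax() can be dropped: nn.CrossEntropyLoss applies log-softmax internally, so the model should output raw logits (D, K, and the batch size below are assumed values):

import torch
import torch.nn as nn

D, K = 20, 5                       # assumed feature and class counts
model = nn.Linear(D, K)            # outputs raw logits, no softmax
criterion = nn.CrossEntropyLoss()  # log-softmax + NLL in one step

x = torch.randn(8, D)              # assumed batch of 8 samples
targets = torch.randint(0, K, (8,))
loss = criterion(model(x), targets)
print(loss.item())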

 

activation function by task:

task                        activation function
Regression                  None / Identity
Binary classification       Sigmoid
Multiclass classification   Softmax

 

The Model Type Doesn't Matter

Linear Regression: Dense

ANN Regression: Dense + Dense

Binary Logistic Regression: Dense + Sigmoid

ANN Binary Classification: Dense + Dense + Sigmoid

Multiclass Logistic Regression: Dense + Softmax

ANN Multiclass Classification: Dense + Dense + Softmax

 

The same pattern applies to CNNs and RNNs: the type of task determines only the final activation function.

 

softmax is more general:

multiclass classification with K = 2 is equivalent to binary classification
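
A quick numerical check that softmax with K = 2 reduces to the sigmoid (a small sketch):

import torch

z = 1.5
# sigmoid of a single logit z
print(torch.sigmoid(torch.tensor(z)))
# softmax over the pair [0, z]: the second entry equals sigmoid(z)
print(torch.softmax(torch.tensor([0.0, z]), dim=0)[1])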

27. How to Represent Images

We need to understand how images are fed into a model as data.

height/width

an image is a matrix; each entry is indexed by a row and a column of the image

 

colors?

RGB: red/green/blue

black = 0

white = 255

 

Images as input to neural networks

0 ... 255

feature vector

 

3 dimensions: height, width, color

 

quantization:

color is light, measured by light intensity

it was figured out that 8 bits (1 byte) per channel is enough

2^8 = 256 levels => 0 ~ 255

=> How much space does a 500 x 500 image take?

500 x 500 x 3 x 8 = 6 million bits

JPEG allows us to compress images
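
The storage arithmetic above as a quick sketch:

height, width, channels, bits_per_channel = 500, 500, 3, 8
total_bits = height * width * channels * bits_per_channel
print(total_bits)       # 6,000,000 bits
print(total_bits // 8)  # 750,000 bytes (~750 KB uncompressed)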

 

Hex Colors

each byte (8 bits) is written as two hex digits, e.g. #FF0000 is pure red

 

Grayscale Images: no color

2-D array (height, width)

black = 0, white = 255, values in between are shades of gray

plt.imshow() applies a color map by default

plt.imshow(img, cmap='gray') renders the image in grayscale

 

Images as input to neural networks

0...1 => a more convenient range for network inputs

 

Another exception 

VGG

images are centered around 0, but the range is still 256
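
A sketch of that preprocessing idea: subtract a per-channel mean so pixel values center around 0 while the overall range stays about 256 wide (the mean values below are assumptions for illustration):

import numpy as np

mean = np.array([123.68, 116.78, 103.94])  # assumed per-channel means
image = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)
centered = image - mean  # centered near 0, range still ~256 wide
print(centered.min(), centered.max())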

 

Images as input to neural networks

N = #samples, D = #features

input X of shape NxD

A single image is H x W x C

a batch of images is N x H x W x C

 

Image to Feature Vector

reshape() or view()

N x D array
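
A sketch of flattening a batch of images into an N x D matrix with view() (MNIST-like shapes assumed):

import torch

N, H, W = 32, 28, 28          # assumed batch of MNIST-sized images
images = torch.rand(N, H, W)

flat = images.view(N, H * W)  # N x D with D = 784
# equivalently: images.view(-1, 784) or images.reshape(-1, 784)
print(flat.shape)             # torch.Size([32, 784])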

28. Code Preparation (ANN)

1. load in the data

       MNIST dataset -> handwritten digits

2. build the model

3. train the model

4. evaluate the model

5. make the predictions

 

pytorch load MNIST

step 1. load in the data -> pytorch library

grayscale => 28x28

train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True,
  download = True)
x_train = train_dataset.data
y_train = train_dataset.targets

x_train.shape = N x 28 x 28
y_train.shape = N
N = 60,000

 

 

test_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = False,
  download = True)
x_test = test_dataset.data
y_test = test_dataset.targets

x_test.shape = Ntest x 28 x 28
y_test.shape = Ntest
Ntest = 10,000

 

 

transforming the data

# reshape each input image into a flat vector
inputs = inputs.view(-1, 784)

 

step 2. model

model = nn.Sequential(
  nn.Linear(784, 128),
  nn.ReLU(),
  nn.Linear(128, 10)
)

10 -> the number of output classes

 

step 3. train the model

batch gradient descent

for epoch in range(epochs):
  # split the data into batches of batch_size and train on each
  for x_batch, y_batch in batches(X, Y, batch_size=128):
    train(x_batch, y_batch)

 

Batch Gradient Descent in PyTorch

train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)

for epoch in range(epochs):
  for inputs, targets in train_loader:
    optimizer.zero_grad()

 

random sampling (shuffle=True reshuffles the data each epoch)

 

step 4/5

n_correct = 0
n_total = 0
for inputs, targets in train_loader:
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total += targets.shape[0]

acc = n_correct / n_total

 

 

 

29. ANN for Image Classification

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True, 
  transform = transforms.ToTensor(),
  download = True)
train_dataset.data        # raw uint8 image tensor

train_dataset.data.max()  # 255: pixel values are not yet scaled

train_dataset.data.shape  # torch.Size([60000, 28, 28])

train_dataset.targets     # integer labels 0-9

 

Since the data has already been downloaded, running this again will not download it again.

train_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = True, 
  transform = transforms.ToTensor(),
  download = True)

model = nn.Sequential(
 nn.Linear(784, 128),
 nn.ReLU(),
 nn.Linear(128, 10)
)
# no need for final softmax!

Check whether a GPU is available and use it if so.

This affects training speed.

device = torch.device("cuda:0" if torch.cuda.is_available() else 'cpu')
print(device)
model.to(device)

loss and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
batch_size = 128

# the test dataset was never created above, so load it here
test_dataset = torchvision.datasets.MNIST(
  root = '.',
  train = False,
  transform = transforms.ToTensor(),
  download = True)

train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)

 

 

# a loader with batch_size 1, just to inspect the tensor shapes
tmp_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = 1, shuffle = True)
tmp_loader

for x, y in tmp_loader:
    print(x)
    print(x.shape)  # torch.Size([1, 1, 28, 28]): batch x channel x height x width
    print(y.shape)  # torch.Size([1])
    break

# ToTensor() scales pixel values into [0, 1], so the max here is 1.0
train_dataset.transform(train_dataset.data.numpy()).max()
epochs= 10

train_losses = np.zeros(epochs)
test_losses = np.zeros(epochs)

for epoch in range(epochs):
  train_loss = []
  for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)

    inputs = inputs.view(-1, 784)  # flatten each 28x28 image to 784 features
    optimizer.zero_grad()

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    loss.backward()
    optimizer.step()

    train_loss.append(loss.item())

  train_loss = np.mean(train_loss)

  test_loss = []
  for inputs, targets in test_loader:
    inputs, targets = inputs.to(device), targets.to(device)

    inputs = inputs.view(-1, 784)

    outputs = model(inputs)
    loss = criterion(outputs, targets)  # no backward/step: evaluation only

    test_loss.append(loss.item())

  test_loss = np.mean(test_loss)

  train_losses[epoch] = train_loss
  test_losses[epoch] = test_loss
  print(f'Epoch {epoch+1}/{epochs}, Train loss: {train_loss:.4f}, Test loss: {test_loss:.4f}')

 

plt.plot(train_losses, label ='train loss')
plt.plot(test_losses, label = 'test loss')
plt.legend()
plt.show()

n_correct = 0.
n_total = 0.
for inputs, targets in train_loader:
  inputs, targets = inputs.to(device), targets.to(device)
  inputs = inputs.view(-1, 784)
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total+= targets.shape[0]
train_acc = n_correct/ n_total
  
n_correct = 0.
n_total = 0.
for inputs, targets in test_loader:
  inputs, targets = inputs.to(device), targets.to(device)
  inputs = inputs.view(-1, 784)
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  n_correct += (predictions == targets).sum().item()
  n_total+= targets.shape[0]
test_acc = n_correct/ n_total
print(f"Train acc: {train_acc:.4f} , Test acc:{test_acc:.4f}")
from sklearn.metrics import confusion_matrix
import numpy as np
import itertools
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
  if normalize:
    cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print("Normalized confusion matrix")
  else:
    print("confusion_matrix: without Normalized")
  print(cm)
  plt.imshow(cm, interpolation='nearest', cmap=cmap)
  plt.title(title)
  plt.colorbar()
  tick_marks = np.arange(len(classes))
  plt.xticks(tick_marks, classes, rotation = 45)
  plt.yticks(tick_marks, classes)

  fmt ='.2f' if normalize else 'd'
  thresh = cm.max()/2
  for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, format(cm[i,j], fmt), horizontalalignment="center" , color="white" if cm[i, j]> thresh else 'black')
  
  plt.tight_layout()
  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  plt.show()

 

x_test = test_dataset.data.numpy()
y_test = test_dataset.targets.numpy()
p_test = np.array([])
for inputs, targets in test_loader:
  inputs = inputs.to(device)

  inputs = inputs.view(-1, 784)
  
  outputs = model(inputs)
  _, predictions = torch.max(outputs, 1)
  p_test = np.concatenate((p_test, predictions.cpu().numpy()))

cm = confusion_matrix(y_test, p_test)
plot_confusion_matrix(cm, list(range(10)))

 

Show an example where the prediction differs from the true label:

misclassified_idx = np.where(p_test != y_test)[0]
i = np.random.choice(misclassified_idx)
plt.imshow(x_test[i], cmap ='gray')
plt.title("True label: %s Predicted: %s" % (y_test[i], int(p_test[i])))

 

 

30. ANN for Regression

pytorch regression

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
N = 1000
X = np.random.random((N,2)) * 6 -3
y = np.cos(2 * X[:,0]) + np.cos(3*X[:,1])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0] , X[:,1] , y)

In a notebook, unlike a plain script, there is no need to call plt.show().

 

Building the model

#build the model

model = nn.Sequential(
    nn.Linear(2, 128),
    nn.ReLU(),
    nn.Linear(128, 1)
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
def full_gd(model, criterion, optimizer, X_train, y_train, epochs= 1000):
  train_losses = np.zeros(epochs)

  for epoch in range(epochs):
    optimizer.zero_grad()

    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    loss.backward()
    optimizer.step()

    train_losses[epoch] = loss.item()

    if (epoch + 1) % 50 == 0:
      print(f'Epoch {epoch+1}/{epochs}, Train loss: {loss.item():.4f}')

  return train_losses

X_train = torch.from_numpy(X.astype(np.float32))
y_train = torch.from_numpy(y.astype(np.float32).reshape(-1,1))
train_losses = full_gd(model, criterion, optimizer, X_train, y_train)
plt.plot(train_losses)

fig = plt.figure()
ax = fig.add_subplot(111, projection = "3d")
ax.scatter(X[:,0] , X[:,1] , y)

with torch.no_grad():
  line = np.linspace(-3, 3, 50)
  XX, yy = np.meshgrid(line, line)
  Xgrid = np.vstack((XX.flatten(), yy.flatten())).T
  Xgrid_torch = torch.from_numpy(Xgrid.astype(np.float32))
  yhat = model(Xgrid_torch).numpy().flatten()
  ax.plot_trisurf(Xgrid[:, 0] , Xgrid[:, 1] , yhat, linewidth = 0.2 , antialiased = True)
  plt.show()

 

The plot below extends the grid to a larger range (-5 to 5) to show the fitted surface beyond the training data.

fig = plt.figure()
ax = fig.add_subplot(111, projection = "3d")
ax.scatter(X[:,0] , X[:,1] , y)

with torch.no_grad():
  line = np.linspace(-5, 5, 50)
  XX, yy = np.meshgrid(line, line)
  Xgrid = np.vstack((XX.flatten(), yy.flatten())).T
  Xgrid_torch = torch.from_numpy(Xgrid.astype(np.float32))
  yhat = model(Xgrid_torch).numpy().flatten()
  ax.plot_trisurf(Xgrid[:, 0] , Xgrid[:, 1] , yhat, linewidth = 0.2 , antialiased = True)
  plt.show()
