The following is a summary of notes taken while watching the Udemy course PyTorch: Deep Learning and Artificial Intelligence.
Gradient Descent
backbone of deep learning
k-means clustering
hidden Markov models
matrix factorization
Big learning rate: steps are too big, so it can overshoot the minimum or even diverge
small learning rate: it still converges, but very slowly
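As a quick sketch (my own, not from the course slides), here is plain gradient descent on a toy quadratic using PyTorch autograd; the loss f(w) = (w - 3)^2 and the learning rate value are just illustrative:

import torch

# toy example: minimize f(w) = (w - 3)^2 with plain gradient descent
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1   # illustrative value; too big overshoots, too small crawls

for step in range(100):
    loss = (w - 3.0) ** 2
    loss.backward()                     # compute df/dw
    with torch.no_grad():
        w -= learning_rate * w.grad     # the gradient descent update
    w.grad.zero_()                      # clear the gradient for the next step

print(w.item())   # approaches 3.0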
Stochastic Gradient Descent
optimizer = 'sgd'
stochastic: each update uses a randomly sampled mini-batch instead of the full dataset
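A minimal PyTorch sketch of mini-batch SGD (random toy data, placeholder model and learning rate; optimizer = 'sgd' is the Keras-style string, the PyTorch equivalent is torch.optim.SGD):

import torch
import torch.nn as nn

# toy regression data standing in for a real dataset
X = torch.randn(256, 1)
y = 2.0 * X + 1.0 + 0.1 * torch.randn(256, 1)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

batch_size = 32
for epoch in range(20):
    perm = torch.randperm(X.size(0))           # "stochastic": reshuffle every epoch
    for i in range(0, X.size(0), batch_size):
        idx = perm[i:i + batch_size]           # one random mini-batch
        optimizer.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        optimizer.step()                       # update from this batch's gradient only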
Momentum
SGD
something that keeps moving
physics momentum
zigzag
Without momentum, the path zigzags a lot.
With momentum, it gets down to the minimum much faster.
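Writing the classical momentum update out in NumPy on a made-up "narrow valley" loss (the loss and numbers are mine, just to show the zigzag damping); in practice you simply pass momentum=0.9 to torch.optim.SGD:

import numpy as np

# gradient of a toy loss w0^2 + 10*w1^2: steep in one direction, shallow in the other
def grad(w):
    return np.array([2.0 * w[0], 20.0 * w[1]])

w = np.array([1.0, 1.0])
v = np.zeros_like(w)        # velocity
lr, mu = 0.05, 0.9          # mu = 0.9 is the usual default

for step in range(100):
    v = mu * v - lr * grad(w)   # momentum accumulates past gradients
    w = w + v                   # parameters keep "moving" in the averaged direction

print(w)   # ends up near the minimum at [0, 0]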
Variable and Adaptive Learning Rates
momentum is nice: huge performance gains, almost no work. 0.9 is usually fine.
learning rate scheduling
#1. step decay
#2. exponential decay
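Both schedules are available in PyTorch under torch.optim.lr_scheduler; a minimal sketch with a placeholder model and placeholder numbers:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)    # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# #1. step decay: multiply lr by gamma every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# #2. exponential decay: multiply lr by gamma every epoch
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(30):
    # ... one training epoch would go here ...
    scheduler.step()        # decay the learning rate
    print(epoch, optimizer.param_groups[0]['lr'])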
learning rate:
too large is bad, and too small is also bad
too slow -> increase learning rate
be careful! may hit a plateau temporarily
AdaGrad: Adaptive Learning Rate Techniques
everything is element-wise
each scalar parameter and its learning rate is updated independently of the others
It has been observed that AdaGrad decreases learning rate too aggressively
RMSProp
Introduced by Geoff Hinton and his team
since cache is growing too fast, let's decrease it on each update:
major packages have implemented both
TensorFlow initializes cache = 1
Keras initializes cache = 0
we can't even tell which one is correct
AdaGrad:
at every batch:
cache += gradient ** 2
param = param - learning_rate * gradient / sqrt(cache + epsilon)
RMSProp
At every batch:
cache = decay * cache + (1-decay) * gradient ** 2
param = param - learning_rate * gradient / sqrt(cache + epsilon)
epsilon = 10 ** -8, 10 ** -9, 10 ** -10, etc..., decay = 0.9, 0.99, 0.999, 0.9999, etc...
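The two update rules above, written out as runnable NumPy on a toy quadratic (the names follow the pseudocode; the gradient function and constants are only illustrative). PyTorch ships both as torch.optim.Adagrad and torch.optim.RMSprop.

import numpy as np

def toy_gradient(param):
    return 2.0 * param      # gradient of a simple quadratic loss, for illustration

learning_rate, epsilon, decay = 0.1, 1e-8, 0.9
param_ada = np.array([1.0, -2.0]); cache_ada = np.zeros(2)   # AdaGrad state
param_rms = np.array([1.0, -2.0]); cache_rms = np.zeros(2)   # RMSProp state

for batch in range(100):
    # AdaGrad: cache only grows, so the effective learning rate only shrinks
    g = toy_gradient(param_ada)
    cache_ada += g ** 2
    param_ada -= learning_rate * g / np.sqrt(cache_ada + epsilon)

    # RMSProp: cache is an exponentially-decaying average, so it can shrink too
    g = toy_gradient(param_rms)
    cache_rms = decay * cache_rms + (1 - decay) * g ** 2
    param_rms -= learning_rate * g / np.sqrt(cache_rms + epsilon)

print(param_ada, param_rms)   # both have moved toward the minimum at [0, 0]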
Adam optimizer
go-to default these days
"RMSprop with momentum"
Exponentially-Smoothed Averages
RMS = "root mean square"
Adam:
m and v: exponentially-smoothed estimates of the gradient (first moment) and of the squared gradient (second moment)
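A sketch of the Adam update in NumPy (betas and epsilon are the usual defaults; the toy gradient is mine): m is the smoothed gradient (the momentum part), v is the smoothed squared gradient (the RMSprop part), and both are bias-corrected because they start at 0. In PyTorch this is torch.optim.Adam.

import numpy as np

def toy_gradient(param):
    return 2.0 * param      # gradient of a simple quadratic loss, for illustration

lr, beta1, beta2, epsilon = 0.01, 0.9, 0.999, 1e-8
param = np.array([1.0, -2.0])
m = np.zeros(2)             # first moment: smoothed gradient
v = np.zeros(2)             # second moment: smoothed squared gradient

for t in range(1, 1001):
    g = toy_gradient(param)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)    # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (np.sqrt(v_hat) + epsilon)

print(param)   # close to the minimum at [0, 0]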