
The following is a summary of notes taken while watching Pytorch: Deep Learning and Artificial Intelligence on Udemy.

Reinforcement Learning

supervised/unsupervised learning


An image classifier is just a function.


Supervised learning is like a static function.

You feed in an image and a result comes out.



Reinforcement Learning: a loop

It works like a loop.

We want to achieve a goal.

This requires planning for the future.
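The loop idea can be sketched in a few lines of Python. This is only an illustration: `CorridorEnv`, `random_agent`, and `run_episode` are hypothetical names that do not come from the course, and the reward of 1.0 at the goal is an assumption.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position stays inside the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward scheme
        return self.position, reward, done


def random_agent(state, rng):
    # Ignores the state and picks a direction at random
    return rng.choice([-1, 1])


def run_episode(env, agent, rng, max_steps=1000):
    # The loop: observe the state, act, receive a reward, repeat until done
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


total = run_episode(CorridorEnv(), random_agent, random.Random(0))
```

Unlike a static function call, the agent's action changes the next state it sees, which is why the goal has to be pursued over many steps.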





environment: the computer program that implements the tic-tac-toe game




game.is_over() -> whether the game has ended and who won; the player is a computer agent

X O  
O   X




agent, environment, episode

states, actions, rewards

state = the contents of each location on the board (X, O, or empty)

action = where to place the next X/O

reward = a number, received at each step of the game
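To make the tic-tac-toe terms concrete, here is a minimal sketch of such an environment. Only `is_over()` is named in the notes; the `TicTacToe` class, its other methods, and the reward scheme (1.0 for the winning move, 0.0 otherwise) are assumptions made for illustration.

```python
class TicTacToe:
    """Hypothetical minimal tic-tac-toe environment (illustrative only)."""

    # All eight winning lines, as indices into a flat 9-cell board
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.board = [" "] * 9

    def state(self):
        # State: what occupies each location on the board
        return tuple(self.board)

    def actions(self):
        # Actions: the empty cells where the next X/O may be placed
        return [i for i, cell in enumerate(self.board) if cell == " "]

    def winner(self):
        for a, b, c in self.LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def is_over(self):
        # The game ends when someone has won or the board is full
        return self.winner() is not None or not self.actions()

    def step(self, action, symbol):
        if self.board[action] != " ":
            raise ValueError("cell already taken")
        self.board[action] = symbol
        # Reward: a number received at every step (assumed scheme)
        reward = 1.0 if self.winner() == symbol else 0.0
        return self.state(), reward, self.is_over()


game = TicTacToe()
for action, symbol in [(0, "X"), (3, "O"), (1, "X"), (4, "O"), (2, "X")]:
    state, reward, done = game.step(action, symbol)
```

One pass from an empty board to `is_over()` returning True corresponds to one episode.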


Example : Maze


Breakout: states, actions, rewards

states: what we observe in the environment; more generally, a state can be derived from current and past observations

actions: just the different buttons/inputs on the joystick or control pad

spaces:  state spaces





states, actions, rewards, policies

states: discrete or continuous



policy as a function or mapping
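A policy as a mapping can be sketched directly with a dictionary. The state names, action names, and probabilities below are made up for illustration and are not from the course.

```python
import random

# Deterministic policy: a plain mapping from state to action
deterministic_policy = {"start": "right", "corridor": "right", "corner": "up"}


def act(state):
    return deterministic_policy[state]


# Stochastic policy: a mapping from state to a distribution over actions
stochastic_policy = {
    "start": {"right": 0.9, "up": 0.1},
    "corridor": {"right": 0.8, "left": 0.2},
    "corner": {"up": 1.0},
}


def sample_action(state, rng):
    # Draw one action according to the probabilities for this state
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


chosen = sample_action("start", random.Random(0))
```

A table like this only works for a small discrete state space; with continuous states the policy would have to be a parameterized function instead.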


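The Markov assumption

1. For example, predict whether tomorrow's weather will be rainy, sunny, or cloudy.

The Markov assumption is that tomorrow's weather depends only on today's weather, not on any earlier day.

In other words, each day's weather is related only to the weather of the day immediately before it.

2. When predicting the next word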
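The weather example in item 1 can be sketched as a small Markov chain. The transition probabilities below are invented purely for illustration; `TRANSITIONS`, `next_weather`, and `simulate` are hypothetical names.

```python
import random

# Assumed transition probabilities: TRANSITIONS[today][tomorrow]
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}


def next_weather(today, rng):
    # Markov assumption: only today's weather is used, never earlier days
    probs = TRANSITIONS[today]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def simulate(start, days, rng):
    history = [start]
    for _ in range(days):
        # Only the last entry of the history is passed on
        history.append(next_weather(history[-1], rng))
    return history


forecast = simulate("sunny", 7, random.Random(0))
```

Note that `next_weather` never receives the full history, which is exactly what the Markov assumption says.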


Bellman equation
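
For reference, one common way to write the Bellman equation for the value of a state under a policy is shown below. The notes do not fix a notation, so the symbols are assumptions: $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the environment dynamics, and $\gamma$ the discount factor.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V^{\pi}(s') \right]
```

The value of a state is the expected immediate reward plus the discounted value of the next state, which relies on the Markov assumption: the next state and reward depend only on the current state and action.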

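prediction vs. control: prediction means finding the value function of a given policy; control means finding the best policy.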


