Seungsang Oh Reinforcement Learning 23: Actor-Critic Method
Seungsang Oh · 2-minute read
Professor Seungsang Oh teaches deep learning and reinforcement learning at Korea University with his DL math team. This lecture covers the Actor-Critic method, which improves learning performance and enables online learning by replacing the Monte Carlo return with a bootstrapped critic estimate and by updating the parameters of the Critic and Actor Networks at every step. Variants of the method, such as Q-value Actor-Critic and Advantage Actor-Critic, differ in how the Critic Network is trained, using either a Monte Carlo target or a temporal-difference (TD) target, to improve overall learning efficiency.
Insights
- The Actor-Critic method in deep reinforcement learning combines two key networks: a critic network that approximates the value function and an actor network that represents the policy. Rather than using the full Monte Carlo return, the actor is updated with a bootstrapped estimate from the critic, which enables faster, fully online learning; the critic and actor networks can also share parameters.
- Implementing the Actor-Critic method requires coding the Critic Network with either a Monte Carlo target or a temporal-difference target; switching between the two only changes the target term while the rest of the update formula stays the same. The TD target discounts the bootstrapped next-state value by a factor gamma, and the weights of the two networks are updated with separate learning rates, Alpha and Beta, which are crucial to the learning process.
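The update described above can be sketched in a few lines. This is a minimal illustration with a linear critic, a softmax actor, and a TD(0) target, not the lecture's actual code; all variable names, the toy dimensions, and the assignment of Alpha to the actor and Beta to the critic are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup: linear critic V(s) = w·s and a softmax actor
# over two actions with parameter matrix theta.
n_features, n_actions = 4, 2
w = np.zeros(n_features)                   # critic weights
theta = np.zeros((n_actions, n_features))  # actor weights
alpha, beta, gamma = 0.01, 0.01, 0.99      # actor lr, critic lr, discount

def policy(s):
    """Softmax policy over the action preferences theta @ s."""
    logits = theta @ s
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step_update(s, a, r, s_next, done):
    """One online actor-critic update using a bootstrapped TD(0) target."""
    global w, theta
    v_s = w @ s
    v_next = 0.0 if done else w @ s_next
    td_target = r + gamma * v_next        # bootstrapped target
    td_error = td_target - v_s            # delta, shared by both updates
    w += beta * td_error * s              # critic: move V(s) toward target
    grad_log_pi = -policy(s)[:, None] * s  # softmax score: (1{a'=a} - pi)s
    grad_log_pi[a] += s
    theta += alpha * td_error * grad_log_pi  # actor: policy-gradient step
    return td_error

# One illustrative transition with random feature vectors
s = rng.normal(size=n_features)
s_next = rng.normal(size=n_features)
delta = step_update(s, a=0, r=1.0, s_next=s_next, done=False)
```

Because the target bootstraps from the critic's own next-state estimate, the update can be applied after every single transition, which is what makes the method online.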
Recent questions
What does Professor Seungsang Oh teach?
Deep learning and reinforcement learning
What is the Actor Critic Method?
A method for deep reinforcement learning
How does the Actor Critic Method update weights?
Using specific parameters like Alpha and Beta
What is the purpose of the value function network?
To approximate the value function
What is the significance of the discount factor in the Actor Critic Method?
To discount future rewards in the bootstrapped target
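As a concrete illustration of the discount factor gamma, the discounted return weights rewards further in the future less heavily; this short sketch (with arbitrarily chosen reward values) computes it by accumulating backwards:

```python
# Discounted return: G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
# The rewards and gamma below are arbitrary illustrative values.
gamma = 0.99
rewards = [1.0, 0.0, 2.0]

G = 0.0
for r in reversed(rewards):  # accumulate from the last reward backwards
    G = r + gamma * G

# G is 1.0 + 0.99*0.0 + 0.99**2 * 2.0
```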
Related videos
Code Bullet
A.I. Learns to DRIVE
Digital Daru
What is Reinforcement Learning in Machine Learning Hindi
ootb STUDIO
[EN/JP] How Philosophy Majors Win Arguments [Korea University Philosophy Department] | 전과자 ep.35
Andrej Karpathy
The spelled-out intro to neural networks and backpropagation: building micrograd
Stanford Online
Overview Artificial Intelligence Course | Stanford CS221: Learn AI (Autumn 2019)