Seungsang Oh Reinforcement Learning 23: Actor-Critic Method

Seungsang Oh · 2-minute read

Professor Seungsang Oh, who teaches deep learning and reinforcement learning at Korea University with his DL math team, presents the Actor-Critic method, which improves learning performance and enables online learning by replacing the full return with a bootstrapped critic estimate and by updating the critic and actor networks, which may share parameters, with their own learning rates. Variations of the method, such as Q-value Actor-Critic and Advantage Actor-Critic, train the critic network with either a Monte Carlo target or a temporal difference target to improve overall learning efficiency.

Insights

  • The Actor-Critic method in deep reinforcement learning involves two key networks: a value function (critic) network and a policy (actor) network. The method aims to boost learning performance by using a bootstrapped critic estimate rather than the full return, which enables faster, online learning; the critic and actor networks can also share parameters (a shared-network sketch follows this list).
  • Implementing the Actor-Critic method requires coding the critic network with either a Monte Carlo target or a temporal difference target, where the latter only replaces the return in the update formula while the rest stays the same. The discount factor, gamma, is also multiplied into the update, and the weights of both networks are updated with their own learning rates, alpha for the actor and beta for the critic, which are crucial for the learning process.
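
The parameter sharing mentioned above is usually realized as one network body with two output heads. Below is a minimal PyTorch sketch of that idea, assuming a discrete action space; the class name, layer sizes, and hidden width are illustrative assumptions, not taken from the lecture.

    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        def __init__(self, state_dim, n_actions, hidden=128):
            super().__init__()
            # Shared body: the actor and the critic reuse these parameters.
            self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.actor_head = nn.Linear(hidden, n_actions)  # policy logits
            self.critic_head = nn.Linear(hidden, 1)         # state value V(s)

        def forward(self, state):
            h = self.body(state)
            probs = torch.softmax(self.actor_head(h), dim=-1)
            value = self.critic_head(h)
            return probs, value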


Recent questions

  • What does Professor Seungsang Oh teach?

    Deep learning and reinforcement learning

  • What is the Actor Critic Method?

    A method for deep reinforcement learning

  • How does the Actor Critic Method update weights?

    Using separate learning rates, alpha for the actor and beta for the critic

  • What is the purpose of the value function network?

    To approximate the value function

  • What is the significance of the discount factor in the Actor Critic Method?

    It is multiplied into the update formula to discount future rewards


Summary

00:00

Enhancing Learning with the Actor-Critic Method

  • Professor Seungsang Oh teaches deep learning and reinforcement learning at Korea University.
  • He leads a DL math team called Deep Learning and Mathematics.
  • The actor-critic method is central to deep reinforcement learning.
  • The method involves approximating both the value function and the policy.
  • The Actor-Critic method includes two networks: a value function (critic) network and a policy (actor) network.
  • The Actor-Critic method aims to enhance learning performance and enable online learning.
  • The method uses a bootstrapped critic estimate instead of the full return for faster learning.
  • The Actor-Critic method consists of a critic network and an actor network, which often share parameters.
  • Variations of the method include Q-value Actor-Critic and Advantage Actor-Critic.
  • Implementing the method involves coding the critic network with either a Monte Carlo target or a temporal difference target (see the sketch after this list).
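
The last bullet above distinguishes the two ways of training the critic. Here is a minimal Python sketch of the two targets; the function names and arguments are assumptions for illustration, not code from the lecture.

    # Monte Carlo target: the full discounted return G_t of a finished episode.
    def monte_carlo_targets(rewards, gamma):
        G, targets = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            targets.append(G)
        return list(reversed(targets))

    # Temporal difference target: bootstrap one step from the critic's own estimate.
    def td_target(reward, next_value, done, gamma):
        return reward + gamma * next_value * (0.0 if done else 1.0)

The critic is then regressed toward whichever target is chosen, which is why only the target changes while the rest of the update formula stays the same.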

17:39

"Updating Palosi Gradeon with Temporal Difference"

  • The value used for updating the policy gradient is defined separately as delta, pre-calculated once, and then multiplied by the remaining terms during the actual update.
  • The temporal difference target is obtained by swapping the return for the bootstrapped estimate, while the rest of the formula stays the same; the update now adjusts the policy.
  • The discount factor, referred to as gamma, is introduced into the formula as an additional multiplicative factor.
  • In the Actor-Critic method, the weights are updated in both the critic network and the actor network with their own learning rates, alpha for the policy gradient and beta for the critic network; this weight update is the learning procedure of the temporal difference actor-critic (a sketch of one update step follows this list).
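
Putting the bullets above together, one TD actor-critic step pre-computes delta, updates the critic with learning rate beta, and updates the policy with learning rate alpha. The following PyTorch sketch assumes actor(state) returns action probabilities and critic(state) returns a scalar value estimate, with the learning rates configured inside the two optimizers; it is an illustration of the described update, not the lecture's own code.

    import torch

    def td_actor_critic_step(actor, critic, actor_opt, critic_opt,
                             state, action, reward, next_state, done, gamma):
        # Pre-compute delta once; it is reused in both updates below.
        # done is 1.0 if the episode ended at next_state, else 0.0.
        with torch.no_grad():
            target = reward + gamma * critic(next_state) * (1.0 - done)
        delta = target - critic(state)

        # Critic update: learning rate beta lives inside critic_opt.
        critic_loss = delta.pow(2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor update: learning rate alpha lives inside actor_opt.
        # delta is treated as a constant weight on the log-probability;
        # the extra gamma**t factor from the lecture's formula could be
        # folded in by scaling delta, but is omitted here for brevity.
        log_prob = torch.log(actor(state)[action])
        actor_loss = -(delta.detach() * log_prob).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()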
