Seungsang Oh, Reinforcement Learning 23: Actor-Critic Method

Seungsang Oh · 17-minute read

Professor Seungsang Oh teaches deep learning and reinforcement learning at Korea University, where he leads the DL math team Deep Learning and Mathematics. This lecture covers the Actor-Critic method, which enhances learning performance and enables online learning by using a bootstrapped critic estimate in place of the full return and by updating the critic and actor networks with their own parameters. Variations of the method, such as Q-value Actor-Critic and Advantage Actor-Critic, implement the critic network with either a Monte Carlo target or a temporal-difference target to improve overall learning efficiency.

Insights

  • The Actor-Critic method in deep reinforcement learning involves two key networks: a value function (critic) network and a policy (actor) network. The method aims to boost learning performance by using a bootstrapped critic estimate rather than the full Monte Carlo return, which enables faster online learning; the critic and actor networks often share parameters (see the network sketch after this list).
  • Implementing the Actor-Critic method requires coding the critic network with either a Monte Carlo target or a temporal-difference target, where the latter replaces the return value while keeping the rest of the formula mostly unchanged. The discount factor, gamma, multiplies the next-state value in the TD target, and the weights of both networks are updated with separate learning rates, alpha for the actor and beta for the critic.
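
A minimal sketch of how the two networks could be set up in PyTorch. The lecture does not show this exact code; the layer sizes, the shared body, and all names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Actor (policy) head and critic (value) head on a shared body.

    Hidden size and the parameter-sharing choice are illustrative
    assumptions, not taken from the lecture.
    """
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # policy logits for pi(a|s)
        self.critic = nn.Linear(hidden, 1)          # state value V(s)

    def forward(self, state):
        h = self.body(state)
        return torch.softmax(self.actor(h), dim=-1), self.critic(h)
```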


Recent questions

  • What does Professor Seungsang Oh teach?

    Deep learning and reinforcement learning

  • What is the Actor Critic Method?

    A method for deep reinforcement learning

  • How does the Actor Critic Method update weights?

    Using specific parameters like Alpha and Beta

  • What is the purpose of the value function network?

    To approximate the value function

  • What is the significance of the discount factor in the Actor-Critic method?

    To discount future rewards when forming the temporal-difference target


Summary

00:00

Enhancing Learning with the Actor-Critic Method

  • Professor Seungsang Oh teaches deep learning and reinforcement learning at Korea University.
  • He leads a DL math team called Deep Learning and Mathematics.
  • The actor-critic method is crucial in deep reinforcement learning.
  • The method involves approximating both the value function and the policy with neural networks.
  • The Actor-Critic method includes two networks: a value function (critic) network and a policy (actor) network.
  • The Actor-Critic method aims to enhance learning performance and enable online learning.
  • The method uses a bootstrapped critic estimate instead of the full Monte Carlo return for faster learning.
  • The Actor-Critic method consists of a critic network and an actor network, which often share parameters.
  • Variations of the method include Q-value Actor-Critic and Advantage Actor-Critic.
  • Implementing the method involves coding the critic network using either a Monte Carlo target or a temporal-difference target, as sketched below.
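
A minimal sketch of the two critic targets mentioned above, assuming a state-value critic V(s). The function names and signatures are illustrative, not from the lecture.

```python
def mc_target(rewards, gamma):
    """Monte Carlo target: the full discounted return G_t for every step of
    one finished episode, so it can only be computed after the episode ends.
    G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def td_target(reward, next_value, gamma, done):
    """Temporal-difference target: bootstrap from the critic's own estimate of
    the next state, r + gamma * V(s'), so the update can be made online,
    step by step (the next-state value is zero after a terminal state)."""
    return reward + gamma * next_value * (1.0 - float(done))
```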

17:39

"Updating Palosi Gradeon with Temporal Difference"

  • The value used for updating the policy gradient is set aside separately as delta (the TD error), pre-calculated, and then multiplied by the remaining log-probability term during the actual update.
  • The temporal-difference target is obtained by replacing the return with the bootstrapped value, while the rest of the formula remains the same; the update is now applied to the policy.
  • The discount factor, gamma, is introduced in the formula, multiplying the next-state value inside the TD target.
  • In the Actor-Critic method, the weights are updated in both the critic network and the actor network using separate learning rates, alpha for the policy gradient (actor) and beta for the critic network; this weight update process is the temporal-difference actor-critic learning method (see the sketch below).
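
A minimal sketch of one online update step as described above, using a one-step TD actor-critic with separate actor and critic networks for clarity. The optimizer choice and all names are assumptions; the optimizers are assumed to be built with the two learning rates from the lecture, e.g. torch.optim.SGD(actor.parameters(), lr=alpha) and torch.optim.SGD(critic.parameters(), lr=beta).

```python
import torch

def td_actor_critic_step(actor, critic, actor_opt, critic_opt,
                         state, action, reward, next_state, done, gamma):
    """One online Actor-Critic update for a single transition (illustrative)."""
    value = critic(state)                          # V(s), keeps its gradient
    with torch.no_grad():
        next_value = critic(next_state)            # V(s'), no gradient through target
        td_target = reward + gamma * next_value * (1.0 - float(done))
        delta = td_target - value                  # TD error, fixed before the update

    # Critic update: move V(s) toward the TD target (learning rate beta).
    critic_loss = (value - td_target).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: policy gradient weighted by the pre-computed delta
    # (learning rate alpha).
    log_prob = torch.log(actor(state).squeeze(0)[action])
    actor_loss = -delta * log_prob
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```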
