Policy gradient for black-box optimization

Policy gradient methods are widely used in Reinforcement Learning. In this post we build policy gradient from the ground up, starting with the easier static scenario, where we maximize a reward function r that depends solely on our control variable x. In subsequent posts, we will turn our attention to the contextual bandit setting, where the reward also depends on a “state” that evolves. Finally, we will turn to the “full-blown” Reinforcement Learning scenario, where the state evolves endogenously, as a function of the control variable.
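To make the static scenario concrete, here is a minimal sketch of the idea, not the post's actual implementation. It assumes a Gaussian search policy over x with a fixed standard deviation and a score-function (REINFORCE-style) gradient estimate with a batch-mean baseline. The reward function, the function name `policy_gradient_search` and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def reward(x):
    # Hypothetical black-box reward, maximized at x = 3.
    # The optimizer only evaluates it; it never uses its derivative.
    return -(x - 3.0) ** 2


def policy_gradient_search(reward_fn, mu=0.0, sigma=0.5, lr=0.05,
                           batch=64, steps=500, seed=0):
    """Maximize E[r(x)] with x ~ N(mu, sigma^2) by gradient ascent on mu."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Sample candidate controls from the current policy.
        xs = [rng.gauss(mu, sigma) for _ in range(batch)]
        rs = [reward_fn(x) for x in xs]
        # Batch-mean baseline: an assumption here, used to reduce variance.
        baseline = sum(rs) / batch
        # Score function of the Gaussian: d/dmu log p(x) = (x - mu) / sigma^2.
        grad = sum((r - baseline) * (x - mu) / sigma ** 2
                   for x, r in zip(xs, rs)) / batch
        mu += lr * grad  # ascent step on the policy mean
    return mu


mu_star = policy_gradient_search(reward)
print(f"estimated maximizer: {mu_star:.3f}")
```

The policy mean drifts toward the maximizer of the reward using only sampled evaluations of r, which is what makes the approach applicable to black-box objectives.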

Blogpost Link

Policy-gradient, optimization, Lorenzo-Maggi, 2023
