Policy gradient for black-box optimization

Policy gradient method are widely used in the Reinforcement Learning settings. In this post we build policy gradient from the ground up, starting from the easier static scenario first, where we maximize a reward function {r} depending solely on our control variable {x}. In subsequent posts, we will turn our attention to the contextual bandit setting, where the reward also depends on a “state” that evolves. Finally, we will turn to the “full-blown” Reinforcement Learning scenario, where state evolves endogenously, as a function of the control variable.

Blogpost Link

Policy-gradient, optimization, Lorenzo-Maggi, 2023

Twitter
Facebook
Google+
LinkedIn
Pinterest
Tumblr
Reddit
Taringa
StumbleUpon
Telegram
Hacker News
Xing
Vk
Email