https://horea.caramizaru.xyz






Policy gradient for black-box optimization

Policy gradient methods are widely used in Reinforcement Learning. In this post we build policy gradient from the ground up, starting with the easier static scenario, where we maximize a reward function $r$ that depends solely on our control variable $x$. In subsequent posts, we will turn to the contextual bandit setting, where the reward also depends on a "state" that evolves. Finally, we will turn to the "full-blown" Reinforcement Learning scenario, where the state evolves endogenously, as a function of the control variable.
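To make the static scenario concrete, here is a minimal sketch of one common way to apply policy gradient to a black-box reward. Everything specific in it is an assumption made for illustration, not something taken from this post: the Gaussian policy with a fixed standard deviation, the batch-mean baseline, the step size, and the names `policy_gradient_maximize` and `toy_reward`. The idea is to sample candidate values of x from the policy, score them with r, and move the policy mean along the score-function (REINFORCE) gradient estimate, which never requires derivatives of r itself.

```python
import numpy as np


def policy_gradient_maximize(reward, mu0=0.0, sigma=0.5, lr=0.05,
                             batch=64, steps=500, seed=0):
    """Maximize a black-box reward(x) by adapting the mean of a Gaussian policy.

    Assumed setup (illustrative only): x ~ N(mu, sigma^2) with sigma fixed,
    and only mu is learned.
    """
    rng = np.random.default_rng(seed)
    mu = float(mu0)
    for _ in range(steps):
        # Sample a batch of candidate controls from the current policy.
        x = rng.normal(mu, sigma, size=batch)
        # Query the black box; no gradient of the reward is needed.
        r = np.array([reward(xi) for xi in x])
        # Subtract the batch mean as a simple baseline to reduce variance.
        advantage = r - r.mean()
        # Score function of the Gaussian: d/dmu log p(x; mu) = (x - mu) / sigma^2.
        score = (x - mu) / sigma**2
        # Monte Carlo estimate of the gradient of E[r(x)] with respect to mu.
        grad = np.mean(advantage * score)
        # Gradient ascent step on the policy mean.
        mu += lr * grad
    return mu


def toy_reward(x):
    # Hypothetical black-box reward with its maximum at x = 3.
    return -(x - 3.0) ** 2


mu_hat = policy_gradient_maximize(toy_reward)
print(f"estimated maximizer: {mu_hat:.3f}")
```

On this toy reward the learned mean should settle close to 3. Keeping sigma fixed keeps the sketch short; a fuller treatment would typically learn the spread of the policy as well.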

feed/2024/11/08/policy_gradient_for_black-box_optimization.txt · Last modified: 2024/11/08 10:07 by Horea Caramizaru