Do Differentiable Simulators Give Better Policy Gradients?

"Do Differentiable Simulators Give Better Policy Gradients?" by H.J. Terry Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake

Abstract:

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an α-order gradient estimator, with α ∈ [0, 1], which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples.
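
For intuition, here is a minimal sketch of the interpolation idea behind the α-order estimator: a first-order (autodiff) estimate and a zeroth-order (randomized-smoothing) estimate of the gradient of the smoothed objective are mixed with weight α. The function name, the toy objective, the noise scale, and the sample count below are illustrative assumptions, not the paper's implementation.

<code python>
# A minimal JAX sketch of the alpha-order interpolation idea.
# Names and hyperparameters here are illustrative assumptions.
import jax
import jax.numpy as jnp

def alpha_order_gradient(f, x, alpha, sigma=0.1, num_samples=64,
                         key=jax.random.PRNGKey(0)):
    """Estimate the gradient of the smoothed objective E_w[f(x + w)],
    w ~ N(0, sigma^2 I), by mixing two estimators with weight alpha.
    Assumes x is a 1-D parameter vector."""
    w = sigma * jax.random.normal(key, (num_samples,) + x.shape)
    # First-order estimate: average the exact (autodiff) gradients
    # at the perturbed points.
    first_order = jnp.mean(jax.vmap(jax.grad(f))(x + w), axis=0)
    # Zeroth-order estimate: score-function / randomized-smoothing form,
    # built from function values only. Subtracting the baseline f(x)
    # reduces variance without changing the expectation.
    values = jax.vmap(f)(x + w) - f(x)
    zeroth_order = jnp.mean(values[:, None] * w, axis=0) / sigma**2
    return alpha * first_order + (1.0 - alpha) * zeroth_order

# Example on a nonsmooth toy objective: alpha = 1 recovers the pure
# first-order estimate, alpha = 0 the pure zeroth-order one.
f = lambda x: jnp.sum(jnp.abs(x))
g = alpha_order_gradient(f, jnp.array([0.5, -0.3]), alpha=0.5)
</code>

Pushing α toward 0 trades the low variance of exact gradients for robustness when f is stiff or discontinuous, which is the regime the paper analyzes.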
