Abstract:
Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an α-order gradient estimator, with α ∈ [0, 1], which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples.
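
To make the two estimators concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a scalar objective F(θ) = E[f(θ + w)] with Gaussian noise w ~ N(0, σ²), and it assumes the α-order estimator takes the form of a convex combination, α times the first-order estimate plus (1 − α) times the zeroth-order estimate; the abstract does not state that formula. The function names (`first_order`, `zeroth_order`, `alpha_order`) and the two test functions are hypothetical, chosen only for illustration.

```python
import math
import random


def first_order(df, theta, noise):
    """Average the exact derivative df over the perturbed parameters."""
    return sum(df(theta + w) for w in noise) / len(noise)


def zeroth_order(f, theta, noise, sigma):
    """Score-function estimate with f(theta) as baseline; uses only values of f."""
    base = f(theta)
    total = sum((f(theta + w) - base) * w for w in noise)
    return total / (len(noise) * sigma ** 2)


def alpha_order(f, df, theta, noise, sigma, alpha):
    """Assumed form: convex combination of the two estimates, alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    g1 = first_order(df, theta, noise)
    g0 = zeroth_order(f, theta, noise, sigma)
    return alpha * g1 + (1.0 - alpha) * g0


# Smooth test function: f(x) = x^2, so the gradient of E[f(theta + w)] is 2 * theta.
def quad(x):
    return x * x


def dquad(x):
    return 2.0 * x


# Discontinuous test function: unit step, whose derivative is 0 almost everywhere.
def step(x):
    return 1.0 if x >= 0.0 else 0.0


def dstep(x):
    return 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 0.5
noise = [rng.gauss(0.0, sigma) for _ in range(20000)]

# Smooth case at theta = 1: both estimates should land near the true gradient 2.
print("quad first-order :", first_order(dquad, 1.0, noise))
print("quad zeroth-order:", zeroth_order(quad, 1.0, noise, sigma))

# Step case at theta = 0: the smoothed objective is the Gaussian CDF of theta / sigma,
# so its gradient at 0 is the Gaussian density 1 / (sigma * sqrt(2 * pi)).
true_step_grad = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
print("step true gradient:", true_step_grad)
print("step first-order  :", first_order(dstep, 0.0, noise))
print("step zeroth-order :", zeroth_order(step, 0.0, noise, sigma))
print("step alpha = 0.5  :", alpha_order(step, dstep, 0.0, noise, sigma, 0.5))
```

On the quadratic, both estimates approximate the true gradient 2θ. On the step function, the first-order estimate is exactly zero because the pointwise derivative vanishes almost everywhere, whereas the zeroth-order estimate tracks the nonzero gradient of the smoothed objective. Under the assumed convex-combination form, any α < 1 retains a fraction (1 − α) of that zeroth-order signal.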