Optimal Control of Double Integrator with Dynamic Programming

I showed in my last blog post how LQR control can stabilize a cart pole around its unstable fixed point.

However, LQR control doesn’t help to swing the cart pole up into the balance point from states which are far away from the fixed point.

The underactuated robotics course discusses energy pumping techniques, to add or remove energy from the system until it reaches the homoclinic orbit (the trajectory which passes through the unstable fixed point) - and when the state is near the fixed point the fixed control scheme is switched to LQR control.

The course shows an alternate approach - using dynamic programming to algorithmically find the time-minimal optimal control from any position in the state-space. The lectures and homework shows an example of the dynamic programming algorithm on the double integrator system, and I replicated this in C# to get confidence that I had understood and implemented the algorithm correctly.

The animations show how the optimal control, and the cost function, are iterated to the converged solution, via value iteration:



One thing that took me a while to figure out, and I didn’t see straight away from the notes, is that as this is an infinite horizon problem, the gamma term is necessary in adding to the previous cost.

A sample trajectory using the analytical solution is shown here: /blog/2016-10-01-optimal-control-of-double-integrator-with-dynamic-programming/images/trajectory_analytical.png Using the dynamic programming approach for a state space discretized in x between -10 and 10 with 49 grid and x_dot discretised between -5 and 5 with 49 grid points gives the following:


The trajectory does reach the set point, but it is skewed from the analytical solution, and the acceleration isn’t at -1 on the approach to the target, but is somewhere between -1 and 0.

This is because of the discretisation, the grid points around the switching surface are too large, so the values are being interpolated where they should be saturated at 1 or -1. Changing the state space to be more focused on this trajectory (x between -5 and 5, and x_dot between -2.5 and 2.5) with 199 points, gives a result much closer to the analytical solution:


I haven’t yet coded this up for the cart pole problem, the core of the algorithm is the same, but with more dimensions. The only real part of the algorithm that changes is the volumetric interpolation. I don’t want to recreate Python’s ndarray in C#. I might look at a C++ implementation using Eigen’s tensor class.

The code is here: https://github.com/taumuon/robotics/tree/master/underactuated_notes/DoubleIntegratorDynamicProgramming

The next topic to investigate is reinforcement learning (the solution in Drake uses a Markov Decision Process).