The differentiable renderer is currently implemented only in CUDA C++, so it requires a GPU. Add a plain C++ implementation so that inference can run on CPU-only machines. This likely also addresses #10.