
gradient of non-smooth functions #129

Closed
CarloLucibello opened this issue Dec 30, 2020 · 7 comments · Fixed by #130

Comments

CarloLucibello commented Dec 30, 2020

Hi,
we are using FiniteDifferences in NNlib.jl to validate the automatic derivatives computed through Zygote for operators such as relu and maxpool, which contain a few singularities.
Since we test on a few random inputs, if FiniteDifferences evaluated the function only in a small neighborhood of the input, the estimate should, with high probability, be correct and unaffected by the singularities. What we see instead is that the estimate is sometimes off.
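To make the failure mode concrete, here is a minimal sketch (the scalar relu and the input 1e-6 are illustrative, not our actual NNlib test):

using FiniteDifferences

relu(x) = max(x, zero(x))

# With the default adaptive step size, the central grid can straddle the
# kink at 0, so the estimate can land anywhere between the one-sided
# derivatives 0 and 1 instead of returning 1.0:
central_fdm(5, 1)(relu, 1e-6)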
I couldn't figure out from the docs how to select a small grid spacing or something along those lines. I tried

fdm = FiniteDifferenceMethod(1e-3 .* [-2, -1, 0, 1, 2], 1) 

but saw no observable improvement (even worse, I get some Cholesky factorization errors, if I remember correctly).
Any help with this would be appreciated.

Best,
Carlo

willtebbutt (Member) commented:

Hi Carlo,

Glad to hear that you're using FiniteDifferences in NNlib :)

@wesselb is the best-qualified person to help with custom grids and the like, but I have a couple of thoughts.

Regarding the singularities in the derivatives: we've struggled with those too. My usual strategy for scalar functions is to make sure the evaluation points don't cross them, by using forward_fdm or backward_fdm to control where the points land, so that might help in your case. We've also found that it's generally best to pick test inputs carefully rather than at random, so try to place them away from the discontinuities in the primal.
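For instance, here's a minimal sketch of that strategy on a scalar kink (this relu is just max(x, 0), standing in for NNlib's; the inputs ±1e-6 are illustrative):

using FiniteDifferences

relu(x) = max(x, zero(x))

# forward_fdm evaluates only at x, x + h, x + 2h, ..., so for x > 0 it
# never crosses the kink at 0:
forward_fdm(5, 1)(relu, 1e-6)    # ≈ 1.0

# backward_fdm evaluates only at x, x - h, x - 2h, ..., staying on the
# other side of the kink:
backward_fdm(5, 1)(relu, -1e-6)  # ≈ 0.0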

Regarding the Cholesky errors: how are you currently trying to test it?

wesselb (Member) commented Dec 30, 2020

Hi @CarloLucibello,

Here's a simple example of the forward_fdm approach that @willtebbutt mentioned. First, see how central_fdm fails:

julia> central_fdm(5, 1)(log, 1e-3)
ERROR: DomainError with -0.004694116537609249:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

This fails because central_fdm evaluates log on both sides of 1e-3, which happens to hit a negative number. Instead of central_fdm, you can use forward_fdm, which only evaluates log at numbers greater than or equal to 1e-3:

julia> forward_fdm(5, 1)(log, 1e-3)
999.9999999746666

If the case isn't so clear-cut and you just want to keep the evaluations close to the point of differentiation, I would recommend turning off adaptation and limiting the step size to something appropriately small:

julia> central_fdm(12, 1, adapt=0)(log, 1e-3, max_step=1e-5)
1000.0000000000076

I notice that, currently, adaptation doesn't respect max_step, which is something I'll look into immediately.

CarloLucibello (Author) commented:
Thank you both for the quick responses.
Since I'm writing a generic testing function, I'd like to go with the second approach,
but FiniteDifferences.grad doesn't support the max_step keyword. Is there any workaround?

wesselb (Member) commented Dec 31, 2020

Hmm, what I can come up with right now is something like the following:

julia> grad((f, x) -> central_fdm(8, 1)(f, x, 1e-4), sum, ones(2))
([0.9999999999990572, 0.9999999999990572],)

This fixes the step size to 1e-4, which may be suboptimal.
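If you want to package that up for your test suite, one sketch (the name fd_grad and the default step of 1e-4 are placeholders of mine, not part of the package's API):

using FiniteDifferences

# Hypothetical helper: gradients with a fixed, small step size.
fd_grad(f, x; step=1e-4) = grad((g, y) -> central_fdm(8, 1)(g, y, step), f, x)

fd_grad(sum, ones(2))  # ([0.9999999999990572, 0.9999999999990572],), as above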

You're right that keyword arguments cannot be passed to grad, which is a missing feature. I'm currently thinking about a PR that fixes that.

wesselb (Member) commented Jan 8, 2021

@CarloLucibello Once #130 is merged, you can use max_range to limit how far from x the function is evaluated. Here's an example:

julia> m = central_fdm(5, 1, max_range=1e-4);

julia> grad(m, sum, ones(5))
([1.0000000000004334, 1.0000000000004334, 1.0000000000004334, 1.0000000000004334, 1.0000000000004334],)
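Tying this back to the original use case, here's a sketch of how a relu gradient test could look (illustrative only; relu here is again just max(x, 0)):

using FiniteDifferences

relu(x) = max(x, zero(x))

# With max_range=1e-4, every evaluation stays within 1e-4 of x, so a random
# x whose entries are not within 1e-4 of 0 never crosses the kink.
m = central_fdm(5, 1, max_range=1e-4)
x = randn(5)
fd = grad(m, v -> sum(relu.(v)), x)[1]
fd ≈ Float64.(x .> 0)  # true with high probability over the draw of x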

CarloLucibello (Author) commented:
That's great, thanks! For the grad case, max_range represents the maximum excursion in any coordinate, right? So you are guaranteed to evaluate within an L∞ ball of radius max_range centered at x?

wesselb (Member) commented Jan 8, 2021

@CarloLucibello Yes, that's exactly right.
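For completeness, one way to sanity-check that guarantee is to record every point at which the function is evaluated (the instrumentation below is a sketch of mine, assuming #130's max_range):

using FiniteDifferences

x = ones(3)
pts = Vector{Float64}[]
f = v -> (push!(pts, copy(v)); sum(v))
grad(central_fdm(5, 1, max_range=1e-4), f, x)

# Largest coordinate-wise excursion over all evaluations; should be ≤ 1e-4.
maximum(maximum(abs.(v .- x)) for v in pts)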
