
gradient of non-smooth functions #129

Closed
CarloLucibello opened this issue Dec 30, 2020 · 7 comments · Fixed by #130

Comments

CarloLucibello commented Dec 30, 2020

Hi,
we are using FiniteDifferences in NNlib.jl to validate the automatic derivatives computed through Zygote for operators such as relu and maxpool, which contain a few singularities.
Since we test on a few random inputs, if FiniteDifferences evaluated the function only in a small neighborhood of the input, the estimate should, with high probability, be correct and unaffected by the singularities. What we see instead is that the estimate is sometimes off.
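To make the failure mode concrete, here is a minimal sketch (the scalar relu and the input 1e-6 are illustrative, not our actual NNlib test):

using FiniteDifferences

relu(x) = max(x, zero(x))

# With the default adaptive step size, the central grid can straddle the
# kink at 0, so the estimate can land anywhere between the one-sided
# derivatives 0 and 1 instead of returning 1.0:
central_fdm(5, 1)(relu, 1e-6)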
I couldn't figure out from the docs how to select a small grid spacing or something along those lines. I tried

fdm = FiniteDifferenceMethod(1e-3 .* [-2, -1, 0, 1, 2], 1) 

but saw no observable improvement (even worse, I get some Cholesky factorization errors, if I remember correctly).
Any help with this would be appreciated.

Best,
Carlo

willtebbutt (Member) commented:

Hi Carlo,

Glad to hear that you're using FiniteDifferences in NNlib :)

@wesselb is the best-qualified person to help with custom grids and the like, but I have a couple of thoughts.

Regarding the singularities in the derivatives: we've struggled with those too. My usual strategy for scalar functions is to make sure the evaluation points don't cross them, by using forward_fdm or backward_fdm to control where the points land, so that might help in your case. We've also found that it's generally best to pick test inputs carefully rather than at random, so try to place them away from the discontinuities in the primal.
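For instance, here's a minimal sketch of that strategy on a scalar kink (this relu is just max(x, 0), standing in for NNlib's; the inputs ±1e-6 are illustrative):

using FiniteDifferences

relu(x) = max(x, zero(x))

# forward_fdm evaluates only at x, x + h, x + 2h, ..., so for x > 0 it
# never crosses the kink at 0:
forward_fdm(5, 1)(relu, 1e-6)    # ≈ 1.0

# backward_fdm evaluates only at x, x - h, x - 2h, ..., staying on the
# other side of the kink:
backward_fdm(5, 1)(relu, -1e-6)  # ≈ 0.0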

Regarding the Cholesky errors: how are you currently trying to test it?

wesselb (Member) commented Dec 30, 2020

Hi @CarloLucibello,

Here's a simple example of the forward_fdm approach that @willtebbutt mentioned. First, see how central_fdm fails:

julia> central_fdm(5, 1)(log, 1e-3)
ERROR: DomainError with -0.004694116537609249:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

This fails because central_fdm evaluates log on both sides of 1e-3, which happens to hit a negative number. Instead of central_fdm, you can use forward_fdm, which only evaluates log at numbers greater than or equal to 1e-3:

julia> forward_fdm(5, 1)(log, 1e-3)
999.9999999746666

If the case isn't so clear-cut and you just want to keep the evaluations close to the point of differentiation, I would recommend turning off adaptation and limiting the step size to something appropriately small:

julia> central_fdm(12, 1, adapt=0)(log, 1e-3, max_step=1e-5)
1000.0000000000076

I notice that, currently, adaptation doesn't respect max_step, which is something I'll look into immediately.

CarloLucibello (Author) commented:
Thank you both for the quick responses.
Since I'm writing a generic testing function, I'd like to go with the second approach,
but FiniteDifferences.grad doesn't support the max_step keyword. Is there any workaround?

wesselb (Member) commented Dec 31, 2020

Hmm, what I can come up with right now is something like the following:

julia> grad((f, x) -> central_fdm(8, 1)(f, x, 1e-4), sum, ones(2))
([0.9999999999990572, 0.9999999999990572],)

This fixes the step size to 1e-4, which may be suboptimal.
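If you want to package that up for your test suite, one sketch (the name fd_grad and the default step of 1e-4 are placeholders of mine, not part of the package's API):

using FiniteDifferences

# Hypothetical helper: gradients with a fixed, small step size.
fd_grad(f, x; step=1e-4) = grad((g, y) -> central_fdm(8, 1)(g, y, step), f, x)

fd_grad(sum, ones(2))  # ([0.9999999999990572, 0.9999999999990572],), as above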

You're right that keyword arguments cannot be passed to grad, which is a missing feature. I'm currently thinking about a PR that fixes that.

wesselb (Member) commented Jan 8, 2021

@CarloLucibello Once #130 is merged, you can use max_range to limit how far from x the function is evaluated. Here's an example:

julia> m = central_fdm(5, 1, max_range=1e-4);

julia> grad(m, sum, ones(5))
([1.0000000000004334, 1.0000000000004334, 1.0000000000004334, 1.0000000000004334, 1.0000000000004334],)
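Tying this back to the original use case, here's a sketch of how a relu gradient test could look (illustrative only; relu here is again just max(x, 0)):

using FiniteDifferences

relu(x) = max(x, zero(x))

# With max_range=1e-4, every evaluation stays within 1e-4 of x, so a random
# x whose entries are not within 1e-4 of 0 never crosses the kink.
m = central_fdm(5, 1, max_range=1e-4)
x = randn(5)
fd = grad(m, v -> sum(relu.(v)), x)[1]
fd ≈ Float64.(x .> 0)  # true with high probability over the draw of x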

CarloLucibello (Author) commented:
That's great, thanks! For the grad case, max_range represents the maximum excursion in any coordinate, right? So you are guaranteed to evaluate within an L∞ ball of radius max_range centered at x?

wesselb (Member) commented Jan 8, 2021

@CarloLucibello Yes, that's exactly right.
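For completeness, one way to sanity-check that guarantee is to record every point at which the function is evaluated (the instrumentation below is a sketch of mine, assuming #130's max_range):

using FiniteDifferences

x = ones(3)
pts = Vector{Float64}[]
f = v -> (push!(pts, copy(v)); sum(v))
grad(central_fdm(5, 1, max_range=1e-4), f, x)

# Largest coordinate-wise excursion over all evaluations; should be ≤ 1e-4.
maximum(maximum(abs.(v .- x)) for v in pts)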
