Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Gibbskernel #374

Merged
merged 28 commits into from
Oct 1, 2021
Merged

Conversation

Cyberface
Copy link
Contributor

Summary

This PR begins to implement the non-stationary Gibbs kernel what is being discussed in issue: #372

This is a modification of the standard Squared-Exponential kernel where the lengthscale is no longer constant but is a function of the input parameters.

In this PR I have simply added a function GibbsKernel to evaluate the gibbs kernel as well as added a test to check that when the lengthscale function just returns a constant value of 1.0 then it should be the same as the SqExponentialKernel().

The implementation of the gibbs kernel is probably going to change as I learn more about how your kernels are defined/implemented.

This is my first PR here and I am only just starting to learn julia so I'm sorry for some basic mistakes!
And I appriciate any comments to make this better thanks!

Proposed changes

  • Adds a new file "src/kernels/gibbskernel.jl"
  • include and export this in "src/KernelFunctions.jl"
  • add tests and include the test in "test/runtests.jl"

What alternatives have you considered?

A better implementation would be to use the kernel/transform infrastructure you have and I will work on this after this PR.

Breaking changes

None

Copy link
Member

@willtebbutt willtebbutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this! It's a very nice well-sized PR.

The basic functionality all seems is present. The main thing that needs addressing is turning the GibbsKernel function into a struct (see comment below) -- other than that it's just formatting-related things and other minor details that need addressing.

@Cyberface
Copy link
Contributor Author

I have updated the unit test - it was a little more confusing and complicated that I originally thought! Hopefully you can follow what I've done through the comments.

Let me know what you think, cheers!

k_gibbs = GibbsKernel(l_func)

# check they are equal
@test k_gibbs(x, y) == k(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the test it would be great if you could be as faithful as possible to the latex formula, it makes it easier to see if it's doing the right thing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what you mean, could you elaborate or sketch out an idea of what you would like to see?

Thanks!

Copy link
Member

@theogf theogf Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

@test k_gibbs(x, y) == sqrt((2 * ell(x) * ell(y)) / (ell(x)^2 + ell(y)^2)) * exp( - norm(x - y)^2 / (ell(x)^2 + ell(y)^2))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.b. probably better to use here rather than ==.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see what you mean. I've added your suggestion but actually this test is failing and I'm not sure why!

I don't think I fully understand the implementation in https://github.com/Cyberface/KernelFunctions.jl/blob/gibbskernel/src/kernels/gibbskernel.jl#L37 so will take a look a bit later again.

Copy link
Contributor Author

@Cyberface Cyberface Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the using norm(x-y) is correct here? Calling this on

x = randn(2)
y = randn(2)
norm(x-y)

This returns a single number - shouldn't this be a 2x2 matrix?

Assuming the norm is from LinearAlgebra ?

Sorry for my confusion!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no, this the L2 norm! So it's sqrt(sum((x_i - y_i)^2)), this is what we would expect I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I get you now - I thought this was supposed to give the covariance matrix but that is the kernelmatrix function!

This is just the covariance between two points in n-dimensions.

Hopefully I got this correct :)

added a broken test to seek help about why it's broken
Comment on lines 41 to 42
l = hypot(lx, ly)
kernel = (sqrt(2 * lx * ly) / l) * with_lengthscale(SqExponentialKernel(), l)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
l = hypot(lx, ly)
kernel = (sqrt(2 * lx * ly) / l) * with_lengthscale(SqExponentialKernel(), l)
l = hypot(lx, ly) / sqrt(2)
kernel = (sqrt(lx * ly) / l) * with_lengthscale(SqExponentialKernel(), l)

I think this should more correspond to the exact formula

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without promotion to Float64:

Suggested change
l = hypot(lx, ly)
kernel = (sqrt(2 * lx * ly) / l) * with_lengthscale(SqExponentialKernel(), l)
l = invsqrt2 * hypot(lx, ly)
kernel = (sqrt(lx * ly) / l) * with_lengthscale(SqExponentialKernel(), l)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct though because l is passed into with_lengthscale - doesn't the SqExponentialKernel already have the 1/2 in the exponential looking here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be

l = hypot(lx, ly)
kernel = (sqrt(2 * lx * ly) / l) * with_lengthscale(SqExponentialKernel(), 1 / l^2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct though because l is passed into with_lengthscale - doesn't the SqExponentialKernel already have the 1/2 in the exponential looking here

Yes but in the latex formula the factor 2 is not there, so this way we cancel it out!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the 2 disappears because we already have l(x) + l(y) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the 2 would be there when l(x) = l(y) = constant.

I think I finally understand most of the things here but I am confused by with_lengthscale. Can you explain why we shouldn't square the argument to with_lengthscale? Looking at the with_lengthscale code it seems to just multiply by 1/l.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the 2 disappears because we already have l(x) + l(y) ?

I'm not sure what 2 you refer to (and there is no l(x) + l(y) term anywhere 🤔). The 2 in the square root outside of the squared exponential kernel disappears because we already scale the square root of the denominator by 1/sqrt(2).

why we shouldn't square the argument to with_lengthscale

It will be squared by the squared exponential kernel. A squared exponential kernel with lengthscale l is of the form k(x, y) = exp(- (x - y)^2/(2*l^2)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the l(x) + l(y) term I was refering to is the denominator in the exponential i.e.
exp(-(x - y)^2 / (l(x)^2 + l(y)^2)

And when l(x) = l(y) = l = constant you get exp(-(x - y)^2 / (2l^2) ?

Also thanks for explaining to me about the with_lengthscale

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, if they are constant you get exp(-(x-y)^2 / (2 * l^2)). Very explicitly, in general the implementation yields

$$k(x, y) = sqrt(l(x) * l(y)) / (sqrt(l(x)^2 + l(y)^2) / sqrt(2)) * exp(- (x - y)^2 / (2 * (sqrt(l(x)^2 + l(y)^2) / sqrt(2))^2)) = sqrt(2 * l(x) * l(y) / (l(x)^2 + l(y)^2)) * exp(- (x - y)^2 / (l(x)^2 + l(y)^2))$$

which is exactly the formula in the docstring (and the other issue).

Comment on lines 7 to 24
ell(x) = exp(sum(sin, x))

# this is the gibbs lengthscale function.
# to check that we can recover the stationary SqExponentialKernel
# with a constant lengthscale we just set this function
# equal to a constant.
l_func(x) = ell(1)

# in order to compare to the Gibbs kernel we compute the
# equivalent lengthscale l(x)^2 + l(y)^2.
# See the denominator of the exponential term in the gibbs kernel.
lengthscale = hypot(l_func(x), l_func(y))

# create a SqExponentialKernel with a lengthscale
k = with_lengthscale(SqExponentialKernel(), lengthscale)

# create a gibbs kernel with our constant lengthscale function
k_gibbs = GibbsKernel(l_func)
Copy link
Member

@theogf theogf Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ell(x) = exp(sum(sin, x))
# this is the gibbs lengthscale function.
# to check that we can recover the stationary SqExponentialKernel
# with a constant lengthscale we just set this function
# equal to a constant.
l_func(x) = ell(1)
# in order to compare to the Gibbs kernel we compute the
# equivalent lengthscale l(x)^2 + l(y)^2.
# See the denominator of the exponential term in the gibbs kernel.
lengthscale = hypot(l_func(x), l_func(y))
# create a SqExponentialKernel with a lengthscale
k = with_lengthscale(SqExponentialKernel(), lengthscale)
# create a gibbs kernel with our constant lengthscale function
k_gibbs = GibbsKernel(l_func)
# this is the gibbs lengthscale function.
ell(x) = exp(sum(sin, x))
# create a gibbs kernel with our specific lengthscale function
k_gibbs = GibbsKernel(ell)

That's more what I meant to use instead of a constant.

bugs ironed out hopefully with lots of help from the devs
@Cyberface
Copy link
Contributor Author

OK I think this PR is basically done?

Copy link
Member

@willtebbutt willtebbutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy -- once @theogf is happy, we can merge :)

remove space thanks @willtebbutt

Co-authored-by: willtebbutt <[email protected]>
Copy link
Member

@theogf theogf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment for the docstring. And could you also add the kernel in the docs (just add the kernel name in the list in docs/kernels.md)?

kernel. The lengthscale parameter ``l`` becomes a function of
position ``l(x)``.

``l(x) = l`` then you recover the standard Squared-Exponential kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``l(x) = l`` then you recover the standard Squared-Exponential kernel
For a constant function``l(x) = c``, one recovers the standard Squared-Exponential kernel with lengthscale `0.5 c`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great thanks and that answers my other question about the square root / squaring about with_lengthscale

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lengthscale is just c, it seems: In this case we get exp(-(x-y)^2/(2*c^2)), both with our implementation and the formula it is based on which corresponds to a lengthscale of c.

thanks @theogf

Co-authored-by: Théo Galy-Fajou <[email protected]>
@Cyberface
Copy link
Contributor Author

Just a comment for the docstring. And could you also add the kernel in the docs (just add the kernel name in the list in docs/kernels.md)?

Yeah sure thing. Shall I add a new section in base kernels called 'non-stationary kernels' ? Or shall I put it in 'Exponential Kernels' ?

@willtebbutt
Copy link
Member

I'd just put it in alphabetical order in the base kernels list. Feels like the appropriate place to me (the base kernels aren't necessarily stationary)

@Cyberface
Copy link
Contributor Author

great thanks - done!

@Cyberface
Copy link
Contributor Author

The PR says there is a request to change something but I'm not sure what it's referring to.

@devmotion
Copy link
Member

The PR says there is a request to change something but I'm not sure what it's referring to.

It was my request to fix the docstring.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor improvements to the docstring.

@theogf
Copy link
Member

theogf commented Oct 1, 2021

Great! All we need now is a version bump. In Project.toml can you change version = "0.10.20" to version = "0.10.21"

@devmotion devmotion merged commit 16f1149 into JuliaGaussianProcesses:master Oct 1, 2021
@devmotion
Copy link
Member

Thanks a lot @Cyberface!

@Cyberface
Copy link
Contributor Author

that's really great! Thanks so much for all your help with this! Next I'll work on getting the kernelmatrix working and eventually getting it so that the gibbs kernel length scale can be modelled with a separate GP and fit/optimise that!

@Cyberface Cyberface deleted the gibbskernel branch October 1, 2021 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants