-
Notifications
You must be signed in to change notification settings - Fork 2
Multivariate Poisson as marked #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅ 📢 Thoughts on this report? Let us know! |
|
The code coverage is worse than before because of the twin implementation of MultivariatePoissonProcess |
gdalle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, and sorry for the review! It needs a few improvements but you can probably remove duplicate code now
src/poisson/poisson_process.jl
Outdated
| struct PoissonProcess{R<:Real,D} <: AbstractPointProcess | ||
| λ::R | ||
| mark_dist::D | ||
| PoissonProcess{R,D}(λ::R, mark_dist::D) where {R,D} = new{R,D}(λ, mark_dist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor is not needed, it is defined by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, I adapted it from Distributions.jl. If I understand correctly it is needed so that the (Any, Any) constructor is not defined by default. If I remove it, then julia gives a warning of method redefinition at the first constructor line 31.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced this old inner constructor by the one that checks if the intensity is positive.
src/poisson/poisson_process.jl
Outdated
| function PoissonProcess(λ::Integer, mark_dist; check_args::Bool=true) | ||
| return PoissonProcess(float(λ), mark_dist; check_args=check_args) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adapted what is done for Exponential in Distributions.jl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design of Distributions.jl leaves plenty to be desired and is fairly old, so it shouldn't be a universal reference. Let's avoid automatic conversions for the sake of simplicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
src/poisson/poisson_process.jl
Outdated
| # TODO: Replace PoissonProcess{R,Categorical{R,Vector{R}}} by | ||
| # MultivariatePoissonProcess{R} | ||
| # when everything else is OK | ||
| function intensity_vector(pp::PoissonProcess{R,Categorical{R,Vector{R}}}) where {R} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this generalizable to other kinds of processes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it needs to be documented since it is public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that it could be a function defined for some kind of "AbstractMultivariatePointProcess" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least document it for the Poisson process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring
src/poisson/poisson_process.jl
Outdated
| return log(ground_intensity(pp)) + logdensityof(mark_distribution(pp), m) | ||
| end | ||
|
|
||
| ### Conversions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, I adapted this from Distributions.jl. I would say that it is necessary as soon as you have the automatic conversion from integer to float (one of your other comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove them and stay simple. If a user tries to create a Poisson process with an integer intensity and it errors, that's on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
b7b5b3c to
cd7bbc3
Compare
src/poisson/poisson_process.jl
Outdated
| `MultivariatePoissonProcess{R}` is simply a type alias for `PoissonProcess{R,Categorical{R,Vector{R}}}`. | ||
| """ | ||
| const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{R,Vector{R}}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that the automatic conversion made by Distributions.jl (related to other discussions above) makes, for instance, that PoissonProcess(Int[1,1]) isa MultivariatePoissonProcess{Int} is false.
If this "problem" happens only in the case of integers intensities, we could just don't care, but it may also arises to other Real's, no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distributions.jl has a lot of issues related to types. For my package HiddenMarkovModels.jl, I actually rolled out my own version of Categorical, only to avoid them. There was a project called MeasureTheory.jl to rebuild Distributions.jl on a saner basis, but I think it's dead at the moment.
Either we decide that type issues are not our fault if they come from Distributions.jl, or we fix them ourselves. I'm not sure which is the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possibility would be to write
| const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{R,Vector{R}}} | |
| const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{T,Vector{T}}} where T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion above is not satisfactory but it makes MultivariatePoissonProcess{R} a UnionAll and not a DataType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand from Categorical in Distributions.jl, the type parameter is actually associated with the type of the probabilities in the probability vector. So, the type must be an AbstractFloat, since it is a number between 0 and 1. Except, for example, in the case Categorical([1, 0]), this returns a Categorical{Int64, Vector{Int64}}.
If the intensities are integers, the type for Categorical will still be an AbstratFloat, since it will be the type of the resulting probabilities. What seems to work is
const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{F,Vector{F}}} where {F<:AbstractFloat}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an other suggestion :
const MultivariatePoissonProcess{R<:Real, F} = PoissonProcess{R,Categorical{F,Vector{F}}}Hence, this is a
DataType.The drawback is that you need to provide e.g.
MultivariatePoissonProcess{Float32, Float32}instead ofMultivariatePoissonProcess{Float32}to thefitfunction.
Don't we need to define a separate fit method anyway (or, at least, suffstats)? From what I understand, MultivariatePoissonProcess is for non-marked events, so when we call fit with some vector of histories, we cannot use the marks in the histories, we need build the marks so that each event time has a mark corresponding to which history it came from.
If that is the case, couldn't we just define the method fit(pp::Type{<:MultivariatePoissonProcess{R}}, hists::Vector{<:History}) where {R<:Real}, instead of using MultivariatePoissonPoisson{R,F}? Or have both, but the first just dispatches to the second one with F = float(R)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, fit is implemented by providing the marks, even for multivariate processes. That is one of the reasons why I tried to unify multivariate and marked frameworks with this PR.
FYI, the "main" fit method is :
function StatsAPI.fit(
::Type{PoissonProcess{R,D}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,D,R1<:Real,R2<:Real}
λ = convert(R, ss.nb_events / ss.duration)
mark_dist = fit(D, ss.marks, ss.weights)
return PoissonProcess(λ, mark_dist)
endLeveraging on your idea, I would suggest to add this method
function StatsAPI.fit(
::Type{MultivariatePoissonPoisson{R}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,R1<:Real,R2<:Real}
λ = convert(R, ss.nb_events / ss.duration)
mark_dist = fit(Categorical, ss.marks, ss.weights)
return PoissonProcess(λ, mark_dist)
endHere, we let Distributions.jl chose the concrete Categorical type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, fit is implemented by providing the marks, even for multivariate processes. That is one of the reasons why I tried to unify multivariate and marked frameworks with this PR.
Makes sense. We can make a convenience function afterwards to build a mulitvariate history, if we see fit.
My take is that Distributions.jl will stop you from propagating any fancy types either way, so introducing a distinction between the type you want (R) and the type you get after conversion (F) is only giving you false hopes
I was playing around with the fit method for Categorical and it does not let you choose the type anyway. fit(Categorical, [1,2,2,1]) and fit(Categorical{BigFloat, Vector{BigFloat}}, [1,2,2,1]) both return a Categorical{Float64, Vector{Float64}}, so this choice of type parameter is an illusion.
So we could do
function StatsAPI.fit(
::Type{MultivariatePoissonPoisson{R}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,R1<:Real,R2<:Real}
return fit(MultivariatePoisson{R, Float64}, ss)
endor just force F to be a Float64 in the constructor, which only affects the precision of the probabilities of each mark, and this problem with fit goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If propagation of fancy types is impossible due to Distributions.jl as @gdalle suggested, I would say that the better and simpler is to force Float64 in the constructor.
I can make a commit that does that if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the better solution too. Maybe just leave a comment somewhere saying that this is the reason we forced the type, just to let whoever may be working on the code afterwards know about this.
|
@gdalle I think that every conversation can be resolved except the last one concerning |
|
I thought that other parts of the code needed modification but it modifying the alias suffices. gf = ForwardDiff.gradient(f1, 3 * ones(10))does not throw an error. @JoseKling : do you want to handle the merging ? |
|
great. And I can handle it, no problem |
|
Merged manually via the command line in commit 8dc41c3. |
The goal is to merge the two concrete types
MarkedPoissonProcessandMultivariatePoissonProcessas a new concrete type calledPoissonProcess(hence it more or less corresponds to the oldAbstractPoissonProcess)At this stage, the two implementations coexist. In particular, there is the test file
multivariate_as_marked.jlthat checks that the old implementation of multivariate Poisson is equivalent to the new one.To close this PR, it remains to:
src/poisson/abstract_poisson_process.jlandtest/multivariate_as_marked.jl,poisson/multivariateandpoisson/marked,