@jucheval
Collaborator

@jucheval jucheval commented Mar 14, 2025

The goal is to merge the two concrete types MarkedPoissonProcess and MultivariatePoissonProcess into a new concrete type called PoissonProcess (which therefore more or less corresponds to the old AbstractPoissonProcess).

At this stage, the two implementations coexist. In particular, there is the test file multivariate_as_marked.jl that checks that the old implementation of multivariate Poisson is equivalent to the new one.

To close this PR, it remains to:

  • remove the files src/poisson/abstract_poisson_process.jl and test/multivariate_as_marked.jl,
  • remove the folders poisson/multivariate and poisson/marked,
  • fix the TODOs (related to aliases which would be in conflict in the current version).

@codecov

codecov bot commented Mar 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

@jucheval
Collaborator Author

The code coverage is worse than before because of the twin implementation of MultivariatePoissonProcess.

@jucheval jucheval requested a review from gdalle March 17, 2025 14:45
Collaborator

@gdalle gdalle left a comment

Thanks, and sorry for the review! It needs a few improvements but you can probably remove duplicate code now

struct PoissonProcess{R<:Real,D} <: AbstractPointProcess
    λ::R
    mark_dist::D
    PoissonProcess{R,D}(λ::R, mark_dist::D) where {R,D} = new{R,D}(λ, mark_dist)
Collaborator

This constructor is not needed, it is defined by default

Collaborator Author

Once again, I adapted it from Distributions.jl. If I understand correctly, it is needed so that the (Any, Any) constructor is not defined by default. If I remove it, then Julia gives a method redefinition warning at the first constructor, line 31.

Collaborator Author

@jucheval jucheval May 6, 2025

I replaced this old inner constructor by the one that checks if the intensity is positive.
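
As a rough illustration of the point discussed in this thread: in Julia, declaring any inner constructor stops the default constructors from being generated, so an outer constructor can then be added without a method redefinition warning. The sketch below uses a hypothetical stand-in type, DemoProcess, not the package's actual PoissonProcess, and the positivity check is only an assumption about what such a validating constructor might look like.

```julia
# Hypothetical stand-in type; not the package's actual definition.
struct DemoProcess{R<:Real,D}
    λ::R
    mark_dist::D
    # Declaring an inner constructor suppresses the default constructors,
    # so the only way to build a DemoProcess is through this check.
    function DemoProcess{R,D}(λ::R, mark_dist::D) where {R<:Real,D}
        λ > zero(λ) || throw(DomainError(λ, "the intensity must be positive"))
        return new{R,D}(λ, mark_dist)
    end
end

# Outer constructor that infers the type parameters from its arguments.
# No default method with this signature exists, so nothing is redefined.
DemoProcess(λ::R, mark_dist::D) where {R<:Real,D} = DemoProcess{R,D}(λ, mark_dist)

pp = DemoProcess(2.0, :no_marks)   # builds a DemoProcess{Float64,Symbol}
```

Calling DemoProcess(-1.0, :no_marks) in this sketch throws a DomainError instead of constructing an invalid object.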

Comment on lines 42 to 44
function PoissonProcess(λ::Integer, mark_dist; check_args::Bool=true)
    return PoissonProcess(float(λ), mark_dist; check_args=check_args)
end
Collaborator

Is this necessary?

Collaborator Author

I adapted what is done for Exponential in Distributions.jl

Collaborator

The design of Distributions.jl leaves plenty to be desired and is fairly old, so it shouldn't be a universal reference. Let's avoid automatic conversions for the sake of simplicity

Collaborator Author

OK

# TODO: Replace PoissonProcess{R,Categorical{R,Vector{R}}} by
# MultivariatePoissonProcess{R}
# when everything else is OK
function intensity_vector(pp::PoissonProcess{R,Categorical{R,Vector{R}}}) where {R}
Collaborator

Is this generalizable to other kinds of processes?

Collaborator

Also it needs to be documented since it is public

Collaborator Author

Do you mean that it could be a function defined for some kind of "AbstractMultivariatePointProcess"?

Collaborator

Or at least document it for the Poisson process

Collaborator Author

I added a docstring

return log(ground_intensity(pp)) + logdensityof(mark_distribution(pp), m)
end

### Conversions
Collaborator

Are these necessary?

Collaborator Author

Once again, I adapted this from Distributions.jl. I would say that it is necessary as soon as you have the automatic conversion from integer to float (one of your other comments).

Collaborator

Let's remove them and stay simple. If a user tries to create a Poisson process with an integer intensity and it errors, that's on them.

Collaborator Author

OK

@jucheval jucheval force-pushed the multivariate-as-marked branch from b7b5b3c to cd7bbc3 Compare May 6, 2025 13:27
`MultivariatePoissonProcess{R}` is simply a type alias for `PoissonProcess{R,Categorical{R,Vector{R}}}`.
"""
const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{R,Vector{R}}}
Collaborator Author

@jucheval jucheval May 7, 2025

Just realized that the automatic conversion made by Distributions.jl (related to other discussions above) means that, for instance, PoissonProcess(Int[1,1]) isa MultivariatePoissonProcess{Int} is false.
If this "problem" happens only in the case of integer intensities, we could just not care, but it may also arise for other Reals, no?

Collaborator

Distributions.jl has a lot of issues related to types. For my package HiddenMarkovModels.jl, I actually rolled out my own version of Categorical, only to avoid them. There was a project called MeasureTheory.jl to rebuild Distributions.jl on a saner basis, but I think it's dead at the moment.

Either we decide that type issues are not our fault if they come from Distributions.jl, or we fix them ourselves. I'm not sure which is the best approach.

Collaborator Author

A possibility would be to write

Suggested change
- const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{R,Vector{R}}}
+ const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{T,Vector{T}}} where T

Collaborator Author

The suggestion above is not satisfactory but it makes MultivariatePoissonProcess{R} a UnionAll and not a DataType.

Owner

From what I understand of Categorical in Distributions.jl, the type parameter is actually the type of the probabilities in the probability vector. So the type should be an AbstractFloat, since each probability is a number between 0 and 1. An exception is, for example, Categorical([1, 0]), which returns a Categorical{Int64, Vector{Int64}}.

If the intensities are integers, the type for Categorical will still be an AbstractFloat, since it will be the type of the resulting probabilities. What seems to work is

const MultivariatePoissonProcess{R<:Real} = PoissonProcess{R,Categorical{F,Vector{F}}} where {F<:AbstractFloat}

Owner

Here is another suggestion:

const MultivariatePoissonProcess{R<:Real, F} = PoissonProcess{R,Categorical{F,Vector{F}}}

Hence, this is a DataType.

The drawback is that you need to provide e.g. MultivariatePoissonProcess{Float32, Float32} instead of MultivariatePoissonProcess{Float32} to the fit function.
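
To make the UnionAll versus DataType distinction in these alias proposals concrete, here is a minimal sketch. DemoCat and DemoPP are hypothetical stand-ins for Categorical and PoissonProcess (they are not the real types), and the three alias names are illustrative only.

```julia
# Hypothetical stand-ins for Categorical and PoissonProcess.
struct DemoCat{P<:Real,V<:AbstractVector{P}}
    p::V
end

struct DemoPP{R<:Real,D}
    λ::R
    mark_dist::D
end

# One parameter shared by the intensity and the probabilities.
const StrictAlias{R<:Real} = DemoPP{R,DemoCat{R,Vector{R}}}

# Probability type left free: fixing R still leaves a UnionAll over F.
const LooseAlias{R<:Real} = DemoPP{R,DemoCat{F,Vector{F}}} where {F<:AbstractFloat}

# Probability type exposed as a second alias parameter.
const TwoParamAlias{R<:Real,F<:Real} = DemoPP{R,DemoCat{F,Vector{F}}}

# Integer intensity with Float64 probabilities, as after an automatic conversion.
pp = DemoPP(2, DemoCat{Float64,Vector{Float64}}([0.5, 0.5]))

pp isa StrictAlias{Int}              # false: the probabilities are not Int
pp isa LooseAlias{Int}               # true
pp isa TwoParamAlias{Int,Float64}    # true

StrictAlias{Int} isa DataType               # true
LooseAlias{Int} isa UnionAll                # true
TwoParamAlias{Int,Float64} isa DataType     # true
```

The loose alias matches more objects, but a method that dispatches on an exact concrete type, such as ::Type{PoissonProcess{R,D}}, cannot be called with a UnionAll, which is presumably why the distinction matters for fit.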

Don't we need to define a separate fit method anyway (or, at least, suffstats)? From what I understand, MultivariatePoissonProcess is for non-marked events, so when we call fit with a vector of histories, we cannot use the marks in the histories; we need to build the marks so that each event time has a mark indicating which history it came from.

If that is the case, couldn't we just define the method fit(pp::Type{<:MultivariatePoissonProcess{R}}, hists::Vector{<:History}) where {R<:Real}, instead of using MultivariatePoissonProcess{R,F}? Or have both, with the first simply dispatching to the second with F = float(R)?

Collaborator Author

For now, fit is implemented by providing the marks, even for multivariate processes. That is one of the reasons why I tried to unify multivariate and marked frameworks with this PR.

FYI, the "main" fit method is:

function StatsAPI.fit(
    ::Type{PoissonProcess{R,D}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,D,R1<:Real,R2<:Real}
    λ = convert(R, ss.nb_events / ss.duration)
    mark_dist = fit(D, ss.marks, ss.weights)
    return PoissonProcess(λ, mark_dist)
end

Building on your idea, I would suggest adding this method:

function StatsAPI.fit(
    ::Type{MultivariatePoissonProcess{R}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,R1<:Real,R2<:Real}
    λ = convert(R, ss.nb_events / ss.duration)
    mark_dist = fit(Categorical, ss.marks, ss.weights)
    return PoissonProcess(λ, mark_dist)
end

Here, we let Distributions.jl choose the concrete Categorical type.

Owner

> For now, fit is implemented by providing the marks, even for multivariate processes. That is one of the reasons why I tried to unify multivariate and marked frameworks with this PR.

Makes sense. We can make a convenience function afterwards to build a multivariate history, if we see fit.

My take is that Distributions.jl will stop you from propagating any fancy types either way, so introducing a distinction between the type you want (R) and the type you get after conversion (F) is only giving you false hopes

I was playing around with the fit method for Categorical and it does not let you choose the type anyway. fit(Categorical, [1,2,2,1]) and fit(Categorical{BigFloat, Vector{BigFloat}}, [1,2,2,1]) both return a Categorical{Float64, Vector{Float64}}, so this choice of type parameter is an illusion.

So we could do

function StatsAPI.fit(
    ::Type{MultivariatePoissonProcess{R}}, ss::PoissonProcessStats{R1,R2}
) where {R<:Real,R1<:Real,R2<:Real}
    return fit(MultivariatePoissonProcess{R,Float64}, ss)
end

or just force F to be a Float64 in the constructor, which only affects the precision of the probabilities of each mark, and this problem with fit goes away.

Collaborator Author

If propagating fancy types is impossible due to Distributions.jl, as @gdalle suggested, I would say that the best and simplest option is to force Float64 in the constructor.

I can make a commit that does that if you want.

Owner

I think this is the better solution too. Maybe just leave a comment somewhere saying that this is the reason we forced the type, just to let whoever may be working on the code afterwards know about this.
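
One way the agreed approach could look is sketched below, under stated assumptions: the probability type is pinned to Float64 both in the alias and in the vector constructor, so the alias keeps a single parameter and stays a DataType. DemoCategorical, DemoPoissonProcess and DemoMultivariate are hypothetical stand-ins, and the constructor body is an assumption about how per-coordinate intensities might be turned into a ground intensity plus mark probabilities, not the merged code.

```julia
# Hypothetical stand-ins for Categorical and PoissonProcess.
struct DemoCategorical{P<:Real,V<:AbstractVector{P}}
    p::V
end

struct DemoPoissonProcess{R<:Real,D}
    λ::R
    mark_dist::D
end

# The probability type is forced to Float64 because (per the discussion above)
# Distributions.jl does not propagate other types anyway.
const DemoMultivariate{R<:Real} =
    DemoPoissonProcess{R,DemoCategorical{Float64,Vector{Float64}}}

# Hypothetical constructor from a vector of per-coordinate intensities.
function DemoPoissonProcess(λ::AbstractVector{R}) where {R<:Real}
    total = sum(λ)
    probs = Float64.(λ ./ total)   # always Float64, whatever R is
    return DemoPoissonProcess(total, DemoCategorical{Float64,Vector{Float64}}(probs))
end

pp_int = DemoPoissonProcess([1, 1])
pp_f32 = DemoPoissonProcess(Float32[1, 3])

pp_int isa DemoMultivariate{Int}       # true
pp_f32 isa DemoMultivariate{Float32}   # true
DemoMultivariate{Int} isa DataType     # true
```

With this choice only the precision of the mark probabilities is fixed; the intensity keeps its own type R.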

@jucheval
Collaborator Author

@gdalle I think that every conversation can be resolved except the last one concerning MultivariatePoissonProcess for which I added a suggestion.

@jucheval
Collaborator Author

jucheval commented Nov 14, 2025

I thought that other parts of the code needed modification, but modifying the alias suffices.
FYI, autodiff still seems to work: line 40 of test/multivariate_poisson_process.jl, i.e.

gf = ForwardDiff.gradient(f1, 3 * ones(10))

does not throw an error.

@JoseKling: do you want to handle the merging?

@JoseKling
Owner

great.

And I can handle it, no problem

@JoseKling
Owner

Merged manually via the command line in commit 8dc41c3.

@JoseKling JoseKling closed this Nov 14, 2025