
Conversation

@JoseKling
Owner

  • src/hawkes/hawkes_process.jl

    • HawkesProcess type
    • Methods: rand, fit, time_change, ground_intensity, integrated_ground_intensity, logdensityof
  • tests/hawkes.jl

    • Some unit tests for Hawkes processes
  • history.jl

    • added 'event_times(h, tmin, tmax)'
    • added 'event_marks(h, tmin, tmax)'
  • simulation.jl

    • added 'simulate_poisson' - for other methods to use

@codecov

codecov bot commented Jul 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Collaborator

@jucheval jucheval left a comment


I did not check every line, but there are already lots of changes requested and questions.

test/hawkes.jl Outdated
@test isa(h_sim, History{Nothing,Float64})
# Estimation
hp_est = fit(HawkesProcess, h_sim)
@test isa(hp_est, HawkesProcess)
Collaborator


Add a test to check that the estimated parameters are close to the real ones?

Owner Author


I am not sure how to do this, to be honest.

One way would be to simulate many processes with known parameters and then take the mean/median of the estimates, but how would I define a bound for how far off this can be from the true parameters?

Collaborator


The idea is to give us a warning in case of a harmful change in a future update.
You could choose a history and an RNG seed such that the estimation is "close" to the true parameters. Then add a unit test to check that this distance to the true parameters does not deteriorate in future updates.

Owner Author

@JoseKling JoseKling Jul 18, 2025


Not sure if this is the best approach, but, in principle, this would show us if something changes the algorithm too much.

We could make it simpler by just fixing a seed and checking whether the estimate is close to the median, if the current approach is too much.

Also, I saved the result of a simulation in a text file, because this is recommended in the Random.jl documentation, as they might change the algorithm and, therefore, change the results in the future.

Collaborator


Julien is right, checking proximity between estimated and true parameters is an excellent way to guard against mistakes or breaking changes.
You can use StableRNGs.jl to get reproducible random seeds across versions and platforms, which makes text files unnecessary.
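A minimal sketch of what such a test could look like, assuming StableRNGs.jl as a test dependency; the constructor arguments, field names, and the exact rand signature below are assumptions, not the package's actual API:

using StableRNGs, Test

rng = StableRNG(1)                       # reproducible across Julia versions and platforms
hp_true = HawkesProcess(1.0, 0.5, 2.0)   # hypothetical true parameters
h_sim = rand(rng, hp_true, 0.0, 1000.0)  # simulate a long history
hp_est = fit(HawkesProcess, h_sim)
# Guard against regressions: the estimate should stay close to the truth
# (the tolerance is illustrative).
@test isapprox(hp_est.λ, hp_true.λ; rtol=0.2)
@test isapprox(hp_est.α, hp_true.α; rtol=0.2)
@test isapprox(hp_est.ω, hp_true.ω; rtol=0.2)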

Owner Author


Room for improvement is good ; )

About your question, the problem is just a poor explanation in the comments.
In reality, this t90 is the inverse of the time it takes for a 90% decay. The only reason for sampling the inverse of t90 instead of t90 itself is that this ensures the distribution of $\omega$ is uniform. And because t90 is drawn from the interval (1/2, 2), both t90 and its inverse lie inside this interval.

Collaborator


Okay, got it. I assumed it was something like this but wanted to make sure there wasn't a bug. I'll move forward on optimizing a bit!

Collaborator


I made some optimizations to significantly improve performance and memory efficiency. The main change replaces the previous O(n²) double loop used to compute descendant probabilities with an O(n) recurrence based on running sums (A and S). This eliminates redundant computations and scales efficiently with the number of events. I also introduced a few small buffers to avoid unnecessary array allocations and removed redundant exponential recalculations within the EM loop. This brought the fit time from 55 s for ~8k events down to 10.53 ms without changing the final estimates. All tests still pass as well.
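For reference, a sketch of the kind of O(n) recurrence that replaces the double loop for an exponential kernel; the function and variable names here are illustrative, not the code in this PR:

# A[i] = sum over j < i of exp(-ω * (t[i] - t[j])), computed with the recurrence
# A[i] = exp(-ω * (t[i] - t[i-1])) * (1 + A[i-1]) instead of a double loop.
function exp_kernel_sums(times::AbstractVector{<:Real}, ω::Real)
    A = zeros(float(promote_type(eltype(times), typeof(ω))), length(times))
    for i in 2:length(times)
        A[i] = exp(-ω * (times[i] - times[i - 1])) * (1 + A[i - 1])
    end
    return A
end
# In the E-step, the probability that event i is an immigrant is then
# μ / (μ + α * A[i]), with no inner loop over earlier events.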

Owner Author


Thanks @rsenne!
Completely missed that! It is even in the paper, actually. Anyway, that's the good thing about having an extra pair of eyes going through it.

Collaborator


Happens to the best of us, glad I could help :)

- Fixed `logdensityof`
- HawkesProcess is now parametric and has a check in the inner constructor
- Added explanation for `generate_descendants` and choice of initial guess in `fit`

Fixed formatting

- Changed parameter names of `HawkesProcess`
- Added test for estimation

Forgot to run the formatter

Removed the docstring from `simulate_poisson_times` that was causing a docs deployment problem.
Collaborator

@gdalle gdalle left a comment


Awesome that you're getting started with new processes!
I left a few comments to try and guide you towards idiomatic Julia code, let me know what you think!

Expectation-Maximization algorithm from [E. Lewis, G. Mohler (2011)](https://api.semanticscholar.org/CorpusID:2105771).
The relevant calculations are on page 4, equations 6-13.
Let (t_1 < ... < t_N) be the event times over the interval [0, T). We use the immigrant-descendant representation,
Collaborator


If possible, one should refrain from using LaTeX in docstrings because it renders badly in the REPL. If you have to include formulas or symbols, it's better to put them inside backticks so that they display as monospace code, and then add a page to the Documenter.jl documentation for proper LaTeX equations
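For example, a docstring written along these lines stays readable in the REPL (the wording and parameter names below are placeholders, not the PR's final docstring):

"""
    fit(HawkesProcess, h::History)

Estimate the parameters of a Hawkes process with conditional intensity
`λ(t) = μ + α * sum(exp(-ω * (t - tᵢ)) for tᵢ < t)` from the event history `h`,
using an Expectation-Maximization algorithm.
"""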

Collaborator


You can visualize the docstring by typing

?fit

in a Julia REPL

max_iter::Int=1000,
)
N = nb_events(h)
N == 0 && return HawkesProcess(0.0, 0.0, 0.0)
Collaborator


Note that this should return a process of the same type as the one provided to fit, otherwise the function will be type-unstable. Are you familiar with that concept?
So here we should put something like

return HawkesProcess(zero(T), zero(T), zero(T))

and parametrize T inside

::Type{HawkesProcess{T}}

Collaborator


Careful about the notation conflict with T = duration(h) on line 58

Owner Author


Yes, I am familiar with it, although it is sometimes hard to spot.

From what I understand, the method is not type stable because the first return statement always returns a HawkesProcess{Float64}, but the second could return a BigFloat (or something Float64 could promote to), and the compiler cannot know in advance which of the statements will be hit.

In my code I was fixing the types of everything to be Float64, so I did not have to worry about this. But it is good to have to deal with it.
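A minimal sketch of the signature being suggested (the keyword arguments and field layout are assumptions, and any module qualification of fit is omitted):

function fit(::Type{HawkesProcess{T}}, h::History; max_iter::Int=1000) where {T}
    N = nb_events(h)
    N == 0 && return HawkesProcess(zero(T), zero(T), zero(T))  # same parameter type as requested
    # ... EM iterations, keeping intermediate quantities of type T ...
end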

T = duration(h)
norm_ts = h.times .* (N / T) # Average inter-event time equal to 1. Numerical stability
n_iters = 0
step = step_tol + 1.0
Collaborator


Whenever you use floating point numbers like 1.0, you're forcing the entire computation to happen in Float64, even if the user provided Float32 everywhere. This could be replaced by one(T) for the right choice of T.

What is the right choice of T? Probably the result of the promotion between the time type of the History and the parameter type of the HawkesProcess

Owner Author


Fixed the types returned by rand and time_change too. The rule I used is the following:

time_change(hp::HawkesProcess{HP}, h::History{M,H})

  • Returns a History{M,H}, same type as argument

fit(::Type{HawkesProcess{HP}}, h::History{M,H})

  • Returns a HawkesProcess{HP}, the exact type passed as an argument
  • If the argument passed is HawkesProcess (without a type parameter), return HawkesProcess{T}, where T == promote_type(Float64, H).

rand(hp::HawkesProcess{HP}, tmin, tmax)

  • Returns a History{M,H} where H == promote_type(typeof(tmin), typeof(tmax))
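A sketch of how the fallback and the literals could look under these rules (names are illustrative only):

# Without an explicit type parameter, fall back to promote_type(Float64, H):
fit(::Type{HawkesProcess}, h::History{M,H}; kwargs...) where {M,H} =
    fit(HawkesProcess{promote_type(Float64, H)}, h; kwargs...)

# Inside the parametric method, build literals from the working type,
# e.g. `step = step_tol + one(T)` instead of `step_tol + 1.0`.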

Comment on lines 147 to 152
# Internal function for simulating Hawkes processes
# The first generation, gen_0, is the `immigrants`, which is a set of event times.
# For each t_g ∈ gen_n, simulate an inhomogeneous Poisson process over the interval [t_g, T]
# with intensity λ(t) = α exp(-ω(t - t_g)) with the inverse method.
# gen_{n+1} is the set of all events simulated from all events in gen_n.
# The algorithm stops when the simulation from one generation results in no further events.
Collaborator


You can use block comments like so:

#=
First line
second line
=#

function generate_descendants(rng::AbstractRNG, immigrants::Vector, T, α, ω)
descendants = eltype(immigrants)[]
next_gen = immigrants
while ~isempty(next_gen)
Collaborator


Suggested change
while ~isempty(next_gen)
while !isempty(next_gen)
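For context, the inverse-method step described in the comment block above could look roughly like this for a single parent; the exponential kernel α * exp(-ω * (t - t_g)) is taken from that comment, everything else (names, the use of Distributions.jl) is an assumption:

using Random, Distributions

# One parent at time t_g produces children on [t_g, T] from an inhomogeneous
# Poisson process with intensity α * exp(-ω * (t - t_g)). Its integrated
# intensity is Λ = (α / ω) * (1 - exp(-ω * (T - t_g))), so draw the number of
# children from Poisson(Λ) and invert the cumulative intensity for their times.
function simulate_children(rng::AbstractRNG, t_g, T, α, ω)
    Λ = (α / ω) * (1 - exp(-ω * (T - t_g)))
    n = rand(rng, Poisson(Λ))
    u = rand(rng, n) .* Λ                       # uniform draws on (0, Λ)
    return t_g .- log.(1 .- u .* (ω / α)) ./ ω  # solves (α/ω)*(1 - exp(-ω*(t - t_g))) = u
end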

src/history.jl Outdated
Return the sorted vector of event times between `tmin` and `tmax` for `h`.
"""
function event_times(h::History, tmin, tmax)
h.times[searchsortedfirst(h.times, tmin):(searchsortedfirst(h.times, tmax) - 1)]
Collaborator


This could be a @view?
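For instance, a sketch keeping the original index arithmetic:

@view h.times[searchsortedfirst(h.times, tmin):(searchsortedfirst(h.times, tmax) - 1)]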


@@ -0,0 +1 @@
[0.01017337415246089, 0.013640560097159748, 0.013925428236956083, …, 0.9686141854288621, 0.9810644372158931] (one long line of simulated event times in [0, 1) added as hawkes_times.txt; no newline at end of file)
Collaborator


See above, this is not needed

Collaborator


I think that it's more common to use uppercase at the beginning of sentences in Julia documentation.

Owner Author


Sorry, I probably hit u (vim shortcut to make everything lowercase) by mistake while having the docstring selected.

- Added docstring to `HawkesProcess`
- Avoided LaTeX in docstrings
- Removed all `@inbounds`
- Referenced parameters inside the `hawkes_times.txt` file instead of hard-coding them in tests
- Fixed return types for `rand`, `fit`, and `time_change` and added tests
@JoseKling
Owner Author

Hello.
Any problems left here? I had some issues lately, but I am back.
Cheers

@rsenne
Collaborator

rsenne commented Oct 24, 2025

Anything else we want to do for this PR?

@JoseKling
Owner Author

I think all issues were resolved...

@rsenne
Collaborator

rsenne commented Oct 27, 2025

I think all issues were resolved...

Are we good to merge then? What would be good to work on next?
