-
Couldn't load subscription status.
- Fork 2
Standard Hawkes Process #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- src/hawkes/hawkes_process.jl
- Methods: rand, fit, time_change, ground_intensity,
integrated_ground_intensity, logdensityof
- tests/hawkes.jl
- Some unit tests for Hawkes processes
- history.jl
- added 'event_times(h, tmin, tmax)'
- added 'event_marks(h, tmin, tmax)'
- simulation.jl
- added 'simulate_poisson' - for other methods to use
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not check every line, but there are already lots of changes requested and questions.
test/hawkes.jl
Outdated
| @test isa(h_sim, History{Nothing,Float64}) | ||
| # Estimation | ||
| hp_est = fit(HawkesProcess, h_sim) | ||
| @test isa(hp_est, HawkesProcess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a test to check that the estimated parameters are close to the real ones ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no sure how to do this, to be honest.
One way would be to simulate many processes with known parameters and then get the mean/median of the estimations, but how would I define a bound for how far off this can be from the true parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to give us a warning in case of a harmful change in a future update.
You could chose a history and a rng seed such that the estimation is "close" to the true parameters. Then, add a unit test to check that this distance to the true parameters do not deteriorate in the next updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the best approach, but, in principle, this would show us if something changes the algorithm too much.
Could make it simpler by just fixing a seed and checking if it is close to the median, if this is too much.
Also, I saved the result of a simulation in a text file, because this is recommended in the Random.jl documentation, as they might change the algorithm and, therefore, change the results in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien is right, checking proximity between estimated and true parameters is an excellent way to guard against mistakes or breaking changes.
You can use StableRNGs.jl to get reproducible random seeds across versions and platforms, which makes text files unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Room for improvement is good ; )
About your question, the problem is just a poor explanation in the comments.
In reality this t90 is the inverse of the time it takes for a 90% decay. The only reason for sampling the inverse of t90 instead of t90 is that this ensures that the distribution of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it. I assumed it was something like this but wanted to make sure there wasn't a bug. I'll move forward on optimizing a bit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I optimized to significantly improve performance and increase memory efficiency. The main change replaces the previous O(n²) double loop used to compute descendant probabilities with an O(n) recurrence formulation based on running sums (A and S). This eliminates redundant computations and scales efficiently with the number of events. I also introduced a few small buffers to avoid unnecessary array allocations and removed redundant exponential recalculations within the EM loop. This caused the fit time to go from 55s for ~8k events to 10.53 ms without changing the final estimates. All tests still pass as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rsenne!
Completely missed that! It is even in the paper, actually. Anyway, that's the good thing about having extra pair of eyes going through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happens to the best of us, glad I could help :)
- Fixed `logdensityof` - HawkesProcess is now parametric and has as check in the inner constructor - Added explanation for `generate_descendants` and choice of initial guess in `fit`
- Fixed logdensityof - HawkesProcess is now parametric and has as check in the inner constructor - Added explanation for generate_descendants and choice of initial guess in fit Fixed formatting
- Changed parameter names of `HawkesProcess` - Added test for estimation
- Changed parameter names of `HawkesProcess` - Added test for estimation Forgot to run the formatter
- Changed parameter names of `HawkesProcess` - Added test for estimation Forgot to run the formatter Removed the docstring from `simulate_poisson_times` causing docs deployment problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome that you're getting started with new processes!
I left a few comments to try and guide you towards idiomatic Julia code, let me know what you think!
src/hawkes/hawkes_process.jl
Outdated
| Expectation-Maximization algorithm from [E. Lewis, G. Mohler (2011)](https://api.semanticscholar.org/CorpusID:2105771)). | ||
| The relevant calculations are in page 4, equations 6-13. | ||
| Let (t_1 < ... < t_N) be the event times over the interval [0, T). We use the immigrant-descendant representation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, one should refrain from using LaTeX in docstrings because it renders badly in the REPL. If you have to include formulas or symbols, it's better to put them inside backticks so that they display as monospace code, and then add a page to the Documenter.jl documentation for proper LaTeX equations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can visualize the docstring by typing
?fit
in a Julia REPL
src/hawkes/hawkes_process.jl
Outdated
| max_iter::Int=1000, | ||
| ) | ||
| N = nb_events(h) | ||
| N == 0 && return HawkesProcess(0.0, 0.0, 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this should return a process of the same type as the one provided to fit, otherwise the function will be type-unstable. Are you familiar with that concept?
So here we should put something like
return HawkesProcess(zero(T), zero(T), zero(T))and parametrize T inside
::Type{HawkesProcess{T}}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Careful about the notation conflict with T = duration(h) on line 58
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am familiar with it, although it is sometimes hard to spot.
From what I understand, the method is not type stable because the first return statement always returns a HawkesProcess{Float64}, but the second could return a BigFloat (or something Float64 could promote to), and the compiler cannot know in advance which of the statements will be hit.
In my code I was fixing the types of everything to be Float64, so I did not have to worry about this. But it is good to have to deal with it.
src/hawkes/hawkes_process.jl
Outdated
| T = duration(h) | ||
| norm_ts = h.times .* (N / T) # Average inter-event time equal to 1. Numerical stability | ||
| n_iters = 0 | ||
| step = step_tol + 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever you use floating point numbers like 1.0, you're forcing the entire computation to happen in Float64, even if the user provided Float32 everywhere. This could be replaced by one(T) for the right choice of T.
What is the right choice of T? Probably the result of the promotion between the time type of the History and the parameter type of the HawkesProcess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the types returned by rand and time_change too. The rule I used is the following:
time_change(hp::HawkesProcess{HP}, h::{M,H})
- Returns a
History{M,H}, same type as argument
fit(::Type{HawkesProcess{HP}}, h::History{M,H})
- Returns a
HawkesProcess{HP}, the exact type passed as an argument - If argument passed is
HawkesProcess(without a type parmeter), returnHawkesProcess{T}, whereT == promote_type(Float64, H).
rand(hp::HawkesProcess{HP}, tmin, tmax)
- Returns a
History{M,H}whereH == promote_type(typeof(tmin), typeof(tmax))
src/hawkes/hawkes_process.jl
Outdated
| # Internal function for simulating Hawkes processes | ||
| # The first generation, gen_0, is the `immigrants`, which is a set of event times. | ||
| # For each t_g ∈ gen_n, simulate an inhomogeneous Poisson process over the interval [t_g, T] | ||
| # with intensity λ(t) = α exp(-ω(t - t_g)) with the inverse method. | ||
| # gen_{n+1} is the set of all events simulated from all events in gen_n. | ||
| # The algorithm stops when the simulation from one generation results in no further events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use block comments like so:
#=
First line
second line
=#
src/hawkes/hawkes_process.jl
Outdated
| function generate_descendants(rng::AbstractRNG, immigrants::Vector, T, α, ω) | ||
| descendants = eltype(immigrants)[] | ||
| next_gen = immigrants | ||
| while ~isempty(next_gen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| while ~isempty(next_gen) | |
| while !isempty(next_gen) |
src/history.jl
Outdated
| Return the sorted vector of event times between `tmin` and `tmax` for `h`. | ||
| """ | ||
| function event_times(h::History, tmin, tmax) | ||
| h.times[searchsortedfirst(h.times, tmin):(searchsortedfirst(h.times, tmax) - 1)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be a @view?
test/hawkes.jl
Outdated
| @test isa(h_sim, History{Nothing,Float64}) | ||
| # Estimation | ||
| hp_est = fit(HawkesProcess, h_sim) | ||
| @test isa(hp_est, HawkesProcess) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien is right, checking proximity between estimated and true parameters is an excellent way to guard against mistakes or breaking changes.
You can use StableRNGs.jl to get reproducible random seeds across versions and platforms, which makes text files unnecessary.
test/hawkes_times.txt
Outdated
| @@ -0,0 +1 @@ | |||
| [0.01017337415246089, 0.013640560097159748, 0.013925428236956083, 0.014608995173263944, 0.015735173436628665, 0.017731455643465788, 0.018173140414903033, 0.019398584582704777, 0.02169310038049055, 0.026539311564590558, 0.03535296046523939, 0.035385747642310054, 0.04888791827373706, 0.049245437622213895, 0.05207317680771159, 0.05417399567567511, 0.06282683568521101, 0.07081244234996387, 0.0791015159737285, 0.0861333024505229, 0.087551141093, 0.08772509630067638, 0.09411335483093886, 0.09724491891582732, 0.0988625084503767, 0.10010837332313449, 0.10135018373031557, 0.10363364868768966, 0.10592656977963455, 0.10715180725875367, 0.11028563316661419, 0.1134331444864608, 0.12272955724493595, 0.12726221430956405, 0.12836304084244704, 0.12844801940952075, 0.13597112047166005, 0.15072073156516308, 0.15114633616168527, 0.15451628188316555, 0.1615175726493574, 0.17404961232856206, 0.19135505688083287, 0.22076206529908327, 0.23982012455994361, 0.24316977218179758, 0.24474350243009585, 0.24644188741540635, 0.24758741892662547, 0.26335250468309224, 0.2657968651067172, 0.26846031955131877, 0.2714547249648518, 0.2752925737618863, 0.27565342198838594, 0.27609484665183454, 0.2793520184225446, 0.28043167139411607, 0.28266772707753685, 0.2892355684930643, 0.28935629867942364, 0.2910561718681186, 0.29157899048672486, 0.30523440749876696, 0.31427715680728796, 0.3160799465423002, 0.3186409783533404, 0.31928256511258085, 0.3561905127846101, 0.35726422828011145, 0.37542826109696614, 0.3761737738905845, 0.383211879818265, 0.38412250885804855, 0.3892202718510143, 0.38973687161088244, 0.3903222364578105, 0.3921460092118373, 0.3927565640326585, 0.3929931412548428, 0.39550199927854035, 0.4004287509915203, 0.4012584947658139, 0.4188933113083467, 0.4222797931631154, 0.42547015568968916, 0.43096758007962505, 0.4318550285906202, 0.44109809732147987, 0.4411113062102109, 0.4414891960155095, 0.4419159595553975, 0.4455996720525952, 0.4496160036372285, 0.4498795159779206, 0.45469339795091446, 0.45799213961322816, 0.4582125417978449, 0.4593459923561455, 0.460330625273055, 0.46151099501089304, 0.4663379688495272, 0.4668223709854475, 0.4689158086068933, 0.47513195296839694, 0.48261581524557473, 0.5018848417890917, 0.5082613332032163, 0.5098025975578675, 0.5103574120239697, 0.5105994460715356, 0.5251305262758121, 0.5394744828867348, 0.5398929198384044, 0.5451644369828262, 0.5456458484033474, 0.5461471626817158, 0.5562494823653819, 0.5577837518050861, 0.5629099224574312, 0.5639313234396448, 0.5652484663007217, 0.56579448745316, 0.569761662149659, 0.5772534932071732, 0.57747271973829, 0.581312103742883, 0.5897117080989583, 0.6042594925582441, 0.6062770267185059, 0.6146506742853325, 0.619164121825413, 0.6233280853442464, 0.6562295767400433, 0.657191314194358, 0.6683396665157701, 0.6934098253153014, 0.7098168085084123, 0.7142551495263125, 0.7149292250425998, 0.7222898663646184, 0.7230225303240918, 0.7258703766070407, 0.7329257086975677, 0.734965964575541, 0.7367052753163055, 0.7369917675462134, 0.7377952054108926, 0.7431668020456873, 0.7442335599332027, 0.7494356056840505, 0.7592464069425915, 0.76513378821372, 0.7778400193541839, 0.783570871666886, 0.7892533149681009, 0.8151907848154715, 0.8163624371624438, 0.8203512192351479, 0.8231907075189483, 0.8254450315531231, 0.8327541619684787, 0.8340446391585701, 0.8365483397626655, 0.8380870238704485, 0.8396006152087313, 0.8424109556863868, 0.8449279007257848, 0.8493882720082039, 0.8499247624654398, 0.8570201197254786, 0.8708262595580382, 0.8757516578617132, 0.8851984472410247, 0.8943124667367437, 0.9099513775965703, 0.9158323560687229, 0.916124781335173, 0.9186099384299968, 0.9207850430116301, 0.9261702419474741, 0.9266763718478324, 0.927525059408, 0.9326181870421033, 0.9332024579347786, 0.9369755618858636, 0.9371907441403706, 0.9389797249122629, 0.939953948070523, 0.9420001470511369, 0.943798275663379, 0.9466861545841257, 0.947173907438536, 0.94880346770712, 0.9500849345227314, 0.953356815578679, 0.9535829930299445, 0.9596111063743575, 0.9686141854288621, 0.9810644372158931] No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, this is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it's more common to use uppercases at the beginning of sentences in Julia documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I probably hit u (vim shortcut to make everything lowercase) by mistake while having the docstring selected.
- Added docstring to `HawkesProcess` - Avoided LaTeX in docstrings - Remove all `@inbounds`
- Reference parameters inside `hawkes_times.txt` file, instead of hard coded in tests. - Fixed return types for `rand`, `fit`, and `time_change` and added tests.
|
Hello. |
|
Anything else we want to do for this PR? |
|
I think all issues were resolved... |
Are we good to merge then? What would be good to work on next? |
src/hawkes/hawkes_process.jl
tests/hawkes.jl
history.jl
simulation.jl