Optimize GCLDAModel with numba and Parallel #744
Conversation
Codecov Report
Base: 88.56% // Head: 87.68% // Decreases project coverage by -0.88%.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #744      +/-   ##
==========================================
- Coverage   88.56%   87.68%   -0.88%
==========================================
  Files          38       35       -3
  Lines        4371     3913     -458
==========================================
- Hits         3871     3431     -440
+ Misses        500      482      -18
```

☔ View full report at Codecov.
    
With the changes to the two update methods, it seems like they could be converted to functions instead. Is there any benefit to using static methods?

You're right. The two update methods can be converted to functions. I don't see any benefit in using static methods.
    
        
          
nimare/annotate/gclda.py (Outdated)

```python
# Float precision issue in Numba: https://github.com/numba/numba/issues/3426
probs_pdf = np.trunc(probs_pdf * (10**12)) / (10**12)
```
This line will need to be removed in the future. It ensures the sum never goes over 1.0 so that we don't get this error in numba, but the truncation causes the maps in the example to look different from the ones produced by previous versions.
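For context, a minimal illustration of the failure mode this line guards against (constructed for this note, not taken from the PR):

```python
import numpy as np

# After normalizing, floating-point rounding can leave the probabilities
# summing to slightly more than 1.0, which trips numba's
# "binomial(): p outside of [0, 1]" check inside np.random.multinomial.
probs = np.random.rand(4000)
probs_pdf = probs / np.sum(probs)

# Truncating at 12 decimal places only ever shrinks each entry, so the
# truncated vector sums to at most 1.0, at the cost of a small bias
# that changes the sampled maps.
probs_pdf = np.trunc(probs_pdf * (10**12)) / (10**12)
assert np.sum(probs_pdf) <= 1.0
```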
I was running some tests with Neurosynth and Neuroquery, and got the error again:

```text
File "<...>/lib/python3.9/site-packages/nimare/annotate/gclda.py", line 622, in fit
    self._update(loglikely_freq=loglikely_freq)
  File "<...>/lib/python3.9/site-packages/nimare/annotate/gclda.py", line 681, in _update
    ) = _update_peak_assignments(
  File "<...>/lib/python3.9/site-packages/numba/cpython/randomimpl.py", line 833, in binomial_impl
    raise ValueError("binomial(): p outside of [0, 1]")
ValueError: binomial(): p outside of [0, 1]
```
Would np.clip help here? During normalization, you could do this:

```python
probs_pdf = (probs_pdf / np.sum(probs_pdf)).clip(0, 1)
```
I tried using clip but it still raised the same error. I think it has to do with the way numba has the sum implemented.
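If the culprit is summation order, clipping can't fix it; here is a sketch (my own construction, assuming numba accumulates the sum naively) of why bounding each element leaves the total unbounded by exactly 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.random(100_000)
probs_pdf = (probs / np.sum(probs)).clip(0, 1)  # every entry is in [0, 1]

# np.sum uses pairwise summation, while a naive left-to-right loop
# (closer to how a compiled sum may accumulate) collects rounding error
# differently, so the total can still land just above 1.0 even though
# clipping bounded every individual entry.
naive_sum = 0.0
for p in probs_pdf:
    naive_sum += p
print(np.sum(probs_pdf), naive_sum)
```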
Given these limitations (the precision issue with sum, and codecov not tracking jitted functions), do you think we should try Cython for these two functions?

I think @jdkent or @adelavega would be better able to answer that.
    
        
          
nimare/annotate/gclda.py (Outdated)

```python
LGR = logging.getLogger(__name__)


@jit(nopython=True, cache=True)
```
would codecov track this function if nopython=False?
I'm not sure. Let me try this out.
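For what it's worth, one general numba option (not something tried in this thread) is to disable JIT compilation entirely during test runs so coverage sees plain Python:

```python
# NUMBA_DISABLE_JIT=1 must be set before numba compiles anything; with it,
# @jit functions run as ordinary Python, so coverage tools can trace them.
import os
os.environ["NUMBA_DISABLE_JIT"] = "1"

import numpy as np
from numba import jit

@jit(nopython=True, cache=True)
def normalize(probs):
    return probs / np.sum(probs)

print(normalize(np.array([1.0, 2.0, 3.0])))  # executes uncompiled, traceable
```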
@adelavega …
    
        
          
nimare/annotate/gclda.py (Outdated)

```python
# from the sampling distribution
# Returns a binary [1 x R*T] vector with a '1' in location that was sampled
# and zeros everywhere else
assignment_vec = np.random.multinomial(1, probs_pdf)
```
The documentation says to "use the multinomial method of a default_rng() instance instead". If we're lucky, that might also be faster?
I have tried different combinations with:

```python
rng = np.random.default_rng()
assignment_vec = rng.multinomial(1, probs_pdf)
```

and everything has failed in numba.

Using the correction proposed in this issue (https://github.com/numba/numba/issues/3426), the current assignment_vec = np.random.multinomial(1, probs_pdf) works fine without truncating probs_pdf.
Ah, that's good to hear. I'd love to know what the speedup is using numba on a large-ish dataset. When you get a chance to run some benchmarks, ping me.

We then need to decide if it's worth the "risk" of forking numba. Unfortunately, that is potentially a maintenance problem.
@adelavega Running times per iteration for 200 topics: …

Typically we have 10,000 iterations, so to train GCLDA on Neurosynth with …
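For reference, per-iteration timings like these can be collected with a simple wall-clock sketch; the constructor arguments below are placeholders, not the exact inputs used in this benchmark:

```python
import time
from nimare.annotate.gclda import GCLDAModel

# count_df and coordinates_df stand in for the Neurosynth-derived inputs.
model = GCLDAModel(count_df, coordinates_df, n_topics=200)

start = time.time()
model._update()  # one full Gibbs-sampling update over all tokens
print(f"_update() took {time.time() - start:.1f} s for this iteration")
```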
    
The latest changes showed improvement in … Creating chunks (… However, this is only scalable to a small number of CPUs. I got small linear scalability for only <= 4 CPUs on a local machine and <= 8 CPUs on an HPC.

What do you all think? Should we: …

I think option 1 is good, since using numba for …
    
@JulioAPeraza Awesome! Q: why is it only scalable to a small number of CPUs? I don't quite understand that. But given that limitation, I agree with your reasoning and think option 1 is a good choice. That's an order of magnitude, so not bad.
    
@adelavega …
    
Ah, disk I/O would certainly explain it. I wonder if using "threads" would also help: https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism
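A runnable sketch of that idea (toy work function; it assumes the real per-chunk work releases the GIL, as numba code compiled with nogil=True does):

```python
import numpy as np
from joblib import Parallel, delayed

def process_chunk(chunk):
    # Stand-in for the per-chunk update function.
    return chunk.sum()

data = np.arange(1_000_000, dtype=np.float64)
chunks = np.array_split(data, 8)

# prefer="threads" keeps workers in one process, so the large arrays are
# shared in memory instead of being pickled (and possibly memmapped to
# disk) for each worker.
results = Parallel(n_jobs=4, prefer="threads")(
    delayed(process_chunk)(chunk) for chunk in chunks
)
print(sum(results))
```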
    
I tested the scalability of the function with more CPUs on the HPC and it looks good. I also ran some tests using …

This PR is ready for review. Thanks!
    
Do you know if explicitly setting parallel=True would help? https://numba.readthedocs.io/en/stable/user/parallel.html

It looks like the main concern would be that the same matrices are being accessed by the document loop, but it looks like it would be different parts of the arrays?
    
Thanks, @adelavega.

I tried …
    
Looks good; just a minor suggestion for readability from me.
        
          
nimare/annotate/gclda.py (Outdated)

```python
[
    self.data["ptoken_doc_idx"][chunk_i : chunk_i + chunk_size],
    old_regions[chunk_i : chunk_i + chunk_size],
    old_topics[chunk_i : chunk_i + chunk_size],
    peak_probs[chunk_i : chunk_i + chunk_size, :, :],
    seeds[chunk_i : chunk_i + chunk_size],
]
```
can this be a dictionary so that the arguments being sent to _update_peak_assignments are explicit?
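For example (the key names are guesses; they would need to match the parameters of _update_peak_assignments so the call site can unpack them):

```python
# Variables here (self, chunk_i, chunk_size, ...) come from the surrounding
# method in gclda.py; only the packaging into a dict is new.
chunk = {
    "ptoken_doc_idx": self.data["ptoken_doc_idx"][chunk_i : chunk_i + chunk_size],
    "peak_regions": old_regions[chunk_i : chunk_i + chunk_size],
    "peak_topics": old_topics[chunk_i : chunk_i + chunk_size],
    "peak_probs": peak_probs[chunk_i : chunk_i + chunk_size, :, :],
    "seeds": seeds[chunk_i : chunk_i + chunk_size],
}
```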
        
          
nimare/annotate/gclda.py (Outdated)

```python
results = Parallel(n_jobs=self.n_cores, max_nbytes=None)(
    delayed(_update_peak_assignments)(
        *chunk,
```
With the above, this would now be **chunk.
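i.e., roughly:

```python
# chunks would now be the list of dicts built above.
results = Parallel(n_jobs=self.n_cores, max_nbytes=None)(
    delayed(_update_peak_assignments)(**chunk) for chunk in chunks
)
```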
        
          
nimare/annotate/gclda.py (Outdated)

```python
self.topics["n_peak_tokens_region_by_topic"],
self.topics["n_peak_tokens_doc_by_topic"],
self.topics["n_word_tokens_doc_by_topic"],
self.params["n_regions"],
self.params["n_topics"],
self.params["alpha"],
self.params["delta"],
self.params["gamma"],
```
A bit redundant, but now the arguments here should be named too (e.g., `gamma=self.params["gamma"]`).
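Something like this (keyword names assumed to match the function signature):

```python
delayed(_update_peak_assignments)(
    **chunk,
    n_regions=self.params["n_regions"],
    n_topics=self.params["n_topics"],
    alpha=self.params["alpha"],
    delta=self.params["delta"],
    gamma=self.params["gamma"],
)
```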
Just want to make note of the current gclda example and the one in this pull request:

current decoding ROI: [image]
pull request decoding ROI: [image]

I suppose I do not know how stochastic the results are on this test dataset, since it is very small.

EDIT: consistent since the first change in this pull request: https://nimare--744.org.readthedocs.build/en/744/auto_examples/03_annotation/04_plot_gclda.html#sphx-glr-auto-examples-03-annotation-04-plot-gclda-py
    
        
          
nimare/annotate/gclda.py (Outdated)

```python
len(self.vocabulary),
self.params["beta"],
self.params["gamma"],
self.seed,
```
I noticed that earlier in the code, self.seed = 0 rather than self.seed = self.params["seed_init"]. Could this be part of the issue?
That probably explains the difference you saw, because we are testing with a small number of iterations. I think self.params["seed_init"] is used to initialize the peak_topic and peak_subregion assignments, whereas self.seed is used to perform the sampling in _update().
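If the hard-coded value were the culprit, the change under discussion would be a one-liner (a sketch only; as noted above, the two seeds may intentionally serve different purposes):

```python
# Derive the sampling seed from the user-provided seed parameter
# instead of hard-coding it.
self.seed = self.params["seed_init"]  # previously: self.seed = 0
```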
        
          
nimare/annotate/gclda.py (Outdated)

```python
Updated number of word-tokens assigned to each topic (T) per document (D).
"""
# --- Seed random number generator
np.random.seed(randseed)
```
It's likely jit does something slightly different here, since the results change if I comment out @jit above.
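That would be consistent with numba keeping its own RNG state, separate from NumPy's; a small sketch of the separation:

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def seed_jitted(seed):
    # In nopython mode this seeds numba's internal RNG state,
    # not NumPy's global generator.
    np.random.seed(seed)

seed_jitted(42)
# NumPy's own stream is untouched by the call above, so draws made here
# come from whatever state NumPy's global RNG already had.
print(np.random.random())
```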
Closes None.

Currently, the `GCLDAModel` is slow, mainly when the vocabulary is large (for example, for the 3228 terms from Neurosynth). `_update_word_topic_assignments` takes ~73 seconds per iteration and `_update_peak_assignments` takes ~52 seconds per iteration, for a total of ~125 seconds per iteration of `_update()`. Compiling and running these functions with `numba` reduces these times to 2, 4, and 6 seconds per iteration, respectively.

Changes proposed in this pull request:

- Convert `_update_word_topic_assignments` and `_update_peak_assignments` to static methods, and add a decorator to compile and run them with `numba` in `nopython` mode.
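A minimal sketch of that pattern (toy body; the real functions take the count matrices and hyperparameters shown in the review comments above):

```python
import numpy as np
from numba import jit

class GCLDAModel:
    @staticmethod
    @jit(nopython=True, cache=True)
    def _toy_update(probs):
        # nopython mode compiles the body to machine code with no Python
        # object calls; cache=True reuses the compiled artifact across runs.
        return probs / np.sum(probs)

print(GCLDAModel._toy_update(np.array([1.0, 2.0, 3.0])))
```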