perf: add experimental support for using mimalloc allocator #404
          
Open: wincent wants to merge 7 commits into main from wincent/mimalloc
Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:199: attempt to call field 'nvim_buf_is_valid' (a nil value)
```

Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:244: attempt to index field 'scanners' (a nil value)
```

Vendoring from:

- https://github.com/microsoft/mimalloc

and specifically:

- https://github.com/microsoft/mimalloc/releases/tag/v2.0.6

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

    du -sh lua/wincent/commandt/lib/vendor/github/microsoft
    4.8M    lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call `make` with `USE_MIMALLOC` set. You can verify that it actually _is_ overriding the standard `malloc()` etc calls by running a command with `MIMALLOC_VERBOSE`, which will cause it to print some extra info out:

    env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).

On mid-2015 MacBook Pro
=======================

These numbers are all over the map due to thermal throttling.

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer               0.04094   0.04178  0.00278   [-0.6%]          (0.04100)  (0.04186)  (0.00287)   [-0.6%]
    file                 0.30707   0.31436  0.02486   [-1.0%]  0.05    (0.30735)  (0.31473)  (0.02499)   [-1.0%]  0.05
    find                 0.05827   0.06678  0.01162   [+1.5%]  0.05    (0.92013)  (0.93752)  (0.04453)   [-1.0%]  0.025
    git                  0.05163   0.06000  0.01115   [+3.3%]  0.0005  (1.00993)  (1.02469)  (0.04072)   [-0.7%]  0.025
    rg                   0.06419   0.07229  0.01203   [+3.8%]  0.005   (1.61018)  (1.66326)  (0.08803)   [+0.3%]
    watchman             0.01095   0.01121  0.00068   [+0.2%]          (1.16830)  (1.17605)  (0.01835)   [+0.6%]  0.005
    total                0.54387   0.56643  0.04391   [+0.4%]          (5.09873)  (5.15811)  (0.15328)   [-0.1%]

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological         0.44648   0.48275  0.19826  [-10.0%]  0.01    (0.44705)  (0.48350)  (0.19793)  [-10.0%]  0.01
    command-t            0.41205   0.44292  0.21658   [+3.8%]  0.005   (0.41255)  (0.44364)  (0.21681)   [+3.8%]  0.005
    chromium (subset)    2.75724   2.99017  0.47925   [-1.3%]          (0.51232)  (0.55960)  (0.17228)   [-1.5%]
    chromium (whole)     3.18933   3.63241  0.64392   [-0.7%]          (0.41821)  (0.49571)  (0.14853)   [-0.3%]  0.05
    big (400k)           4.90155   5.51271  1.20748   [-1.0%]          (0.65297)  (0.74723)  (0.23045)   [-4.5%]  0.05
    total               11.74815  13.06097  2.16866   [-1.2%]          (2.47007)  (2.72968)  (0.54795)   [-2.8%]  0.025

M1 MacBook Pro
==============

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer               0.04407   0.05368  0.01123   [-1.4%]  0.025   (0.04433)  (0.05413)  (0.01150)   [-1.6%]  0.025
    file                 0.20902   0.21428  0.01060   [+1.0%]  0.01    (0.20902)  (0.21511)  (0.01219)   [+1.1%]  0.005
    find                 0.02687   0.03006  0.01015   [+3.9%]  0.05    (0.63141)  (0.64156)  (0.03483)   [+0.7%]  0.05
    git                  0.02693   0.02995  0.00980   [+2.2%]          (0.71734)  (0.72825)  (0.04266)   [-0.4%]
    rg                   0.02916   0.03318  0.01136   [+2.9%]          (0.90193)  (0.91710)  (0.07157)   [+1.4%]  0.005
    watchman             0.01100   0.01156  0.00165   [-0.7%]          (1.18802)  (1.21274)  (0.13422)   [+1.5%]  0.005
    total                0.36119   0.37272  0.03632   [+1.1%]          (3.71713)  (3.76889)  (0.18577)   [+0.9%]  0.005

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological         0.28526   0.29636  0.08356   [-4.0%]  0.025   (0.28527)  (0.29647)  (0.08343)   [-4.0%]  0.025
    command-t            0.23759   0.24616  0.07356   [+1.6%]          (0.23760)  (0.24618)  (0.07354)   [+1.6%]
    chromium (subset)    1.56761   1.58469  0.03655   [-0.3%]          (0.41376)  (0.42040)  (0.02032)   [-0.4%]
    chromium (whole)     1.87180   1.88726  0.06174   [-0.4%]  0.025   (0.31695)  (0.32809)  (0.03497)   [+0.4%]
    big (400k)           2.90455   2.92204  0.07185   [-0.2%]          (0.48384)  (0.50533)  (0.07608)   [-0.0%]
    total                6.88851   6.93650  0.15002   [-0.4%]  0.025   (1.74550)  (1.79647)  (0.14517)   [-0.5%]

M3 MacBook Pro
==============

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer               0.01255   0.01400  0.00409   [+2.0%]          (0.01260)  (0.01447)  (0.00635)   [-3.3%]
    file                 0.14749   0.15026  0.00629  [+38.1%]  0.0005  (0.14843)  (0.15115)  (0.00626)  [+37.9%]  0.0005
    find                 0.20783   0.27306  0.12796  [+15.8%]  0.0005  (1.13360)  (1.38588)  (0.55490)  [+15.3%]  0.0005
    git                  0.21748   0.25155  0.10398  [+13.0%]  0.0005  (1.17693)  (1.40937)  (0.54965)   [+9.1%]  0.0005
    rg                   0.20640   0.26983  0.12977  [+12.2%]  0.0005  (1.55310)  (1.78037)  (0.55921)   [+6.9%]  0.0005
    watchman             0.01813   0.01980  0.00287   [+6.1%]  0.0005  (1.19740)  (1.21007)  (0.02198)   [-0.2%]
    total                0.81542   0.97850  0.33560  [+17.1%]  0.0005  (5.23262)  (5.95132)  (1.66475)   [+8.7%]  0.0005

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological         0.21079   0.22604  0.10943   [+4.8%]  0.025   (0.21107)  (0.22640)  (0.10972)   [+4.7%]  0.025
    command-t            0.16694   0.17164  0.04923   [-0.6%]          (0.16716)  (0.17228)  (0.05253)   [-0.5%]
    chromium (subset)    1.35310   1.36239  0.02010   [+0.1%]          (0.28797)  (0.29255)  (0.01108)   [+0.3%]
    chromium (whole)     1.11148   1.11599  0.01258   [+0.3%]  0.01    (0.12167)  (0.12478)  (0.00828)   [-0.2%]
    big (400k)           1.67454   1.68249  0.05630   [+0.6%]  0.0005  (0.18195)  (0.18487)  (0.00876)   [+0.0%]
    total                4.52863   4.55855  0.15573   [+0.5%]  0.01    (0.97644)  (1.00087)  (0.12712)   [+1.0%]

Ryzen 5950X Arch Linux
======================

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer               0.02465   0.02544  0.01098   [-0.4%]          (0.02467)  (0.02546)  (0.01099)   [-0.5%]
    file                 0.09906   0.09948  0.00124   [-0.1%]          (0.09943)  (0.09995)  (0.00130)   [-0.2%]
    find                 0.01852   0.01885  0.00084   [+0.5%]          (0.25137)  (0.25430)  (0.00762)   [+0.1%]
    git                  0.01718   0.01811  0.00210   [+0.6%]          (0.22095)  (0.22468)  (0.01156)   [-0.6%]
    rg                   0.01748   0.01792  0.00105   [+0.5%]          (0.60575)  (0.61077)  (0.01562)   [-0.1%]
    watchman             0.00178   0.00186  0.00033   [-5.6%]          (0.02282)  (0.02717)  (0.02826)  [-11.5%]
    total                0.17975   0.18165  0.01018   [-0.0%]          (1.23025)  (1.24233)  (0.04061)   [-0.4%]  0.05

                            best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological         0.26186   0.27703  0.10940   [-4.4%]  0.0005  (0.26196)  (0.27715)  (0.10946)   [-4.4%]  0.0005
    command-t            0.19271   0.20058  0.05044   [-3.0%]  0.0005  (0.19279)  (0.20065)  (0.05047)   [-3.0%]  0.0005
    chromium (subset)    1.83627   1.89158  0.25631   [-3.8%]  0.01    (0.45977)  (0.49985)  (0.21028)  [-15.7%]  0.005
    chromium (whole)     1.36877   1.38916  0.06031   [+2.6%]  0.0005  (0.12129)  (0.12530)  (0.01659)   [-0.4%]
    big (400k)           2.39053   2.43636  0.11813   [+1.8%]  0.0005  (0.19600)  (0.20396)  (0.02644)   [-0.1%]
    total                6.09256   6.19472  0.33431   [-0.2%]          (1.24139)  (1.30690)  (0.25114)   [-7.5%]  0.005

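To make the benchmark columns concrete, here is a minimal sketch of how `best`, `avg`, and the `+/-` percentage could be derived from raw timings. It assumes `+/-` is the relative change in `avg` against a baseline run; that is an assumption, since the benchmark harness itself is not shown here. `summary_t`, `summarize`, and `percent_change` are hypothetical names, and the `sd` and `p` columns are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one benchmark's timing samples (hypothetical struct). */
typedef struct {
    double best; /* smallest (fastest) sample */
    double avg;  /* arithmetic mean of all samples */
} summary_t;

/* Summarize `n` timing samples; `n` must be at least 1. */
static summary_t summarize(const double *samples, size_t n) {
    summary_t s = { samples[0], 0.0 };
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.best) {
            s.best = samples[i];
        }
        total += samples[i];
    }
    s.avg = total / (double)n;
    return s;
}

/* Relative change of `current` against `baseline`, in percent.
 * Negative means the current run was faster (assumed sign convention). */
static double percent_change(double baseline, double current) {
    return (current - baseline) / baseline * 100.0;
}
```

Under that reading, a row such as `[-10.0%]` would mean the average time dropped by a tenth relative to the baseline, and the `p` column would indicate how much weight the difference deserves.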
The .prettierignore change is because there are a couple of things in the Markdown files that Prettier doesn't like.

The clang-format thing comes from a tip here:

- https://stackoverflow.com/a/57272592/2103996

Should prevent CI failures like this one:

- https://github.com/wincent/command-t/actions/runs/2979207632

`-fPIC` wasn't needed on clang, but is needed with gcc:

    /usr/bin/ld: mimalloc-override.o: relocation R_X86_64_TPOFF32
    against `recurse' can not be used when making a shared object;
    recompile with -fPIC

I can't see a changelog or release notes in the repo, so here is the diff:

- microsoft/mimalloc@v2.0.6...v2.1.7
0938103 to 836698d
  
    | Quick test of Hoard, for comparison: Results (relative to   | 
  
  
    
  
    
Vendoring from microsoft/mimalloc and specifically the ~~v2.0.6 tag~~ v2.1.7 tag.

mimalloc is a simple allocator focused on performance, and it is easy to drop in as a replacement for `malloc()` and friends, as described in its README. So as not to bring in a dependency on CMake, we just build the `static.c` version.

Sadly, the performance delta (see numbers below) is not a clear win; the numbers are a bit all over the place. This probably isn't that surprising, because most of the heavy memory allocation in Command-T is already micro-managed internally (but simply, with little overhead) using big slabs allocated with `mmap()`. Nevertheless, parking this here as a possible idea.

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

    du -sh lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call `make` with `USE_MIMALLOC` set. You can verify that it actually _is_ overriding the standard `malloc()` etc. calls by running a command with `MIMALLOC_VERBOSE`, which will cause it to print some extra info out:

    env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).

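As a rough illustration of why swapping out `malloc()` may matter little here, the slab approach mentioned above can be sketched as a bump allocator over a single `mmap()`ed region. This is a minimal sketch, not Command-T's actual code: `slab_t`, `slab_init`, `slab_alloc`, `slab_destroy`, and the 16-byte alignment are all hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical slab: one large anonymous mapping, handed out front to back. */
typedef struct {
    uint8_t *base;   /* start of the mmap()ed region */
    size_t capacity; /* total bytes reserved */
    size_t used;     /* bytes handed out so far */
} slab_t;

/* Reserve the whole slab up front. Returns 0 on success, -1 on failure. */
static int slab_init(slab_t *slab, size_t capacity) {
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    slab->base = p;
    slab->capacity = capacity;
    slab->used = 0;
    return 0;
}

/* Bump-allocate `size` bytes at a 16-byte-aligned offset.
 * Returns NULL when the slab has no room left. */
static void *slab_alloc(slab_t *slab, size_t size) {
    size_t start = (slab->used + 15u) & ~(size_t)15u;
    if (start > slab->capacity || size > slab->capacity - start) {
        return NULL;
    }
    slab->used = start + size;
    return slab->base + start;
}

/* Release every allocation at once by unmapping the slab. */
static int slab_destroy(slab_t *slab) {
    int rc = munmap(slab->base, slab->capacity);
    slab->base = NULL;
    slab->capacity = 0;
    slab->used = 0;
    return rc;
}
```

In a scheme like this, each allocation is only an offset bump inside memory obtained directly from the kernel, so `malloc()` never sits on that path and a replacement allocator would presumably have little to speed up there.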
On mid-2015 MacBook Pro
These numbers are all over the map due to thermal throttling.
M1 MacBook Pro
M3 MacBook Pro
Ryzen 5950X Arch Linux