Description
Why
Developers and artists cloning the repo will download a few gigabytes less (2.5GB saved in the tests below).
Suggestion
GitHub will perform git gc only on demand. An org admin will need to open a support request with GitHub explicitly requesting git gc --aggressive on the asset repo.
Before this happens, the history should be rewritten so that a .gitattributes file declaring the binaries is part of the initial commit.
Here's the .gitattributes file I used in tests.
* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
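To check that these patterns resolve the way you expect, `git check-attr` can report the effective attributes for any path. The following is a minimal sketch, assuming it is run from the top of a work tree that already contains the file above. The helper name `show_attrs` and the sample file names are made up for illustration, and the paths do not need to exist on disk.

```shell
# Sketch: show how Git resolves the "text" and "diff" attributes for some paths.
# show_attrs is a hypothetical helper name, not part of Git.
show_attrs() {
  git check-attr text diff -- "$@"
}

# Example call (hypothetical file names):
#   show_attrs scene.xcf README.md logo.svg
```

With the attributes above, a path such as scene.xcf should report both attributes as unset (because `binary` is a macro for `-diff -merge -text`), while README.md should report both as set.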
Background
Git handles binary deltas just fine, but you can improve how it handles binary data by explicitly declaring which files are binary.
Git delta compression on binaries:
- Original asset repo (no checkout): 8.4GB
- Without gitattributes (git gc): 7.7GB
- With gitattributes (git gc): 5.9GB
- Raw assets size (single checkout without Git): 11GB
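One way to take packed-size measurements like the ones in this list is `git count-objects -vH`, which prints a `size-pack` line with a human-readable size. This is only a sketch and not necessarily how the numbers above were obtained. The helper name `pack_size` is made up for illustration.

```shell
# Sketch: print the on-disk size of all pack files for the repository in the
# current directory. pack_size is a hypothetical helper name.
pack_size() {
  git count-objects -vH | sed -n 's/^size-pack: //p'
}

# Example workflow (run inside a clone of the repo):
#   pack_size              # size before maintenance
#   git gc --aggressive
#   pack_size              # size after repacking
```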
The 11GB of assets are tracked across 216 Git commits. With Git's automatic delta compression they occupy 8.4GB, spread over separate pack files ("blob packs"). If you perform Git maintenance and run git gc --aggressive, Git repacks all objects and recomputes their binary deltas, which gives more efficient packing once the repository has a long history of commits.
If you add a .gitattributes file and reorganize the Git history so that .gitattributes is part of the initial commit, then Git appears to handle binary assets more efficiently than when it relies on its automatic heuristics for binary deltas.
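The exact commands used to reorganize the history are not given here. One possible approach, sketched below under that assumption, is `git filter-branch` with an `--index-filter` that adds the same .gitattributes blob to every commit, including the initial one. The function name `add_gitattributes_to_history` is hypothetical, and the attributes file should live outside the work tree so it does not collide with the rewritten checkout.

```shell
# Sketch: add a .gitattributes file to every commit on all refs.
# add_gitattributes_to_history is a hypothetical helper name.
# $1: path to the attributes file, kept outside the work tree.
add_gitattributes_to_history() {
  local attrs_file="$1" blob
  # Store the file as a blob once so each rewritten commit can reference it.
  blob="$(git hash-object -w "$attrs_file")" || return 1
  # Rewrite every commit's index to include the blob as .gitattributes.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter "git update-index --add --cacheinfo 100644,${blob},.gitattributes" \
    -- --all
}

# Example call (hypothetical path):
#   add_gitattributes_to_history /tmp/gitattributes
```

Rewriting history changes every commit hash, so existing clones and forks would need to be re-cloned or reset afterwards. Try it on a throwaway clone first.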
File size by file extension:
| File extension | File size by type | Type of file |
|---|---|---|
| 7z | 243M | binary |
| blend | 3.6G | binary |
| cfdg | 4.0K | text |
| jpg | 8.8M | binary |
| kra | 516K | binary |
| md | 4.0K | text |
| odg | 32K | binary |
| png | 2.8M | binary |
| psd | 4.0M | binary |
| svg | 3.0M | text/mixed |
| xcf | 6.3G | binary |
| zip | 189M | binary |
| total | 11G | |
Benchmark
git gc --aggressive benchmark (on the reordered history, with .gitattributes in the initial commit):
70 minutes
$ time git gc --aggressive
Enumerating objects: 4300, done.
Counting objects: 100% (4300/4300), done.
Delta compression using up to 8 threads
Compressing objects: 83% (3586/4296)
Compressing objects: 100% (4296/4296), done.
Writing objects: 100% (4300/4300), done.
Selecting bitmap commits: 163, done.
Building bitmaps: 100% (106/106), done.
Total 4300 (delta 2034), reused 2167 (delta 0), pack-reused 0
real 70m31.830s
user 301m40.752s
sys 0m29.756s
Some source
Size calculation:
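# Print "<label>: <size>" from the "total" line of du -c output.
total_size() { grep -o '.*total$' | sed "s/\\([^ \\t]\\+\\).*total/${1}: \\1/"; }
# Per-extension totals; run from the top of the checkout.
find * -type f | sed 's/^.*\.//' | sort -u | while read -r ext; do find * -type f -name "*.${ext}" -exec du -sch {} + | total_size "${ext}"; done
# Grand total across all files.
du -shc * | total_size total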