
Declare gitattributes, reorganize Git history, and open a GitHub maintenance support request #166

@samrocketman

Description


Why

Developers and artists cloning the repo will download a few gigabytes less (2.5GB saved in the tests below).

Suggestion

GitHub performs git gc only on demand. An org admin will need to open a support request with GitHub explicitly requesting git gc --aggressive on the asset repo.

Before this happens, the history should be rewritten so that a .gitattributes file declaring the binaries is in the initial commit.

Here's the .gitattributes file I used in tests.

* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
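These rules can be sanity-checked with git check-attr. A minimal sketch in a scratch repo (the file names model.blend, README.md, and logo.svg are placeholders, not files from the asset repo):

```shell
# Sketch: verify the .gitattributes rules above with `git check-attr`.
set -e
tmp="$(mktemp -d)"
cd "$tmp"
git init -q .
cat > .gitattributes <<'EOF'
* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
EOF
# `binary` is a built-in macro for -diff -merge -text; the later, more
# specific patterns re-enable text handling, and the last match wins
# per attribute.
git check-attr diff text -- model.blend   # diff and text unset
git check-attr diff text -- README.md     # diff and text set
```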

Background

Git handles binary deltas just fine, but you can improve how it handles binary data if you declare to Git which files are binaries.

Git delta compression on binaries:

  • Original asset repo (no checkout): 8.4GB
  • Without gitattributes (git gc): 7.7GB
  • With gitattributes (git gc): 5.9GB
  • Raw assets size (single checkout without Git): 11GB
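Sizes like these can be measured with git count-objects before and after garbage collection. A minimal sketch in a scratch repo (the repo contents and identity are placeholders, not the asset repo):

```shell
# Sketch: watch object storage change across `git gc --aggressive`.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.email dev@example.com   # placeholder identity
git config user.name dev
head -c 100000 /dev/urandom > asset.bin
git add asset.bin
git commit -qm 'add binary asset'
git count-objects -vH   # loose objects, no packs yet
git gc -q --aggressive
git count-objects -vH   # everything repacked; size-pack is the pack size
```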

11GB of assets are tracked across 216 Git commits with automatic delta compression (8.4GB) packed into separate "blob packs". If you do Git maintenance and execute git gc --aggressive, then Git will "repack" all of the commits and recompute their binary deltas (more efficient packing once you have a long history of commits).

If you add a .gitattributes file and reorganize the Git history so that the .gitattributes file is in the initial commit, then Git appears to handle binary assets more efficiently than when it relies on its automatic heuristics for binary deltas.
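One way to do that reorganization is to graft a new root commit containing only .gitattributes and replay the existing history on top of it. A sketch using Git plumbing plus rebase --root (my approach, not necessarily the exact method used in the tests; the demo repo and file names are placeholders):

```shell
# Sketch: insert a .gitattributes-only commit before the current root.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.email dev@example.com   # placeholder identity
git config user.name dev
echo data > asset.bin
git add asset.bin
git commit -qm 'first asset'

printf '* binary\n*.md text diff\n' > .gitattributes
# Build a root commit whose tree holds only .gitattributes (plumbing).
blob="$(git hash-object -w .gitattributes)"
tree="$(printf '100644 blob %s\t.gitattributes\n' "$blob" | git mktree)"
root="$(git commit-tree -m 'Declare gitattributes' "$tree")"
rm .gitattributes   # it now lives in the new root's tree
# Replay the current branch's entire history onto the new root.
git rebase -q --onto "$root" --root
```

In a real repo this rewrites every commit hash, so the result has to be force-pushed and collaborators have to re-clone.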

File size by file extension:

File extension   Size   Type of file
7z               243M   binary
blend            3.6G   binary
cfdg             4.0K   text
jpg              8.8M   binary
kra              516K   binary
md               4.0K   text
odg              32K    binary
png              2.8M   binary
psd              4.0M   binary
svg              3.0M   text/mixed
xcf              6.3G   binary
zip              189M   binary
total            11G

Benchmark

git gc --aggressive benchmark (history reordered so that .gitattributes is the initial commit):

70 minutes

$ time git gc --aggressive
Enumerating objects: 4300, done.
Counting objects: 100% (4300/4300), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4296/4296), done.
Writing objects: 100% (4300/4300), done.
Selecting bitmap commits: 163, done.
Building bitmaps: 100% (106/106), done.
Total 4300 (delta 2034), reused 2167 (delta 0), pack-reused 0

real	70m31.830s
user	301m40.752s
sys	0m29.756s

Some source

Size calculation:

# Keep only du's trailing "total" line and label it with the given prefix
total_size() { grep -o '.*total$' | sed "s/\\([^ \\t]\\+\\).*total/${1}: \\1/";}

# For each file extension present, sum the sizes of all matching files
find * -type f | sed 's/^.*\.//' | sort -u | while read -er ext; do find * -type f -name "*.${ext}" -exec du -sch {} + | total_size "${ext}";done

# Grand total across the working tree
du -shc * | total_size total
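An alternative per-extension tally (my sketch, GNU findutils assumed; unlike the loop above it walks the tree once and skips the .git directory):

```shell
# Sketch: sum file sizes per extension in one pass with GNU find + awk.
# Prints exact bytes rather than du's human-readable sizes; file names
# containing spaces are not handled.
per_ext_size() {
  find . -path ./.git -prune -o -type f -printf '%s %f\n' \
    | awk '{ n = split($2, a, ".");
             ext = (n > 1) ? a[n] : "(none)";
             sum[ext] += $1 }
           END { for (e in sum) printf "%s: %d bytes\n", e, sum[e] }'
}
```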

Background Reading

Labels: enhancement (New feature or request)