update borg2 changes up to beta 10.

ThomasWaldmann · ThomasWaldmann · commit 7f357b4d6219 · 2024-09-09T20:17:45.000+02:00
diff --git a/releases/borg-2.0.rst b/releases/borg-2.0.rst
@@ -16,156 +16,197 @@ Breaking compatibility
 
 **The "bad" news first:**
 
-This is a breaking release, it is not directly compatible with borg 1.x repos and thus
-not a quick upgrade.
+This is a breaking release: it is not directly compatible with borg 1.x
+repos and thus not a quick upgrade.
 
-Also, there were cli changes, thus you will need to review/edit your scripts. Wrappers
-and GUIs for borg also need to get adapted.
+Also, there were cli changes, so you will need to review/edit your scripts.
+Wrappers and GUIs for borg also need to be adapted.
 
 **The good news are:**
 
-- if you like, you can efficiently copy your existing archives from old borg 1.x repos to
-  new borg 2 repos using "borg transfer" (you will need space and time for this, though).
+- if you like, you can efficiently copy your existing archives from old borg
+  1.x repos to new borg 2 repos using "borg transfer" (you will need space
+  and time for this, though).
 - by doing a breaking release, we could:
 
-  - fix a lot of long-term issues that could not (easily) be fixed in a non-breaking release
+  - fix a lot of long-term issues that could not (easily) be fixed in a
+    non-breaking release
   - make the code cleaner and simpler, get rid of cruft and legacy
-  - improve security and speed
+  - improve security, speed and parallelism
   - open doors for new features and applications that were not possible yet
   - make the docs shorter and using borg easier
-- this is the first breaking release since many years and we do not plan another one
-  anytime soon.
+- this is the first breaking release in many years and we do not plan
+  another one anytime soon.
 
 Major new features
 ~~~~~~~~~~~~~~~~~~
 
-- create: added retries for input files (e.g. if there is a read error or file changed while reading)
-- extract --continue: continue a previously interrupted extraction
-- additionally to ssh: repos, also implement repos via unix domain (ipc) socket
+- new repository and locking implementation based on borgstore project
+
+  - borgstore is a key/value store in python, currently supporting file: and
+    sftp: backends. borgstore backends are easy to implement, so there might
+    be more in the future, like direct access to cloud storage repos.
+  - borg uses these to implement file: and ssh: repos and (new) sftp: repos.
+  - in addition to ssh: repos, we also have socket: repos now.
+  - concurrent access to a repository is now possible for most borg
+    commands (except check and compact).
+  - a "repository index" is not needed anymore because objects are directly
+    found by their ID. the memory needs of this index were proportional to
+    the object count in the repository, thus borg now needs less RAM.
+  - stale repository locks get auto-removed if they don't get refreshed or if
+    their owner process is known to be dead.
+  - borg compact does much less I/O because it doesn't need to compact large
+    "segment files" to free space: each repo object is now stored separately
+    and thus can also be deleted individually.
+  - borg delete and prune are much faster now.
+  - the repository works very differently now:
+
+    - borg 1.x: transaction-based (commit or roll back), log-like, append-only
+      segment files, precise refcounting, repo index needed, exclusive lock
+      needed, checkpointing and .part files needed.
+    - borg 2: convergence, write-order, separate objects, no refcounting,
+      garbage collection, no repo index needed, simplicity, mostly works with
+      a shared lock, no need for checkpointing or .part files.
 
-- better, more modern, faster crypto
+- multi-repo improvements
 
-  - new keys/repos only use new crypto: AEAD, AES-OCB, chacha20-poly1305, argon2.
-  - using session keys: more secure and easier to manage, especially in multi-client or multi-repo
-    contexts. doing this, we could get rid of problematic long term nonce/counter management.
-  - the old crypto code will get removed in borg 2.1 (currently we still need it to read from
-    your old borg 1.x repos). removing AES-CTR, pbkdf2, encrypt-and-mac, counter/nonce management
-    will make borg more secure, easier to use and develop.
-
-- repos are faster, safer and easier to deal with
-
-  - borg rcompress can do a repo-wide efficient recompression.
-  - the new PUT2 data format uses much less crc32 and more xxh64 and offers
-    a header-only checksum (PUT1 only offered one checksum for header+data).
-    that way, we can safely read header infos without having to also read all the data.
-  - vastly different speeds in misc. crc32 implementations do not matter any more.
-    because of this, we can just use python's zlib.crc32 and do not need libdeflate's crc32.
-  - the repo index now also stores "csize" (less random I/O for some ops)
-  - the repo index now has an API to store and query misc. "flags" (can be used e.g. for
-    bookkeeping of long-running whole-repo operations)
+  - borg 1.x could only deal with 1 repository per borg invocation. borg 2.0
+    now also knows about another repo (see --other-repo option) for some
+    commands, like borg transfer, borg repo-create, ...
+  - borg repo-create can create "related repositories" of an existing repo,
+    e.g. to use them for efficient archive transfers using borg transfer.
+  - borg transfer can copy and convert archives from a borg 1.x repo to a
+    related borg 2 repo. to save time, it will transfer the compressed file
+    content chunks without recompressing. but, to make your repo more secure,
+    it will decrypt / re-encrypt all the chunks.
+  - borg transfer can copy archives from one borg 2 repo to a related other
+    borg 2 repo, without doing any conversion.
+  - borg transfer usually transfers compressed chunks (avoids recompression),
+    but there is also the option to recompress them using a specific
+    compressor.
 
-- multi-repo improvements
+- better, more modern, faster crypto
 
-  - borg 1.x only could deal with 1 repository per borg invocation. borg 2.0 now also knows
-    about another repo (see --other-repo option) for some commands, like borg transfer,
-    borg rcreate, ...
-  - borg rcreate can create "related repositories" of an existing repo, e.g. to use them
-    for efficient archive transfers using borg transfer.
-  - borg transfer can copy and convert archives from a borg 1.x repo to a related borg 2 repo.
-    to save time, it will transfer the compressed file content chunks without recompressing.
-    but, to make your repo more secure, it will decrypt / re-encrypt all the chunks.
-  - borg transfer can copy archives from one borg 2 repo to a related other borg 2 repo,
-    without doing any conversion.
-  - borg transfer usually transfers compressed chunks (avoids recompression), but there is
-    also the option to recompress them using a specific compressor.
+  - new keys/repos only use new crypto: AEAD, AES-OCB, chacha20-poly1305,
+    argon2.
+  - using session keys: more secure and easier to manage, especially in
+    multi-client or multi-repo contexts. doing this, we could get rid of
+    problematic long-term nonce/counter management.
+  - the old crypto code will get removed in borg 2.1 (currently we still need
+    it to read from your old borg 1.x repos). removing AES-CTR, pbkdf2,
+    encrypt-and-mac, counter/nonce management will make borg more secure,
+    easier to use and develop.
 
 - command line interface cleanups
 
-  - no scp style repo parameters any more (parsing ambiguity issues, no :port possible),
-    just use the better ssh://user@host:port/path .
+  - no scp style repo parameters any more (parsing ambiguity issues, no
+    :port possible), just use the better ssh://user@host:port/path .
   - separated repo and archive, no "::" any more
-  - split some commands that worked on archives and repos into 2 separate commands
-    (makes the code/docs/help easier)
-  - renamed borg init to borg rcreate for better consistency
-  - BORG_EXIT_CODES=modern is the default now to get more specific process exit codes
-
+  - split some commands that worked on archives and repos into 2 separate
+    commands (makes the code/docs/help easier)
+  - renamed borg init to borg repo-create for better consistency
+  - BORG_EXIT_CODES=modern is the default now to get more specific process
+    exit codes
 
 - added commands / options:
 
   - you will usually need to give either -r (aka --repo) or BORG_REPO env var.
   - --match-archives now has support for regex or glob/shell style matching
+  - extract --continue: continue a previously interrupted extraction
+  - new borg repo-compress command can do a repo-wide efficient recompression.
   - borg key change-location: usable for repokey <-> keyfile location change
   - borg benchmark cpu (so you can actually see what's fastest for your CPU)
-  - borg import/export-tar --tar-format=GNU/PAX/BORG, support ctime/atime PAX headers.
-    GNU and PAX are standard formats, while BORG is a very low-level custom format only
-    for borg usage.
-  - borg create: add the "slashdot hack" to strip path prefixes in created archives
+  - borg import/export-tar --tar-format=GNU/PAX/BORG, support ctime/atime PAX
+    headers. GNU and PAX are standard formats, while BORG is a very low-level
+    custom format only for borg usage.
+  - borg create: add the "slashdot hack" to strip path prefixes in created
+    archives
+  - borg repo-space: optionally, you can reserve some space in the repo that
+    can be freed in "file system full" conditions.
   - borg version: show local/remote borg version
 
 - removed commands / options:
 
-  - removed -P (aka --prefix) option, use -a (aka --match-archives) instead, e.g.: -a 'PREFIX*'
+  - removed -P (aka --prefix) option, use -a (aka --match-archives) instead,
+    e.g.: -a 'PREFIX*'
   - borg upgrade (was only relevant for attic / old borg)
   - removed deprecated cli options
-  - remove recreate --recompress option, the repo-wide "rcompress" is more efficient.
+  - remove recreate --recompress option, the repo-wide "repo-compress" is
+    more efficient.
+  - remove borg config command (it only worked locally anyway)
+  - repository storage quota limit (might come back if we find a more useful
+    implementation)
+  - repository append-only mode (might come back later, likely implemented
+    very differently)
 
 Other changes
 ~~~~~~~~~~~~~
 
-- BORG_CACHE_IMPL defaults to "adhocwithfiles" now, not using a persistent chunks cache anymore
+- create: added retries for input files (e.g. if there is a read error or a
+  file changed while it was being read)
+- BORG_CACHE_IMPL defaults to "adhocwithfiles" now, not using a persistent
+  chunks cache anymore, solving all issues related to chunks cache sync.
 - improve acl_get / acl_set error handling, refactor acl code
 - crypto: use a one-step kdf for session keys
 - use less setup.py, use pip, build and make.py
-- using platformdirs python package to determine locations for configs and caches
-- show files / archives with local timezone offsets, store archive timestamps with tz offset
+- using platformdirs python package to determine locations for configs and
+  caches
+- show files / archives with local timezone offsets, store archive timestamps
+  with tz offset
 - make user/group/uid/gid optional in archived files
-- do not store .borg_part files in final archive, simplify statistics (no parts stats any more)
-- avoid orphan chunks on input files with OSErrors
-- make sure archive name/comment, stuff that get into JSON is pure valid utf-8 (no surrogate escapes)
+- make sure archive name/comment and other stuff that gets into JSON is pure
+  valid utf-8 (no surrogate escapes)
 - new remote and progress logging (tunneled through RPC result channel)
 - internal data format / processing changes
 
-  - using msgpack spec 2.0 now, cleanly differentiating between text and binary bytes.
-    the older msgpack spec attic and borg < 2.0 used did not have the binary type, so
-    it was not pretty...
-    also using the msgpack Timestamp data type instead of self-made bigint stuff.
-  - archives: simpler, more symmetric handling of hardlinks ("hlid", all hardlinks have same
-    chunks list, if any). the old way was just a big pain (e.g. for partial extracting),
-    ugly and spread all over the code. the new way simplified the code a lot.
+  - using msgpack spec 2.0 now, cleanly differentiating between text and
+    binary bytes. the older msgpack spec used by attic and borg < 2.0 did not
+    have the binary type, so it was not pretty...
+    also using the msgpack Timestamp data type instead of self-made bigint
+    stuff.
+  - archives: simpler, more symmetric handling of hardlinks ("hlid", all
+    hardlinks have the same chunks list, if any). the old way was just a big
+    pain (e.g. for partial extracting), ugly and spread all over the code.
+    the new way simplified the code a lot.
   - item metadata: clean up, remove, rename, fix, precompute stuff
   - chunks have separate encrypted metadata (size, csize, ctype, clevel).
-    this saves time for borg rcompress/recreate when recompressing to same compressor, but other level.
-    this also makes it possible to query size or csize without reading/transmitting/decompressing
-    the chunk.
-  - remove legacy zlib compression header hack, so zlib works like all the other compressors.
-    that hack was something we had to do back in the days because attic backup did not have
-    a compression header at all (because it only supported zlib).
-  - got rid of "csize" (compressed size of a chunk) in chunks index and archives.
-    this often was just "in the way" and blocked the implementation of other (re)compression
-    related features.
-  - massively increase the archive metadata stream size limitation (so it is practically
-    not relevant any more)
+    this saves time for borg repo-compress/recreate when recompressing to the
+    same compressor, but a different level. this also makes it possible to
+    query size or csize without reading/transmitting/decompressing the chunk.
+  - remove legacy zlib compression header hack, so zlib works like all the
+    other compressors. that hack was something we had to do back in the day
+    because attic backup did not have a compression header at all (because it
+    only supported zlib).
+  - got rid of "csize" (compressed size of a chunk) in chunks index and
+    archives. this often was just "in the way" and blocked the implementation
+    of other (re)compression related features.
+  - massively increase the archive metadata stream size limitation (so it is
+    practically not relevant any more)
 
 - source code changes
 
-  - borg 1.x borg.archiver (and also the related tests in borg.testsuite.archiver) monster
-    modules got split into packages of modules, now usually 1 module per borg cli command.
-  - using "black" (automated pep8 source code formatting), this reformatted ALL the code
+  - borg 1.x borg.archiver (and also the related tests) monster modules got
+    split into packages of modules, now usually 1 module per borg cli command.
+  - using "black" (automated pep8 source code formatting), this reformatted
+    ALL the code
   - added infrastructure so we can use "mypy" for type checking
 
 - python, packaging and library changes
 
   - minimum requirement: Python 3.9
-  - we unbundled all 3rd party code and require the respective libraries to be
-    available and installed. this makes packaging easier for dist package maintainers.
-  - discovery is done via pkg-config or (if that does not work) BORG_*_PREFIX env vars.
+  - we unbundled all 3rd party code and require the respective libraries to
+    be available and installed. this makes packaging easier for dist package
+    maintainers.
+  - discovery is done via pkg-config or (if that does not work) BORG_*_PREFIX
+    env vars.
   - our setup*.py is now much simpler, a lot moved to pyproject.toml now.
-  - we had to stop supporting LibreSSL (e.g. on OpenBSD) due to their different API.
-    borg on OpenBSD now also uses OpenSSL.
+  - we had to stop supporting LibreSSL (e.g. on OpenBSD) due to their
+    different API. borg on OpenBSD now also uses OpenSSL.
 
 - getting rid of legacy stuff
 
   - removed some code only needed to deal with very old attic or borg repos.
-    users are expected to first upgrade to borg 1.2 before jumping to borg 2.0,
-    thus we do not have to deal with any ancient stuff any more.
-  - removed archive and manifest TAMs, using simpler approach with typed repo objects.
+    users are expected to first upgrade to borg 1.2 before jumping to borg
+    2.0, thus we do not have to deal with any ancient stuff any more.
+  - removed archive and manifest TAMs, using simpler approach with typed repo
+    objects.