Skip to content

Conversation

@lunny
Copy link
Member

@lunny lunny commented Jan 6, 2026

  • Introduce ObjectInfo to represent a git object information.
  • Introduce ObjectPool interface to replace Batch. The new interface have ObjectInfo, Object and Close methods.
  • Implement batchObjectPool which will invoke git catfile --batch or git catfile --batch-check when necessary.

Next Steps(not this PR):

  • Try to move Limit Reader to readCloser
  • Move Discard operation to readCloser's Close method
  • Handle possible broken sub batch processes.
  • Use io.ReadCloser instead of ReadCloseDiscarder after above updates
  • Use ObjectPool as a field of TreeCommon and Blob instead of Repository
  • Introduce a new ObjectPool implementation based on git catfile --batch-command

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 6, 2026
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label Jan 6, 2026
@github-actions github-actions bot added the modifies/api This PR adds API routes or modifies them label Jan 6, 2026
@lunny lunny added the type/refactoring Existing code has been cleaned up. There should be no new functionality. label Jan 7, 2026
@lunny lunny changed the title EXP: object pool refactor 2 Introducing object pool interface and refactor related code Jan 7, 2026
@wxiaoguang
Copy link
Contributor

Problems:

  1. It is very fragile to keep asking developer to call contentReader.Close() // close reader to avoid open a new process in the same goroutine
  2. The error system is messy enough, it shouldn't keep adding new catfile.IsErrObjectNotFound checks
  3. Not thread-safe
  4. log.Warn("Opening more than one cat file batch in the same goroutine for: %s", b.repoPath) such warning sometimes helps, but what if in some cases it really needs two processes? Such warning only misleads users.
  5. objectPool, err := catfile.NewBatchObjectPool(ctx, repoPath) + repo.objectPool != nil will also cause random failure during tests like Refactor CatFile batch implementation and introduce batch-command for git 2.36 #34651, the root problem is interface nil value
  6. ...

-> Refactor cat-file batch operations and support --batch-command approach #35775

@wxiaoguang
Copy link
Contributor

@lafriks the comment above should be constructive enough IMHO. Please help to review " Refactor cat-file batch operations and support --batch-command approach #35775 "

@lunny
Copy link
Member Author

lunny commented Jan 7, 2026

Problems:

  1. It is very fragile to keep asking developer to call contentReader.Close() // close reader to avoid open a new process in the same goroutine

The Close is prepare to move the Discard to the Close for the next step. The current design is more fragile to ask developer to invoke Discard again and again.

  1. The error system is messy enough, it shouldn't keep adding new catfile.IsErrObjectNotFound checks

This is because catfile is a sub package which cannot use git.ErrNotFound. If we move it to git package, the problem will not exist.

  1. Not thread-safe

Adding a lock is straightforward, but is it actually correct to share a Repository instance across multiple goroutines?

  1. log.Warn("Opening more than one cat file batch in the same goroutine for: %s", b.repoPath) such warning sometimes helps, but what if in some cases it really needs two processes? Such warning only misleads users.

Related to 3, I couldn't find when it need two processes. If it needs, why not create two Repository?

  1. objectPool, err := catfile.NewBatchObjectPool(ctx, repoPath) + repo.objectPool != nil will also cause random failure during tests like Refactor CatFile batch implementation and introduce batch-command for git 2.36 #34651, the root problem is interface nil value

Sorry, I don't understand what you mean. I have rerun the CI multiple times, it will always succeed now and never encouter such failure. #34651 should have the problem but the code base are different.

  1. ...

-> Refactor cat-file batch operations and support --batch-command approach #35775

PR #35775 has been accepted by me. However, this PR introduces a new concept, ObjectPool, which goes beyond the scope of #35775. The goal of ObjectPool is to decouple Repository from any single concrete object-query mechanism. With this abstraction, Git object metadata and content can be retrieved through multiple implementations, such as direct Git commands, batch commands, go-git, or a remote service. In contrast, #35775 only addresses the --batch-command implementation.

@wxiaoguang
Copy link
Contributor

Problems:

  1. It is very fragile to keep asking developer to call contentReader.Close() // close reader to avoid open a new process in the same goroutine

The Close is prepare to move the Discard to the Close for the next step. The current design is more fragile to ask developer to invoke Discard again and again.

The old design is less fragile than this PR's newly introduced "Close"

  • Old design: only call Discard
  • This PR: need to call Discard, and also need to remember to add many new "Close"

I agree with that "The Close is prepare to move the Discard to the Close for the next step". That's also my thought and I left a comment in 35775 "// It needs to refactor to a fully managed Reader stream in the future"

  1. The error system is messy enough, it shouldn't keep adding new catfile.IsErrObjectNotFound checks

This is because catfile is a sub package which cannot use git.ErrNotFound. If we move it to git package, the problem will not exist.

It doesn't need to be in a new package. It only makes the system unnecessarily complex.

That's why I removed the "catfile" package in 35775, then everything goes simple again, with no drawback.

  1. Not thread-safe

Adding a lock is straightforward, but is it actually correct to share a Repository instance across multiple goroutines?

  1. log.Warn("Opening more than one cat file batch in the same goroutine for: %s", b.repoPath) such warning sometimes helps, but what if in some cases it really needs two processes? Such warning only misleads users.

Related to 3, I couldn't find when it need two processes. If it needs, why not create two Repository?

If we are sure about those, and don't want to have these cases, then we need to aggressively check mistakes: report panics for developers to avoid such mistakes, which will be very difficult to debug.

  1. objectPool, err := catfile.NewBatchObjectPool(ctx, repoPath) + repo.objectPool != nil will also cause random failure during tests like Refactor CatFile batch implementation and introduce batch-command for git 2.36 #34651, the root problem is interface nil value

Sorry, I don't understand what you mean. I have rerun the CI multiple times, it will always succeed now and never encouter such failure. #34651 should have the problem but the code base are different.

It successds now because your NewBatchObjectPool never returns error.

In Golang, interface nil isn't nil. So the if xxx != nil { xxx.Close } will panic if NewBatchObjectPool returns an error.

  1. ...

-> Refactor cat-file batch operations and support --batch-command approach #35775

PR #35775 has been accepted by me. However, this PR introduces a new concept, ObjectPool, which goes beyond the scope of #35775. The goal of ObjectPool is to decouple Repository from any single concrete object-query mechanism. With this abstraction, Git object metadata and content can be retrieved through multiple implementations, such as direct Git commands, batch commands, go-git, or a remote service. In contrast, #35775 only addresses the --batch-command implementation.

I don't see ObjectPool really beyond 35775's CatFileBatch interface. The ObjectPool only stores more "ctx" into structs and will cause more problems.

If you really like ObjectPool, you can add a few lines to wrap CatFileBatch to ObjectPool, they do not have any real difference.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jan 7, 2026

  1. ...

By the way, there are also other problems but I just don't want to write them more one by one.

For example: when error occurs, you can't just do Close and set inUse=false to make it reusable. When error occurs, the git protocol has already been corrupted, reusing it will only cause deadlocks (Almost all deadlock bugs are fixed by me, right?)

Incorrectly handling git process error is a longstanding problem , the old code is not right either, but we should not make it worse.

@lunny
Copy link
Member Author

lunny commented Jan 7, 2026

  1. ...

By the way, there are also other problems but I just don't want to write them more one by one.

For example: when error occurs, you can't just do Close and set inUse=false to make it reusable. When error occurs, the git protocol has already been corrupted, reusing it will only cause deadlocks (Almost all deadlock bugs are fixed by me, right?)

Incorrectly handling git process error is a longstanding problem , the old code is not right either, but we should not make it worse.

I have considered this issue. At the moment, when an error occurs, the repository is closed, and all related subprocesses are terminated as well—regardless of whether they are individually broken or not. As a result, the page will display an error, but resources will not leak.

I agree that for long-running tasks, relying solely on inUse is insufficient when a subprocess becomes broken. However, this is not a problem introduced by this PR; it has existed for a long time. I have therefore added it to the follow-up work planned as the next step after this PR.

@lunny
Copy link
Member Author

lunny commented Jan 7, 2026

Problems:

  1. It is very fragile to keep asking developer to call contentReader.Close() // close reader to avoid open a new process in the same goroutine

The Close is prepare to move the Discard to the Close for the next step. The current design is more fragile to ask developer to invoke Discard again and again.

The old design is less fragile than this PR's newly introduced "Close"

  • Old design: only call Discard
  • This PR: need to call Discard, and also need to remember to add many new "Close"

I agree with that "The Close is prepare to move the Discard to the Close for the next step". That's also my thought and I left a comment in 35775 "// It needs to refactor to a fully managed Reader stream in the future"

  1. The error system is messy enough, it shouldn't keep adding new catfile.IsErrObjectNotFound checks

This is because catfile is a sub package which cannot use git.ErrNotFound. If we move it to git package, the problem will not exist.

It doesn't need to be in a new package. It only makes the system unnecessarily complex.

That's why I removed the "catfile" package in 35775, then everything goes simple again, with no drawback.

  1. Not thread-safe

Adding a lock is straightforward, but is it actually correct to share a Repository instance across multiple goroutines?

  1. log.Warn("Opening more than one cat file batch in the same goroutine for: %s", b.repoPath) such warning sometimes helps, but what if in some cases it really needs two processes? Such warning only misleads users.

Related to 3, I couldn't find when it need two processes. If it needs, why not create two Repository?

If we are sure about those, and don't want to have these cases, then we need to aggressively check mistakes: report panics for developers to avoid such mistakes, which will be very difficult to debug.

But sometimes, it needs more than 1 cat batch in the same go routine. Maybe it's a bug or resource leak. We can keep the warning here until we find and fix all the possible wrong usage.

  1. objectPool, err := catfile.NewBatchObjectPool(ctx, repoPath) + repo.objectPool != nil will also cause random failure during tests like Refactor CatFile batch implementation and introduce batch-command for git 2.36 #34651, the root problem is interface nil value

Sorry, I don't understand what you mean. I have rerun the CI multiple times, it will always succeed now and never encouter such failure. #34651 should have the problem but the code base are different.

It successds now because your NewBatchObjectPool never returns error.

In Golang, interface nil isn't nil. So the if xxx != nil { xxx.Close } will panic if NewBatchObjectPool returns an error.

  1. ...

-> Refactor cat-file batch operations and support --batch-command approach #35775

PR #35775 has been accepted by me. However, this PR introduces a new concept, ObjectPool, which goes beyond the scope of #35775. The goal of ObjectPool is to decouple Repository from any single concrete object-query mechanism. With this abstraction, Git object metadata and content can be retrieved through multiple implementations, such as direct Git commands, batch commands, go-git, or a remote service. In contrast, #35775 only addresses the --batch-command implementation.

I don't see ObjectPool really beyond 35775's CatFileBatch interface. The ObjectPool only stores more "ctx" into structs and will cause more problems.

If you really like ObjectPool, you can add a few lines to wrap CatFileBatch to ObjectPool, they do not have any real difference.

Yes, if #35775 merged, I could rework this PR to do a wrap.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jan 7, 2026

Yes, if #35775 merged, I could rework this PR to do a wrap.

The higher priority tasks are fixing these FIXMEs in 35775

Especially the "content reader" problem: it should be fully managed by the framework, callers should never need to call Close or Discard manually.

@lunny
Copy link
Member Author

lunny commented Jan 7, 2026

Yes, if #35775 merged, I could rework this PR to do a wrap.

The higher priority tasks are fixing these FIXMEs in 35775

Especially the "content reader" problem: it should be fully managed by the framework, callers should never need to call Close or Discard manually.

If neither Close nor Discard is called, the resource cleanup has to be performed in Cancel. I don’t see a clear difference in behavior between these approaches. A callback function may avoid some drawbacks but it will also introduce other problems like readability or flexibility.

@wxiaoguang
Copy link
Contributor

managedReader = catfilebatch.QueryContent() or your Object()
managedReader.Read(...)

anotherReader = catfilebatch.QueryContent() { mark managedReader as discarded }
anotherReader.Read(...) // OK 

managedReader.Read(...) // panic to tell developers that you have made mistakes

@lunny
Copy link
Member Author

lunny commented Jan 11, 2026

Too many conflicts.

@lunny lunny closed this Jan 11, 2026
@lunny lunny deleted the lunny/refactor_object_pool2 branch January 11, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/api This PR adds API routes or modifies them modifies/go Pull requests that update Go code type/refactoring Existing code has been cleaned up. There should be no new functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants