Status: Open
Labels: type/proposal (The new feature has not been accepted yet but needs to be discussed first.)
Feature Description
Background
Currently, Gitea creates a new git cat-file --batch subprocess for each request when handling git operations. While this approach is straightforward, it leads to the following issues in high-concurrency scenarios:
- High system overhead due to frequent subprocess creation/destruction
- Increased response latency as each request requires subprocess initialization
- Potential overconsumption of system resources (such as file descriptors)
- The non-gogit build of Gitea is slow for git-related operations on Windows
Proposal Overview
Design and implement a lightweight git cat-file --batch subprocess manager to improve performance and resource utilization through subprocess reuse. Key features include:
- Maintaining subprocess pools organized by repository path
- Dynamically allocating idle subprocesses to handle requests
- Automatically recycling long-idle subprocesses
- Gracefully handling high-load situations
Detailed Design
Subprocess Manager Structure
type GitCatFileManager struct {
	// Subprocess pools indexed by repository path
	procPools       map[string]*ProcPool
	mutex           sync.RWMutex
	maxProcsPerRepo int           // Maximum number of subprocesses per repository
	idleTimeout     time.Duration // Idle timeout period
}

type ProcPool struct {
	repoPath  string
	processes []*GitCatFileProcess
	mutex     sync.Mutex
}

type GitCatFileProcess struct {
	cmd      *exec.Cmd
	stdin    io.WriteCloser
	stdout   io.ReadCloser
	lastUsed time.Time
	inUse    bool
	mutex    sync.Mutex
}
Core Functionality Implementation
Acquiring a Subprocess
func (m *GitCatFileManager) Get(repoPath string) (*GitCatFileProcess, error) {
	m.mutex.RLock()
	pool, exists := m.procPools[repoPath]
	m.mutex.RUnlock()
	if !exists {
		m.mutex.Lock()
		// Double-check to avoid race conditions
		pool, exists = m.procPools[repoPath]
		if !exists {
			pool = &ProcPool{repoPath: repoPath}
			m.procPools[repoPath] = pool
		}
		m.mutex.Unlock()
	}
	// The limit lives on the manager, so pass it to the pool explicitly.
	return pool.getProcess(m.maxProcsPerRepo)
}

func (p *ProcPool) getProcess(maxProcs int) (*GitCatFileProcess, error) {
	p.mutex.Lock()
	defer p.mutex.Unlock()
	// Look for an idle process. Release updates inUse under proc.mutex,
	// so take the same lock here to avoid a data race.
	for _, proc := range p.processes {
		proc.mutex.Lock()
		if !proc.inUse {
			proc.inUse = true
			proc.lastUsed = time.Now()
			proc.mutex.Unlock()
			return proc, nil
		}
		proc.mutex.Unlock()
	}
	// Check if maximum limit has been reached
	if len(p.processes) >= maxProcs {
		return nil, errors.New("reached max processes limit for repository")
	}
	// Create a new process (it is returned already marked as in use)
	proc, err := newGitCatFileProcess(p.repoPath)
	if err != nil {
		return nil, err
	}
	p.processes = append(p.processes, proc)
	return proc, nil
}
Creating a New Subprocess
func newGitCatFileProcess(repoPath string) (*GitCatFileProcess, error) {
	cmd := exec.Command("git", "-C", repoPath, "cat-file", "--batch")
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return nil, err
	}
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		stdin.Close()
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		stdin.Close()
		stdout.Close()
		return nil, err
	}
	return &GitCatFileProcess{
		cmd:      cmd,
		stdin:    stdin,
		stdout:   stdout,
		lastUsed: time.Now(),
		inUse:    true,
	}, nil
}
Releasing a Subprocess
func (m *GitCatFileManager) Release(proc *GitCatFileProcess) {
	proc.mutex.Lock()
	proc.inUse = false
	proc.lastUsed = time.Now()
	proc.mutex.Unlock()
}
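A process can only be reused safely if the previous caller consumed its entire response before calling Release; otherwise the next caller would read leftover bytes. The sketch below shows one way to read a single object, based on the documented git cat-file --batch output format (a header line "<oid> <type> <size>", then the object body and a trailing newline, or "<name> missing" for unknown objects). The helpers parseBatchHeader and readObject are hypothetical names, not part of the proposal.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// parseBatchHeader parses one header line of `git cat-file --batch` output:
// "<oid> <type> <size>" for an existing object, "<name> missing" otherwise.
func parseBatchHeader(line string) (oid, typ string, size int64, err error) {
	fields := strings.Fields(line)
	if len(fields) == 2 && fields[1] == "missing" {
		return "", "", 0, fmt.Errorf("object %s is missing", fields[0])
	}
	if len(fields) != 3 {
		return "", "", 0, fmt.Errorf("unexpected header %q", line)
	}
	size, err = strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return "", "", 0, fmt.Errorf("bad size in header %q: %w", line, err)
	}
	return fields[0], fields[1], size, nil
}

// readObject requests one object and consumes the complete response,
// including the trailing newline, so the stream is clean for the next caller.
func readObject(w io.Writer, r *bufio.Reader, name string) (typ string, data []byte, err error) {
	if _, err = fmt.Fprintln(w, name); err != nil {
		return "", nil, err
	}
	line, err := r.ReadString('\n')
	if err != nil {
		return "", nil, err
	}
	_, typ, size, err := parseBatchHeader(line)
	if err != nil {
		return "", nil, err
	}
	data = make([]byte, size)
	if _, err = io.ReadFull(r, data); err != nil {
		return "", nil, err
	}
	_, err = r.Discard(1) // the newline that follows the object body
	return typ, data, err
}

func main() {
	// Simulated process output; a real caller would pass the pooled process's pipes.
	fake := bufio.NewReader(strings.NewReader("abc123 blob 5\nhello\n"))
	typ, data, err := readObject(io.Discard, fake, "abc123")
	fmt.Println(typ, string(data), err)
}
```

Because bufio.Reader may read ahead, a pooled process would probably need to keep one reader for its whole lifetime rather than wrapping stdout anew on every request.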
Periodic Cleanup
func (m *GitCatFileManager) StartCleaner(interval time.Duration) {
	ticker := time.NewTicker(interval)
	go func() {
		for range ticker.C {
			m.cleanIdleProcesses()
		}
	}()
}

func (m *GitCatFileManager) cleanIdleProcesses() {
	now := time.Now()
	// Note: the manager lock is held for the whole sweep, so Get calls that
	// need to create a new pool block until the sweep finishes.
	m.mutex.Lock()
	defer m.mutex.Unlock()
	for repoPath, pool := range m.procPools {
		pool.mutex.Lock()
		activeProcs := make([]*GitCatFileProcess, 0, len(pool.processes))
		for _, proc := range pool.processes {
			proc.mutex.Lock()
			if !proc.inUse && now.Sub(proc.lastUsed) > m.idleTimeout {
				// Close long-idle processes
				proc.stdin.Close()
				proc.cmd.Process.Kill()
				proc.cmd.Wait() // Avoid zombie processes
				proc.mutex.Unlock()
			} else {
				proc.mutex.Unlock()
				activeProcs = append(activeProcs, proc)
			}
		}
		pool.processes = activeProcs
		// Read the length while pool.mutex is still held to avoid a data race.
		empty := len(activeProcs) == 0
		pool.mutex.Unlock()
		// Remove empty process pools
		if empty {
			delete(m.procPools, repoPath)
		}
	}
}
Starting the Manager
// Global instance
var gitCatFileManager = NewGitCatFileManager(
	10,            // Maximum subprocesses per repository
	5*time.Minute, // Idle timeout period
)

func init() {
	// Start the cleanup goroutine, checking once per minute
	gitCatFileManager.StartCleaner(1 * time.Minute)
}
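NewGitCatFileManager is called above but not defined in the proposal. A minimal sketch of what it might look like follows; the signature is inferred from the call site, and ProcPool is reduced to a stub so the example compiles on its own. The one detail that matters is that procPools must be initialized, because Get writes to it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ProcPool is a stub here; the proposal defines the full type.
type ProcPool struct{ repoPath string }

type GitCatFileManager struct {
	procPools       map[string]*ProcPool
	mutex           sync.RWMutex
	maxProcsPerRepo int
	idleTimeout     time.Duration
}

// NewGitCatFileManager is a hypothetical constructor; its signature is
// inferred from how the proposal calls it.
func NewGitCatFileManager(maxProcsPerRepo int, idleTimeout time.Duration) *GitCatFileManager {
	return &GitCatFileManager{
		// Writing to a nil map panics, so the map must be created up front.
		procPools:       make(map[string]*ProcPool),
		maxProcsPerRepo: maxProcsPerRepo,
		idleTimeout:     idleTimeout,
	}
}

func main() {
	m := NewGitCatFileManager(10, 5*time.Minute)
	fmt.Println(len(m.procPools), m.maxProcsPerRepo, m.idleTimeout)
}
```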
Implementation Considerations
- Error Handling: Detect abnormal subprocess exits and remove or replace the affected processes
- Thread Safety: Use appropriate mutex locks to ensure concurrency safety
- Resource Limits: Add a global maximum process limit to prevent resource exhaustion
- Monitoring Metrics: Add monitoring for subprocess pool usage to facilitate troubleshooting
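The global process limit mentioned above could be built on a buffered channel used as a counting semaphore. This is only a sketch: procLimiter and its methods are hypothetical names, and a real implementation would need to decide whether callers fail fast (as here) or wait for a free slot.

```go
package main

import "fmt"

// procLimiter caps the total number of live subprocesses across all
// repositories (hypothetical helper, not part of the proposal).
type procLimiter struct{ slots chan struct{} }

func newProcLimiter(max int) *procLimiter {
	return &procLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire reserves a slot without blocking; false means the global
// limit has been reached.
func (l *procLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot. Call it exactly once per successful tryAcquire,
// after the subprocess has exited; it blocks if no slot is held.
func (l *procLimiter) release() { <-l.slots }

func main() {
	l := newProcLimiter(2)
	fmt.Println(l.tryAcquire(), l.tryAcquire(), l.tryAcquire())
	l.release()
	fmt.Println(l.tryAcquire())
}
```

In the design above, the natural place to call tryAcquire would be just before newGitCatFileProcess, with release called by the cleaner after cmd.Wait returns.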
Performance Expectations
- Reduced Latency: Most requests use already-initialized subprocesses, avoiding startup overhead
- Increased Throughput: Less process creation and teardown overhead in high-concurrency scenarios
- Lowered Resource Consumption: Control of total subprocess count prevents excessive resource usage
Drawbacks
- Increased Complexity: The solution adds complexity to Gitea's codebase with new data structures, synchronization mechanisms, and lifecycle management that will need to be maintained.
- Memory Footprint: Long-running subprocesses will consume more memory over time compared to short-lived ones. Each cached subprocess maintains open file handles and memory buffers.
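- Stuck Sessions: A subprocess that is never released stays marked as in use forever, so each subprocess session should have a timeout.
- Stale Repository State: It is unclear how a long-running git cat-file --batch process behaves when the repository is updated while it is running; this needs to be investigated.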
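One possible shape for the session timeout is a watchdog timer attached to each checkout. In the sketch below, lease, newLease, and the onEnd callback are all hypothetical; in a real pool, onEnd(true) would kill the subprocess and drop it from the pool, while onEnd(false) would mark it idle again.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lease represents one checkout of a pooled process (hypothetical type).
type lease struct {
	timer *time.Timer
	once  sync.Once
	onEnd func(expired bool)
}

// newLease starts a watchdog: if Release is not called within maxHold,
// onEnd(true) runs from the timer goroutine.
func newLease(maxHold time.Duration, onEnd func(expired bool)) *lease {
	l := &lease{onEnd: onEnd}
	l.timer = time.AfterFunc(maxHold, func() { l.finish(true) })
	return l
}

// finish guarantees that onEnd runs at most once, whichever side wins.
func (l *lease) finish(expired bool) {
	l.once.Do(func() { l.onEnd(expired) })
}

// Release ends the lease normally and stops the watchdog.
func (l *lease) Release() {
	l.timer.Stop()
	l.finish(false)
}

func main() {
	done := make(chan bool, 2)

	// Never released: the watchdog fires.
	newLease(10*time.Millisecond, func(expired bool) { done <- expired })
	fmt.Println("expired:", <-done)

	// Released in time: the watchdog is cancelled.
	l := newLease(time.Hour, func(expired bool) { done <- expired })
	l.Release()
	fmt.Println("expired:", <-done)
}
```

The sync.Once matters because the timer can fire at the same moment the caller releases; without it, a process could be both killed and returned to the pool.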
TODO
- Implement a basic version and conduct performance benchmark tests
- Consider adding subprocess health check mechanisms
- Integrate with Gitea's monitoring and tracing systems