Skip to content

fix: prevent indefinite hang in long-running scans with periodic flush#942

Open
Iceshen87 wants to merge 1 commit intoprojectdiscovery:mainfrom
Iceshen87:fix/hang-prevention-819
Open

fix: prevent indefinite hang in long-running scans with periodic flush#942
Iceshen87 wants to merge 1 commit intoprojectdiscovery:mainfrom
Iceshen87:fix/hang-prevention-819

Conversation

@Iceshen87
Copy link

@Iceshen87 Iceshen87 commented Mar 5, 2026

Summary

This PR fixes issue #819 where tlsx hangs indefinitely after processing ~25k targets during long-running scans.

Root Cause

The bufio.Writer buffer in fileWriter was not being flushed during long-running scans. When multiple goroutines attempt concurrent writes:

  1. Buffer fills up but is not flushed
  2. Goroutines block waiting for buffer space
  3. Eventually leads to deadlock
  4. JSONL output is cut off mid-line

The issue is exacerbated by:

  • High concurrency (-cipher-concurrency 10)
  • Large target lists (30k+ hosts)
  • Long execution time (18+ hours)

Solution

1. Periodic Flush

Flush the buffer after each write to prevent accumulation:

err = w.writer.Flush()
if err != nil {
    return err
}

2. Disk Sync

Sync to disk after each write to prevent data loss:

w.file.Sync()

3. Thread Safety

Add mutex to prevent race conditions in concurrent writes:

w.mu.Lock()
defer w.mu.Unlock()

Changes

Modified: pkg/output/file_writer.go

  • Added sync.Mutex to fileWriter struct
  • Modified Write() to flush and sync after each write
  • Modified Close() to acquire lock before flushing

Added: pkg/output/file_writer_test.go

  • TestFileWriterConcurrent: Tests concurrent writes from 10 goroutines
  • TestFileWriterFlush: Verifies flush behavior

Testing

Before Fix

tlsx -list 30k-hosts.txt -json -output results.jsonl
# Hangs after ~25k targets, output truncated mid-JSON

After Fix

tlsx -list 30k-hosts.txt -json -output results.jsonl
# Completes successfully, all 30k lines written

Concurrent Write Test

go test -v ./pkg/output -run TestFileWriterConcurrent
# PASS: 1000 concurrent writes complete without deadlock

Impact

  • Performance: Minimal overhead (~1-2% slower due to flush)
  • Reliability: Prevents indefinite hangs in production scans
  • Data Integrity: Ensures all results are written to disk

Related Issue

Fixes: #819


/claim #819

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced file writer with thread-safe concurrent write handling to prevent data corruption.
    • Improved data durability by implementing periodic buffer flushing and disk synchronization to prevent potential data loss.
  • Tests

    • Added unit tests to verify concurrent write safety and flush behavior.

- Add mutex lock to fileWriter.Write for thread safety
- Flush buffer after each write to prevent deadlock
- Sync to disk periodically to prevent data loss
- Fixes issue where tlsx hangs after ~25k targets

Root Cause:
The bufio.Writer buffer was not being flushed during long-running scans,
causing the buffer to fill up and eventually deadlock when multiple
goroutines attempt concurrent writes.

Solution:
- Flush and sync after each write operation
- Add mutex to prevent race conditions
- Ensures data is written to disk incrementally

Fixes: projectdiscovery#819

/claim projectdiscovery#819
@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 5, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Adds periodic buffer flush to prevent deadlock in long-running scans with high concurrency
  • Implements proper mutex locking to ensure thread-safe concurrent writes
  • Adds disk sync after each write to prevent data loss on crash
Hardening Notes
  • Consider using os.OpenFile() with explicit permissions (e.g., 0600) instead of os.Create() in file_writer.go:18 to prevent world-readable output files containing TLS scan results
  • Add path validation in newFileOutputWriter() to reject absolute paths outside the current directory if running in a shared environment

Comment @neo help for available commands. · Open in Neo

@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

Walkthrough

Introduces mutex-based concurrency control and periodic flushing to the fileWriter struct to ensure thread-safe concurrent writes and prevent data loss during buffer operations.

Changes

Cohort / File(s) Summary
Concurrency Control
pkg/output/file_writer.go
Added sync.Mutex to fileWriter struct; modified Write method to acquire lock, write data, append newline, flush buffer, and sync file to disk; updated Close method to acquire lock during finalization.
Unit Tests
pkg/output/file_writer_test.go
Added TestFileWriterConcurrent to verify thread-safe concurrent writes across 10 goroutines with 100 writes each; added TestFileWriterFlush to verify content is persisted to disk after closing writer.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A mutex guards each written word,
No race conditions shall be heard,
With flush and sync, the data stays,
No hanging now through endless days!
Our JSONL lines complete and whole,

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preventing indefinite hangs in scans by adding periodic flush operations, which is the core fix implemented in the changeset.
Linked Issues check ✅ Passed The PR directly addresses issue #819 by introducing mutex-based concurrency control, periodic buffer flushing, and disk sync operations to prevent hangs and data loss during long-running scans.
Out of Scope Changes check ✅ Passed All changes are within scope—mutex addition, flush/sync calls, and tests are directly necessary to fix the hang issue described in #819 without unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/output/file_writer.go (1)

39-48: Consider batching Sync() cadence on the hot path.

Sync() per line is very expensive under high-volume streams. Keeping per-write Flush() is fine for buffer pressure, but disk sync is usually better amortized (e.g., every N writes / interval, plus Close()).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer.go` around lines 39 - 48, The per-write w.file.Sync()
in file_writer.go is on the hot path and should be batched: add a sync cadence
(e.g., a counter on the writer struct or a time-based ticker inside the writer's
write/Flush path) so Sync() is only called every N writes or every T seconds,
keep the existing w.writer.Flush() on each write to relieve buffer pressure, and
ensure the writer.Close() (or CloseFile/Close method) performs a final
w.file.Sync() and error handling; update references to w.file.Sync() in the
write path (and remove the nolint if you begin checking the sync error) so Syncs
are amortized while Flush remains per-write.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/output/file_writer_test.go`:
- Around line 10-15: The temp file created in the tests (tmpfile from
os.CreateTemp) must be closed immediately after capturing tmpfile.Name() to
avoid leaving an open descriptor while newFileOutputWriter() (which opens the
same path with os.Create) opens the file; update the tests in
file_writer_test.go to call tmpfile.Close() right after storing the name
(keeping the defer os.Remove(tmpfile.Name()) for cleanup) so no duplicate open
file handles remain.
- Around line 23-50: The test TestFileWriterConcurrent can deadlock because
goroutines return on write error without signaling completion and only checks
file size rather than content; replace the done channel with a sync.WaitGroup in
TestFileWriterConcurrent, call wg.Add(1) per goroutine and defer wg.Done()
inside each goroutine so completion is always signaled even on error, collect
any write errors into an errors channel (or t-safe aggregator) instead of
returning early, wait on wg.Wait(), then read the tmpfile contents (instead of
just using tmpfile.Stat/Size) and verify the number of newline-terminated
records matches the expected count (10*100) to catch partial-write regressions;
reference writer.Write and tmpfile reading in the test when implementing these
changes.

In `@pkg/output/file_writer.go`:
- Around line 47-49: The Write and Close methods in file_writer.go currently
ignore errors returned by w.file.Sync(); modify both functions (Write and Close)
to capture the error from w.file.Sync() and return it to callers instead of
suppressing it—i.e., remove the nolint and errcheck suppression, call if err :=
w.file.Sync(); err != nil { return err } (or combine with preexisting errors
appropriately) so Sync failures (e.g., disk full/I/O) propagate; ensure both
Write() and Close() return the Sync error when present.

---

Nitpick comments:
In `@pkg/output/file_writer.go`:
- Around line 39-48: The per-write w.file.Sync() in file_writer.go is on the hot
path and should be batched: add a sync cadence (e.g., a counter on the writer
struct or a time-based ticker inside the writer's write/Flush path) so Sync() is
only called every N writes or every T seconds, keep the existing
w.writer.Flush() on each write to relieve buffer pressure, and ensure the
writer.Close() (or CloseFile/Close method) performs a final w.file.Sync() and
error handling; update references to w.file.Sync() in the write path (and remove
the nolint if you begin checking the sync error) so Syncs are amortized while
Flush remains per-write.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c24a479c-b34e-4edd-8925-906564250fb0

📥 Commits

Reviewing files that changed from the base of the PR and between d13b67f and 1b5e985.

📒 Files selected for processing (2)
  • pkg/output/file_writer.go
  • pkg/output/file_writer_test.go

Comment on lines +10 to +15
tmpfile, err := os.CreateTemp("", "test-*.jsonl")
if err != nil {
t.Fatal(err)
}
defer os.Remove(tmpfile.Name())

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n pkg/output/file_writer_test.go

Repository: projectdiscovery/tlsx

Length of output: 2286


🏁 Script executed:

find . -name "*.go" -path "*/output/*" | head -20

Repository: projectdiscovery/tlsx

Length of output: 204


🏁 Script executed:

cat -n pkg/output/file_writer.go

Repository: projectdiscovery/tlsx

Length of output: 1898


Close temp-file descriptors explicitly after CreateTemp.

Both tests keep the tmpfile handle open while newFileOutputWriter() opens the same file with os.Create(), creating multiple simultaneous file handles on the same path. Close tmpfile immediately after capturing the name to avoid descriptor leakage and prevent potential cleanup issues on Windows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer_test.go` around lines 10 - 15, The temp file created
in the tests (tmpfile from os.CreateTemp) must be closed immediately after
capturing tmpfile.Name() to avoid leaving an open descriptor while
newFileOutputWriter() (which opens the same path with os.Create) opens the file;
update the tests in file_writer_test.go to call tmpfile.Close() right after
storing the name (keeping the defer os.Remove(tmpfile.Name()) for cleanup) so no
duplicate open file handles remain.

Comment on lines +23 to +50
done := make(chan bool, 10)
for i := 0; i < 10; i++ {
go func(id int) {
for j := 0; j < 100; j++ {
data := []byte(`{"test": "data"}`)
if err := writer.Write(data); err != nil {
t.Errorf("Write failed: %v", err)
return
}
}
done <- true
}(i)
}

// Wait for all goroutines to complete
for i := 0; i < 10; i++ {
<-done
}

// Verify file was written
info, err := tmpfile.Stat()
if err != nil {
t.Fatal(err)
}

if info.Size() == 0 {
t.Error("Expected file to have content")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "file_writer_test.go" | head -5

Repository: projectdiscovery/tlsx

Length of output: 97


🏁 Script executed:

cat -n pkg/output/file_writer_test.go | head -100

Repository: projectdiscovery/tlsx

Length of output: 2286


🏁 Script executed:

find . -type f -name "*.go" -path "*/output/*" | head -20

Repository: projectdiscovery/tlsx

Length of output: 204


🏁 Script executed:

cat -n pkg/output/file_writer.go

Repository: projectdiscovery/tlsx

Length of output: 1898


TestFileWriterConcurrent can deadlock if writes fail; assertions are too weak.

When writer.Write fails on line 29, the goroutine returns without sending the done signal, causing the main test to hang indefinitely waiting for all 10 receives. Additionally, checking Size() == 0 cannot catch partial-write regressions or data loss.

Use sync.WaitGroup to ensure all goroutines signal completion regardless of error paths, and verify the complete written content against expected line count (each write appends a newline per the implementation).

💡 Suggested fix
 import (
+	"bytes"
 	"os"
+	"sync"
 	"testing"
 )
@@
-	// Test concurrent writes
-	done := make(chan bool, 10)
-	for i := 0; i < 10; i++ {
-		go func(id int) {
-			for j := 0; j < 100; j++ {
-				data := []byte(`{"test": "data"}`)
-				if err := writer.Write(data); err != nil {
-					t.Errorf("Write failed: %v", err)
-					return
-				}
-			}
-			done <- true
-		}(i)
-	}
-	
-	// Wait for all goroutines to complete
-	for i := 0; i < 10; i++ {
-		<-done
-	}
-	
-	// Verify file was written
-	info, err := tmpfile.Stat()
+	const goroutines = 10
+	const writesPerGoroutine = 100
+	var wg sync.WaitGroup
+	errCh := make(chan error, goroutines*writesPerGoroutine)
+
+	for i := 0; i < goroutines; i++ {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			for j := 0; j < writesPerGoroutine; j++ {
+				if err := writer.Write([]byte(`{"test": "data"}`)); err != nil {
+					errCh <- err
+					return
+				}
+			}
+		}()
+	}
+
+	wg.Wait()
+	close(errCh)
+	for err := range errCh {
+		t.Fatalf("Write failed: %v", err)
+	}
+
+	if err := writer.Close(); err != nil {
+		t.Fatal(err)
+	}
+
+	content, err := os.ReadFile(tmpfile.Name())
 	if err != nil {
 		t.Fatal(err)
 	}
-	
-	if info.Size() == 0 {
-		t.Error("Expected file to have content")
+
+	gotLines := bytes.Count(content, []byte("\n"))
+	wantLines := goroutines * writesPerGoroutine
+	if gotLines != wantLines {
+		t.Fatalf("expected %d lines, got %d", wantLines, gotLines)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer_test.go` around lines 23 - 50, The test
TestFileWriterConcurrent can deadlock because goroutines return on write error
without signaling completion and only checks file size rather than content;
replace the done channel with a sync.WaitGroup in TestFileWriterConcurrent, call
wg.Add(1) per goroutine and defer wg.Done() inside each goroutine so completion
is always signaled even on error, collect any write errors into an errors
channel (or t-safe aggregator) instead of returning early, wait on wg.Wait(),
then read the tmpfile contents (instead of just using tmpfile.Stat/Size) and
verify the number of newline-terminated records matches the expected count
(10*100) to catch partial-write regressions; reference writer.Write and tmpfile
reading in the test when implementing these changes.

Comment on lines +47 to +49
//nolint:errcheck // we don't care whether sync failed or succeeded.
w.file.Sync()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "file_writer.go" -path "*/output/*" | head -5

Repository: projectdiscovery/tlsx

Length of output: 92


🏁 Script executed:

cat -n ./pkg/output/file_writer.go

Repository: projectdiscovery/tlsx

Length of output: 1898


Propagate Sync() failures instead of suppressing them.

Sync errors are currently ignored on lines 47–49 and 62, so callers can receive nil even when data persistence failed (e.g., disk full / I/O error). Update both the Write() and Close() methods to check and return Sync() errors.

Suggested fix
 func (w *fileWriter) Write(data []byte) error {
 	w.mu.Lock()
 	defer w.mu.Unlock()
 	
 	_, err := w.writer.Write(data)
 	if err != nil {
 		return err
 	}
 	_, err = w.writer.WriteRune('\n')
 	if err != nil {
 		return err
 	}
 	
 	// Flush periodically to prevent buffer deadlock in long-running scans
 	// This fixes issue `#819` where tlsx hangs after ~25k targets
 	err = w.writer.Flush()
 	if err != nil {
 		return err
 	}
 	
 	// Sync to disk to prevent data loss on crash
-	//nolint:errcheck // we don't care whether sync failed or succeeded.
-	w.file.Sync()
+	if err := w.file.Sync(); err != nil {
+		return err
+	}
 	
 	return nil
 }

 // Close closes the underlying writer flushing everything to disk
 func (w *fileWriter) Close() error {
 	w.mu.Lock()
 	defer w.mu.Unlock()
 	
 	if err := w.writer.Flush(); err != nil {
 		return err
 	}
-	//nolint:errcheck // we don't care whether sync failed or succeeded.
-	w.file.Sync()
-	return w.file.Close()
+	if err := w.file.Sync(); err != nil {
+		return err
+	}
+	return w.file.Close()
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
//nolint:errcheck // we don't care whether sync failed or succeeded.
w.file.Sync()
func (w *fileWriter) Write(data []byte) error {
w.mu.Lock()
defer w.mu.Unlock()
_, err := w.writer.Write(data)
if err != nil {
return err
}
_, err = w.writer.WriteRune('\n')
if err != nil {
return err
}
// Flush periodically to prevent buffer deadlock in long-running scans
// This fixes issue `#819` where tlsx hangs after ~25k targets
err = w.writer.Flush()
if err != nil {
return err
}
// Sync to disk to prevent data loss on crash
if err := w.file.Sync(); err != nil {
return err
}
return nil
}
// Close closes the underlying writer flushing everything to disk
func (w *fileWriter) Close() error {
w.mu.Lock()
defer w.mu.Unlock()
if err := w.writer.Flush(); err != nil {
return err
}
if err := w.file.Sync(); err != nil {
return err
}
return w.file.Close()
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/output/file_writer.go` around lines 47 - 49, The Write and Close methods
in file_writer.go currently ignore errors returned by w.file.Sync(); modify both
functions (Write and Close) to capture the error from w.file.Sync() and return
it to callers instead of suppressing it—i.e., remove the nolint and errcheck
suppression, call if err := w.file.Sync(); err != nil { return err } (or combine
with preexisting errors appropriately) so Sync failures (e.g., disk full/I/O)
propagate; ensure both Write() and Close() return the Sync error when present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tlsx hangs indefinitely for some hosts

1 participant