Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion devservices/utils/supervisor.py
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,12 @@ def _is_program_running(self, program_name: str) -> bool:
if not isinstance(state, int):
return False
return state == SupervisorProcessState.RUNNING
except xmlrpc.client.Fault:
except (
xmlrpc.client.Fault,
SupervisorConnectionError,
socket.error,
ConnectionRefusedError,
):
# If we can't get the process info, assume it's not running
return False

Expand Down Expand Up @@ -295,6 +300,9 @@ def stop_supervisor_daemon(self) -> None:
self._get_rpc_client().supervisor.shutdown()
except xmlrpc.client.Fault as e:
raise SupervisorError(f"Failed to stop supervisor: {e.faultString}")
except (SupervisorConnectionError, socket.error, ConnectionRefusedError):
# Supervisor is already down, nothing to do
pass

def start_process(self, name: str) -> None:
if self._is_program_running(name):
Comment on lines 300 to 308
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The start_process method lacks error handling for connection failures, creating a race condition where a thread can crash if the supervisor becomes unavailable after the initial status check.
Severity: HIGH | Confidence: High

🔍 Detailed Analysis

A race condition exists in the start_process method. The function first checks if a process is running and then attempts to start it. However, the supervisor daemon could become unavailable between these two operations. Because the underlying XML-RPC client connects lazily, a socket.error or ConnectionRefusedError can be raised during the start attempt. The current implementation only handles xmlrpc.client.Fault, leaving these connection errors unhandled. This is especially problematic as start_process is called from concurrent threads in up.py, which could lead to thread crashes.

💡 Suggested Fix

Update the try...except block in the start_process method to also catch SupervisorConnectionError, socket.error, and ConnectionRefusedError, similar to the error handling already implemented in the stop_process method. This will prevent thread crashes if the supervisor daemon is unreachable.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: devservices/utils/supervisor.py#L300-L308

Potential issue: A race condition exists in the `start_process` method. The function
first checks if a process is running and then attempts to start it. However, the
supervisor daemon could become unavailable between these two operations. Because the
underlying XML-RPC client connects lazily, a `socket.error` or `ConnectionRefusedError`
can be raised during the start attempt. The current implementation only handles
`xmlrpc.client.Fault`, leaving these connection errors unhandled. This is especially
problematic as `start_process` is called from concurrent threads in `up.py`, which could
lead to thread crashes.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 7674921

Expand All @@ -315,6 +323,9 @@ def stop_process(self, name: str) -> None:
raise SupervisorProcessError(
f"Failed to stop process {name}: {e.faultString}"
)
except (SupervisorConnectionError, socket.error, ConnectionRefusedError):
# Supervisor is not available, process is already stopped
pass

def get_program_command(self, program_name: str) -> str:
opts = ServerOptions()
Expand Down
Loading