@wdpypere wdpypere commented Jun 30, 2022

  • make the limit on scancel commands configurable
  • set a limit on the number of user deletions
  • improve logging about the scancel and user deletion limits
  • don't create scancel commands for users that don't exist or don't have any jobs
  • fix a silent failure

fixes #127

for user in remove_users:
    job_cancel_commands[user].append(create_remove_user_jobs_command(user=user, cluster=cluster))
    active_jobs = get_slurm_sacct_active_jobs_for_user(user)
    if active_jobs:
Contributor Author

A lot of boilerplate, but this is the main change.

@wdpypere
Contributor Author

So I'm running into a failing test because of:

======================================================================
ERROR: test_slurm_user_accounts (test.slurm_sync.SlurmSyncTestGent)
Test that the commands to create, change and remove users are correctly generated.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wdpypere/workspace/vsc-administration/test/slurm_sync.py", line 212, in test_slurm_user_accounts
    (job_cancel_commands, commands, remove_user_commands) = slurm_user_accounts(vo_members, active_accounts, slurm_user_info, ["banette"])
  File "lib/vsc/administration/slurm/sync.py", line 376, in slurm_user_accounts
    active_jobs = get_slurm_sacct_active_jobs_for_user(user)
  File "lib/vsc/administration/slurm/sacctmgr.py", line 543, in get_slurm_sacct_active_jobs_for_user
    (exitcode, contents) = asyncloop([SLURM_SACCT, "-L", "-P", "-s", "r", "-u", user])
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 151, in run
    return r._run()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 257, in _run
    self._run_pre()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 275, in _run_pre
    self._init_process()
  File "/home/wdpypere/workspace/vsc-administration/.eggs.py3/vsc_base-3.3.1-py3.6.egg/vsc/utils/run.py", line 370, in _init_process
    self._process = self._process_module.Popen(self._shellcmd, **self._popen_named_args)
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/sacct': '/usr/bin/sacct'

@wdpypere
Contributor Author

I would like to mock or patch get_slurm_sacct_active_jobs_for_user so that it returns a fixed output instead of calling sacct, but I don't know how to do that.
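
One possible approach, as a minimal sketch rather than the project's actual test: the standard library's unittest.mock can swap the helper for a fake that returns canned data, so sacct is never executed. Everything below other than the name get_slurm_sacct_active_jobs_for_user and the sacct arguments from the traceback is an assumption made for illustration: the `sync` namespace stands in for the real module, `users_with_active_jobs` is a simplified stand-in for the loop in slurm_user_accounts, and the job row is made up.

```python
import subprocess
from types import SimpleNamespace
from unittest import mock


def _real_get_active_jobs(user):
    """Stand-in for the real helper: shells out to sacct, which a dev box may lack."""
    result = subprocess.run(
        ["/usr/bin/sacct", "-L", "-P", "-s", "r", "-u", user],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.splitlines()[1:]  # drop the header line


# Hypothetical stand-in for the module under test.
sync = SimpleNamespace(get_slurm_sacct_active_jobs_for_user=_real_get_active_jobs)


def users_with_active_jobs(remove_users):
    """Simplified stand-in for the loop in slurm_user_accounts."""
    return [user for user in remove_users
            if sync.get_slurm_sacct_active_jobs_for_user(user)]


def fake_active_jobs(user):
    """Canned result (made-up row): only 'user1' has a running job."""
    return ["1234|user1|RUNNING"] if user == "user1" else []


# Replace the helper for the duration of the with-block; sacct is never run.
with mock.patch.object(sync, "get_slurm_sacct_active_jobs_for_user",
                       side_effect=fake_active_jobs) as fake:
    selected = users_with_active_jobs(["user1", "user2"])

print(selected)
```

In the real test the same idea would presumably be written as a string target, for example mock.patch("vsc.administration.slurm.sync.get_slurm_sacct_active_jobs_for_user", ...), assuming that is the module path behind lib/vsc/administration/slurm/sync.py. The patch has to target the module where the name is looked up (sync), not the one where it is defined (sacctmgr), otherwise the original function is still called.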

Development

Successfully merging this pull request may close these issues.

sync_slurm_acct issue with too many cancel jobs