stage.0: Module function cephprocesses.wait threw an exception. Exception: 'openattic' #657
Comments
Did your cluster go into HEALTH_ERR? The steps in Stage 0 for a minion are serialized for an already running cluster. It's in /srv/salt/ceph/stage/0/minion/default.sls. The ceph.wait state is simply paranoia on our part that the previous update on some minion caused an issue and the cluster did not recover. We bail out. Unfortunately, the HEALTH_ERR status isn't terribly granular, so we do not have a systematic guarantee of correlating cause and effect.
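To verify the cluster state yourself, the usual Ceph status commands apply; this is a generic example and assumes you run it on a node with an admin keyring:

:~ # ceph health
:~ # ceph -s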
No - the cluster and all services are up and running well, and I am testing a run through stages 0..5, where I do not expect failures since I did not change anything. Any idea how to get more debug information here?
There is a module called cephprocesses.py which checks that the expected services are up. You can try it either on the respective node with:
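A later comment in this thread runs the module via salt-call, so the invocation on the node is presumably:

:~ # salt-call cephprocesses.check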
or target the node directly:
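The target below is a placeholder minion ID; any Salt targeting expression works:

:~ # salt 'node1.example.com' cephprocesses.check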
or condensed in a runner that checks all services for all roles on all nodes:
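Assuming the runner mirrors the execution module's name (an assumption, not confirmed in this thread), it would be invoked from the master roughly as:

:~ # salt-run cephprocesses.check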
Appending a …
result:

on the node itself (this is on the admin node):

Output of salt-call cephprocesses.check attached:
This confirms that #661 will fix your issue.
Manually applied the change in #661 and executed salt "*" saltutil.sync_all. After this the error is gone! THANKS!
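For anyone hitting the same thing, the full sequence would look roughly like this; the salt-run stage invocation is the standard DeepSea form and is shown as an illustration, not quoted from the thread:

:~ # salt '*' saltutil.sync_all
:~ # salt-run state.orch ceph.stage.0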
I'm having the same issue with prometheus:

salt cl5.opn.shft cephprocesses.check
cl5.opn.shft:
Which deepsea version are you using, @khodayard?
Versions Report
@khodayard There is no …
@jschmid1 thank you for your response. This is my policy.cfg now:

:~ # cat /srv/pillar/ceph/proposals/policy.cfg

but I'm getting the same result:

Failures summary:
ceph.metapackage (/srv/salt/ceph/metapackage):

I've even tried to upgrade deepsea to the latest version from GitHub, but it failed and I had to revert to a snapshot:

:~ # deepsea stage run ceph.stage.0

Thanks again.
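For context, a DeepSea policy.cfg assigns roles to minions with glob lines of roughly the following shape; the host patterns and the specific role lines here are illustrative placeholders, not the poster's actual file:

cluster-ceph/cluster/*.sls
role-master/cluster/admin*.sls
role-admin/cluster/*.sls
role-mon/cluster/mon*.sls
role-openattic/cluster/admin*.sls
role-prometheus/cluster/admin*.sls
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*.yml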
Make sure to run stage.2 after changing the policy.cfg.
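With the deepsea CLI already used above, that would be something like the following (the equivalent salt-run state.orch ceph.stage.2 should also work):

:~ # deepsea stage run ceph.stage.2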
Running stage.2 fixed that problem, thank you.
On an already deployed cluster I get this error when executing stage.0:
The openattic.service is up and running well, and I can access openATTIC with a web browser without any problems.
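The unit state can be double-checked with standard systemd commands, independent of DeepSea:

:~ # systemctl status openattic.service
:~ # systemctl is-active openattic.service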
Any idea what might be wrong here?