Description
Ideas from Nate's email:
These could be further enhanced by also querying the live tables to generate a row for each node's current state, but I haven't attempted to get that working yet.
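An untested sketch of one way to approximate that: rather than joining the live tables, default the missing lead() value to now(), so each node's open-ended current row gets a duration measured up to the present (this assumes history_time is a plain timestamp column):

-- Untested sketch: coalesce() substitutes the current time when lead()
-- finds no later transition, i.e. for each node's current state.
select history_time, node_name, state,
       coalesce(lead(history_time) over (partition by node_name order by history_time),
                now()::timestamp) - history_time as duration
from csm_node_state_history;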
Scripts can be found here:
This first script sorts the csm_node_state_history table by node_name, then history_time, and calculates how long the node was in each state.
cat /u/besawn/bash/csm_db_node_state_history_duration_csv
/bin/psql -U postgres csmdb -qt -A -F "," -c "select history_time,node_name,state,lead(history_time) OVER (partition by node_name order by history_time) - history_time duration from csm_node_state_history"
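For readability, here is the same query with line breaks. lead() looks ahead to the next transition for the same node, so subtracting the current history_time gives the time spent in that state:

-- Per-transition durations: lead() fetches the next history_time
-- for the same node; the difference is the time spent in the state.
select history_time, node_name, state,
       lead(history_time) over (partition by node_name order by history_time)
           - history_time as duration
from csm_node_state_history;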
This script then calculates the total time each node spent in each state.
cat /u/besawn/bash/csm_db_node_state_history_sum_csv
/bin/psql -U postgres csmdb -A -F "," -c "select node_name,state,sum(a.duration) from (select history_time,node_name,state,lead(history_time) over (partition by node_name order by history_time) - history_time duration from csm_node_state_history) a GROUP BY node_name,state order by node_name,state"
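And the same aggregation written out: the inner query computes the per-transition durations as above, and the outer query sums them per node and state:

-- Total time per (node_name, state): sum the per-transition durations.
select node_name, state, sum(a.duration)
from (select history_time, node_name, state,
             lead(history_time) over (partition by node_name order by history_time)
                 - history_time as duration
      from csm_node_state_history) a
group by node_name, state
order by node_name, state;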
Here is some sample data, obtained by doing a bunch of state changes on the air-cooled nodes:
With no filters, all nodes and states are returned; by adding some grep filters, you can see information by node and state.
Currently the scripts generate CSV output intended for further processing. I can adjust the psql flags to make the output more human-readable if needed.
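For example, dropping the -A (unaligned output) and -F "," (comma field separator) flags falls back to psql's default aligned table output with headers:

/bin/psql -U postgres csmdb -c "select history_time,node_name,state,lead(history_time) OVER (partition by node_name order by history_time) - history_time duration from csm_node_state_history"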
I think the scripts are generic enough that they can be run directly on any cluster, as long as the DB is named csmdb.
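If a cluster's database name ever differs, a small wrapper could take it as an argument. This is a hypothetical sketch, not one of the scripts above:

#!/bin/bash
# Hypothetical wrapper: take the database name as an optional argument,
# defaulting to csmdb; the query itself is unchanged from above.
DB="${1:-csmdb}"
/bin/psql -U postgres "$DB" -qt -A -F "," -c "select history_time,node_name,state,lead(history_time) OVER (partition by node_name order by history_time) - history_time duration from csm_node_state_history"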
How much time did c650f99p08 spend in a state for every historical state transition?
/u/besawn/bash/csm_db_node_state_history_duration_csv | grep c650f99p08
2018-06-18 13:41:28.992176,c650f99p08,DISCOVERED,00:01:31.947354
2018-06-18 13:43:00.93953,c650f99p08,IN_SERVICE,00:00:19.309742
2018-06-18 13:43:20.249272,c650f99p08,OUT_OF_SERVICE,00:00:07.50034
2018-06-18 13:43:27.749612,c650f99p08,ADMIN_RESERVED,00:00:08.36665
2018-06-18 13:43:36.116262,c650f99p08,OUT_OF_SERVICE,00:00:04.876771
2018-06-18 13:43:40.993033,c650f99p08,IN_SERVICE,00:00:17.127929
2018-06-18 13:43:58.120962,c650f99p08,SOFT_FAILURE,00:00:51.21405
2018-06-18 13:44:49.335012,c650f99p08,IN_SERVICE,00:00:37.411699
2018-06-18 13:45:26.746711,c650f99p08,OUT_OF_SERVICE,00:00:03.937137
2018-06-18 13:45:30.683848,c650f99p08,IN_SERVICE,00:00:13.429394
2018-06-18 13:45:44.113242,c650f99p08,SOFT_FAILURE,00:00:22.572454
2018-06-18 13:46:06.685696,c650f99p08,ADMIN_RESERVED,00:00:05.061658
2018-06-18 13:46:11.747354,c650f99p08,IN_SERVICE,
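(The final IN_SERVICE row has an empty duration because lead() returns NULL for a node's most recent transition; there is no later row to subtract against, so that row reflects the node's current state.)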
How much time did c650f99p08 spend in SOFT_FAILURE for every historical state transition?
/u/besawn/bash/csm_db_node_state_history_duration_csv | grep c650f99p08 | grep SOFT_FAILURE
2018-06-18 13:43:58.120962,c650f99p08,SOFT_FAILURE,00:00:51.21405
2018-06-18 13:45:44.113242,c650f99p08,SOFT_FAILURE,00:00:22.572454
How much total time did c650f99p08 spend in SOFT_FAILURE?
/u/besawn/bash/csm_db_node_state_history_sum_csv | grep c650f99p08 | grep SOFT_FAILURE
c650f99p08,SOFT_FAILURE,00:01:13.786504
How much total time did c650f99p08 spend in all of the different states?
/u/besawn/bash/csm_db_node_state_history_sum_csv | grep c650f99p08
c650f99p08,DISCOVERED,00:01:31.947354
c650f99p08,IN_SERVICE,00:01:27.278764
c650f99p08,ADMIN_RESERVED,00:00:13.428308
c650f99p08,SOFT_FAILURE,00:01:13.786504
c650f99p08,OUT_OF_SERVICE,00:00:16.314248