diff --git a/doc/hld/zvmsdk_rate_limit.md b/doc/hld/zvmsdk_rate_limit.md new file mode 100644 index 000000000..0a8285fde --- /dev/null +++ b/doc/hld/zvmsdk_rate_limit.md @@ -0,0 +1,93 @@ +# Rate Limit In Z/VM SDKSERVER +- [Rate Limit In Z/VM SDKSERVER](#rate-limit-in-zvm-sdkserver) + - [Context](#context) + - [Objective](#objective) + - [Background](#background) + - [Design](#design) + - [Overview](#overview) + - [Interfaces](#interfaces) + - [Implmentation Details](#implmentation-details) + - [Sequence diagram](#sequence-diagram) + - [How oustanding requests limit is checked ?](#how-oustanding-requests-limit-is-checked-) + - [How rate limit is checked ?](#how-rate-limit-is-checked-) + - [What happens when limit is reached ?](#what-happens-when-limit-is-reached-) + - [Can a request by postponed indefinitely ?](#can-a-request-by-postponed-indefinitely-) + - [Configuration parameters introduced](#configuration-parameters-introduced) + - [Technical Debt](#technical-debt) + +## Context +### Objective +The objective of this change is to enable rate-limiting capabilities in the z/vm sdkserver. This is needed so that rate at which SMAPI requests are made can be controlled, thereby avoiding any issue in SMAPI due to unprocessable amount of requests. + +### Background +Recently coredump of SMAPI/SMCLI were observed in one of the customer's deployment of ICIC. The issue was triggered because of unprecedented number of requests sent to SMAPI in a very short time, i.e. almost simultaneously. This caused the SMAPI server process to crash. After discussions, few approaches/solutions were considered, for example: + +1. Apply rate limit and outstanding limit on the requests sent to SMAPI by sdkserver +2. Use event notifications to get the VM state info etc. instead of using periodic polling +3. Enance the SMAPI to accept payload with multiple VM IDs instead of separate request for each VM + +This document is for applying rate limit. + +## Design +### Overview +The implementation of rate limit filter and max outstanding requests filter is applied to sdkserver. New configuration parameters are added to control the introduced behavior. + +### Interfaces +This involves the communication between hypervisor module of nova-compute service and (z/vm) sdkserver. This is applicable to Z/VM compute nodes only. The sdkserver listens for requests from z/vm driver on 8080 (configurable) port over TCP stream. + +### Implmentation Details + +#### Sequence diagram + +```mermaid +sequenceDiagram +NOVA COMPUTE ->> SdkserverSocket: Send request to "http://127.0.0.1:8080" +Note over NOVA COMPUTE, SdkserverSocket: Function name & arguments, e.g.
["guest_get_info", ["HRXX0018"], {}] + +SdkserverSocket ->> Queue: Accept client connection
and put(blocking) to queue +Queue ->>+ WorkerThread: Process request +WorkerThread ->> Outstanding limit check: Check if outstanding requests
are within limit +WorkerThread ->> Rate limit check: Check if number of requests in current
window are within limit +WorkerThread -->>- Queue: Limit reached, sleep (1 - 3 seconds)
and put to queue again +Queue ->> WorkerThread: Pick the message again and retry +Note over Queue, WorkerThread: If request gets postponed
for more than certain duration, it fails + +``` + +Few notes: + +* The `sdkserver` opens a server socket and waits for requests from z/vm driver +* The queue is used to keep incoming requests before they can be picked and processed by any worker thread +* After accepting a request, client connection details is `put` to shared (among worker threads) queue. This is a blocking operation and waits if queue has no empty slot. +* After putting the client connection details in queue, worker thread is started. Maximum number of worker threads possible is controlled by `max_worker_count` configuration parameter. Number of threads already running within process is taken into account before starting new worker thread. +* Each worker thread picks (non-blocking) client connection details from the queue and tries to fulfil (hit SMAPI) the request. + +#### How oustanding requests limit is checked ? + +The oustanding requests refer to requests made to SMAPI on behalf of z/vm driver. The `SDKServer` maintains in-memory counters to check if this limit has passed for any of the requests received from z/vm driver. Just before hitting the `SMAPI`, this counter is incremented and after receiving the response (or failure), this counter is decremented. This an atomic counter and is thread-safe. + +#### How rate limit is checked ? + +The **rate** here refers to transactions (requests from z/vm driver) taking place in current time window. The window size can be defined using configuration parameter `smapi_rate_limit_window_size_seconds`. The transactions allowed in one window is defined by `smapi_rate_limit_per_window`. The `SDKServer` maintains in-memory data structures for keeping track of requests in current window span. The requests are tracked against total number of requests and also on basis of function name. The rate limit can be customized for specific functions also. For example, if the configuration parameter `smapi_rate_limit_per_window` is `total:30, guest_list:10, guest_get_info:5`, this means that in any given window span, there cannot be more than 30 requests hitting the SMAPI. And, number of `guest_list` and `guest_get_info` in these 30 requests, cannot be more than 10 and 5 respectively. + +#### What happens when limit is reached ? + +If limit for either outstanding requests or transactions in single time window exceeds configured value, the request is postponed and retried. For postponing, a delay of a randomly selected value between 1000 and 3000 milliseconds is added to processing of corresponding request. After adding the delay, the request is re-queued so that it can be picked for processing by any of the worker threads. + +#### Can a request by postponed indefinitely ? + +No. The `smapi_request_postpone_threshold_seconds` configuration parameter determines maximum duration for which a request can be postponed. If specified duration has passed and that request is still unable to get processed, it will fail with error. + +#### Configuration parameters introduced + +|Parameter|Type|Default Value|Description| +|---------|----|-------------|-----------| +|smapi_rate_limit_window_size_seconds|int|1|The duration to consider for rate limit window. For example, if we want 30 requests per 5 seconds, then specify 5 as the value here. Setting this value to 0 disables the rate-limit check.| +|smapi_rate_limit_per_window|str|total:30|The configuration for limiting the rate at which functions can be invoked. The "total" count must be specified. Other function limits can be specified optionally, e.g. "smapi_rate_limit_per_window = total:30, guest_list:10, guest_get_info:5, ...". The cumulative value specified for other function keywords should be less than the "total" value. The option to specify custom limit for certain functions can be helpful in limiting functions which are more resource/time intensive. If the rate limit has been reached, the request received by client is postponed for execution. A request can be postponed for anywhere between 1 and 3 seconds.| +|smapi_max_outstanding_requests|int|10|Maximum number of requests allowed for which response is not yet received. If current oustanding requests exceeds allowed limit, the request is postponed for processing in future. Under this situation, a request can be postponed for a duration between 1 second and 3 seconds. Setting this value to 0 disables the outstanding requests check.| +|smapi_request_postpone_threshold_seconds|int|60|Maximum duration in seconds for which a request sent by client can be postponed due to rate limit or outstanding requests threshold| + +### Technical Debt + +Currently, there is no mechanism to distinguish between requests associated with a user triggered operation and those associated with background polling tasks. It is of significance that user operation triggered requests are given high priority and not be postponed if possible. Otherwise, the user operation would be sluggish which might not be desirable. +The difficulty faced in solving this is that as of now, there is not an easy way to distinguish function calls made in user triggered and periodic tasks operations. This will need more analysis and efforts. diff --git a/doc/source/zvmsdk.conf.sample b/doc/source/zvmsdk.conf.sample index efe7bcfce..af6b154b3 100644 --- a/doc/source/zvmsdk.conf.sample +++ b/doc/source/zvmsdk.conf.sample @@ -209,6 +209,31 @@ # This param is optional #max_worker_count=64 +# The duration to consider for rate limit window. For example, if we want 30 requests per 5 seconds, then +# specify 5 as the value here. Setting this value to 0 disables the rate-limit check. +# +# This param is optional. Default value is "1" +#smapi_rate_limit_window_size_seconds=1 + +# The configuration for limiting the rate at which functions can be invoked. +# The "total" count must be specified. Other function limits can be specified optionally, +# e.g. "smapi_rate_limit_per_window = total:30, guest_list:10, guest_get_info:5, ..." +# The cumulative value specified for other function keywords should be less than the "total" +# value. The option to specify custom limit for certain functions can be helpful in limiting +# functions which are more resource/time intensive. If the rate limit has been reached, the +# request received by client is postponed for execution. A request can be postponed for anywhere +# between 1 and 3 seconds. +# +# This parameter is optional. Default value is "total:30" +#smapi_rate_limit_per_window=total:30 + +# Maximum number of requests allowed for which response is not yet received. +# If current oustanding requests exceeds allowed limit, the request is postponed for processing +# in future. Under this situation, a request can be postponed for a duration between 1 second +# and 3 seconds. Setting this value to 0 disables the outstanding requests check. +# +# This parameter is optional. Default value is 10. +#smapi_max_outstanding_requests=10 # # The size of request queue in SDK server. diff --git a/zvmsdk/config.py b/zvmsdk/config.py index 4b073fe6b..1e35e78d7 100755 --- a/zvmsdk/config.py +++ b/zvmsdk/config.py @@ -17,6 +17,7 @@ import configparser import os +from zvmsdk import constants class Opt(object): @@ -509,6 +510,50 @@ def __init__(self, opt_name, section='default', These worker threads would work concurrently to handle requests from client. This value should be adjusted according to the system resource and workload. +''' + ), + Opt('smapi_rate_limit_window_size_seconds', + section='sdkserver', + opt_type='int', + default=1, + help=''' +The duration to consider for rate limit window. For example, if we want 30 requests per 5 seconds, then +specify 5 as the value here. Setting this value to 0 disables the rate-limit check. +''' + ), + Opt('smapi_rate_limit_per_window', + section='sdkserver', + opt_type='str', + default=f'total:{constants.SMAPI_RATE_LIMIT_DEFAULT_TOTAL}', + help=''' +The configuration for limiting the rate at which functions can be invoked. +The "total" count must be specified. Other function limits can be specified optionally, +e.g. "smapi_rate_limit_per_window = total:30, guest_list:10, guest_get_info:5, ..." +The cumulative value specified for other function keywords should be less than the "total" +value. The option to specify custom limit for certain functions can be helpful in limiting +functions which are more resource/time intensive. If the rate limit has been reached, the +request received by client is postponed for execution. A request can be postponed for anywhere +between 1 and 3 seconds. +''' + ), + Opt('smapi_max_outstanding_requests', + section='sdkserver', + opt_type='int', + default='10', + help=''' +Maximum number of requests allowed for which response is not yet received. +If current oustanding requests exceeds allowed limit, the request is postponed for processing +in future. Under this situation, a request can be postponed for a duration between 1 second +and 3 seconds. Setting this value to 0 disables the outstanding requests check. +''' + ), + Opt('smapi_request_postpone_threshold_seconds', + section='sdkserver', + opt_type='int', + default='60', + help=''' +Maximum duration in seconds for which a request sent by client can be +postponed due to rate limit or outstanding requests threshold ''' ), # database options diff --git a/zvmsdk/constants.py b/zvmsdk/constants.py index 8e5377621..6fee7b451 100755 --- a/zvmsdk/constants.py +++ b/zvmsdk/constants.py @@ -150,3 +150,5 @@ HYPERVISOR_HOSTNAME_SUFFIX_FILE = '.zvmsdk_hypervisor_hostname_suffix' RESERVED_STOR_PATTERN = 'COMMAND DEF(I|IN|INE)? ST(O|OR|ORA|ORAG|ORAGE)? RESERVED .+' + +SMAPI_RATE_LIMIT_DEFAULT_TOTAL = 30 diff --git a/zvmsdk/sdkserver.py b/zvmsdk/sdkserver.py index de48e50ca..8cbafb1b4 100644 --- a/zvmsdk/sdkserver.py +++ b/zvmsdk/sdkserver.py @@ -22,12 +22,15 @@ import sys import threading import traceback +from time import time, sleep +from random import randrange from zvmsdk import api from zvmsdk import config from zvmsdk import exception from zvmsdk import log from zvmsdk import returncode +from zvmsdk import constants if six.PY3: import queue as Queue @@ -46,6 +49,18 @@ def __init__(self): self.server_socket = None self.request_queue = Queue.Queue(maxsize= CONF.sdkserver.request_queue_size) + + rate_limit_config_dict = {} + for entry in [cnf_entry.strip() for cnf_entry in CONF.sdkserver.smapi_rate_limit_per_window.split(',')]: + fields = entry.split(':') + rate_limit_config_dict[fields[0].strip()] = int(fields[1].strip()) + + self.outstanding_requests = 0 + self.outstanding_counter_lock = threading.Lock() + self.rate_limit_config = rate_limit_config_dict + self.rate_window_records = {} + self.rate_window_lock = threading.Lock() + self.postponed_requests = {} def log_error(self, msg): thread = threading.current_thread().name @@ -110,13 +125,107 @@ def send_results(self, client, addr, results): else: self.log_debug("(%s:%s) Results sent back to client successfully." % (addr[0], addr[1])) + + def _assert_within_rate_limit(self, client, addr): + """ + Ensures that handling the request does not violate rate-limit + specified configuration. It peeks the data from accepted clients socket, so that + it is still available for processing the request later. If configuration value of + smapi_rate_limit_window_size_seconds is less than or equal to 0, rate limit check + would be bypassed. The check is done on the basis values specified in "smapi_rate_limit_per_window" + in the configuration. The format for its value is "total:N1, FUNC1: N2, FUNC2: N3, ...". + For example, + [sdkserver] + smapi_rate_limit_per_window = total:30, guests_get_nic_info:5, guest_get_info : 20 ... + For all the available function names, refer zvmsdk.api.SDKAPI class + + Raises: + _RateLimitReached: If the configured rate threshold is reached + """ + + if CONF.sdkserver.smapi_rate_limit_window_size_seconds <= 0: + return + + data = client.recv(4096, socket.MSG_PEEK) + data = bytes.decode(data) + + # Any error getting the data sent by client is an error condition + # and SMAPI request won't be made in such cases. So, such cases + # do not apply for rate limit check. Not logging any error message + # as that would be logged during "recv" also. + if not data: + return + + api_data = json.loads(data) + + if not isinstance(api_data, list) or len(api_data) != 3: + return + + # Check called API is supported by SDK + (func_name, api_args, api_kwargs) = api_data + current_window_num = int(time() / CONF.sdkserver.smapi_rate_limit_window_size_seconds) + self.log_debug(f'Checking rate limit filter for {func_name}. Current window = {current_window_num}') + counts = self.rate_window_records.get(current_window_num, {}) + if not counts: + # Cleanup previously existing entries + with self.rate_window_lock: + self.rate_window_records = { + current_window_num: { + 'total': 1, + func_name: 1 + } + } + else: + new_total = counts['total'] + 1 + new_func_name_count = counts.get('func_name', 0) + 1 + allowed_total = int(self.rate_limit_config.get('total', constants.SMAPI_RATE_LIMIT_DEFAULT_TOTAL)) + allowed_func_name_count = int(self.rate_limit_config.get(func_name, sys.maxsize)) + if new_total <= allowed_total and new_func_name_count <= allowed_func_name_count: + self.log_debug(f'Request for {func_name} passes rate limit filter. New total = {new_total}, \ + new {func_name} count = {new_func_name_count}') + with self.rate_window_lock: + self.rate_window_records[current_window_num].update({ + 'total': new_total, + func_name: new_func_name_count + }) + else: + self.log_warn(f'Request for {func_name} cannot be made as rate limit is reached. \ + Current counts are: {self.rate_window_records}') + raise _RateLimitReached(f'Request {func_name}({api_args}, {api_kwargs}) skipped as rate limit is reached') + + def _assert_within_outstanding_limit(self): + """ + Ensures that number of transactions for which response is yet to be received does not + exceed configured value (smapi_max_outstanding_requests). If configured value is less + than or equal to 0, the outstanding limit check is bypassed. + + Raises: + _MaxOutstandingLimitReached: If number of outstanding requests has reached the configured max value + """ + if CONF.sdkserver.smapi_max_outstanding_requests <= 0: + return + + self.log_debug(f'Checking outstanding limit. Max outstanding requests = {CONF.sdkserver.smapi_max_outstanding_requests}') + if self.outstanding_requests > CONF.sdkserver.smapi_max_outstanding_requests: + raise _MaxOutstandingLimitReached( + f'Max outstanding requests limit reached. Current outstanding = {self.outstanding_requests}') def serve_API(self, client, addr): """ Read client request and call target SDK API""" self.log_debug("(%s:%s) Handling new request from client." % (addr[0], addr[1])) results = None + api_request_made = False + request_postponed = False try: + self._assert_within_outstanding_limit() + self._assert_within_rate_limit(client, addr) + + # The rate limit has passed, so clear the postponed requests entry for this connection + # if present. This is for scenario in which request from client was postponed and now + # it is tried again. + self.postponed_requests.pop(f'{addr[0]}:{addr[1]}', None) + data = client.recv(4096) data = bytes.decode(data) # When client failed to send the data or quit before sending the @@ -138,6 +247,7 @@ def serve_API(self, client, addr): # Check called API is supported by SDK (func_name, api_args, api_kwargs) = api_data + self.log_debug("(%s:%s) Request func: %s, args: %s, kwargs: %s" % (addr[0], addr[1], func_name, str(api_args), str(api_kwargs))) @@ -150,7 +260,31 @@ def serve_API(self, client, addr): return # invoke target API function + with self.outstanding_counter_lock: + self.outstanding_requests += 1 + api_request_made = True + self.log_info(f'Requesting API for {func_name} ({api_args}, {api_kwargs})') return_data = api_func(*api_args, **api_kwargs) + except (_RateLimitReached, _MaxOutstandingLimitReached) as e: + request_postponed = True + request_key = f'{addr[0]}:{addr[1]}' + if request_key not in self.postponed_requests: + self.postponed_requests[request_key] = int(time()) + elapsed_seconds = int(time()) - self.postponed_requests[request_key] + if elapsed_seconds > CONF.sdkserver.smapi_request_postpone_threshold_seconds: + self.postponed_requests.pop(request_key, None) + request_postponed = False + results = self.construct_internal_error( + f'Unable to process request as sdkserver is under heavy load. Waited for {elapsed_seconds} seconds') + else: + self.log_error(f'({addr[0]}:{addr[1]} {str(e)}). Limit reached, postponing the request.') + # The sleep is necessary because there are other worker threads waiting on ".get" of request_queue. + # If sleep is not added, it will immediately picked from the queue by any worker thread and + # same rate limit condition would be met. Sleep for random duration is added so that postponed + # requests are not attempted at once. + sleep(randrange(1000, 3000) / 1000) + self.request_queue.put((client, addr)) + except exception.SDKBaseException as e: self.log_error("(%s:%s) %s" % (addr[0], addr[1], traceback.format_exc())) @@ -183,6 +317,10 @@ def serve_API(self, client, addr): 'rc': 0, 'rs': 0, 'errmsg': '', 'output': return_data} + finally: + if api_request_made: + with self.outstanding_counter_lock: + self.outstanding_requests -= 1 # Send back the final results try: if results is not None: @@ -193,11 +331,14 @@ def serve_API(self, client, addr): # before the send() action. self.log_error("(%s:%s) %s" % (addr[0], addr[1], repr(e))) finally: - # Close the connection to make sure the thread socket got - # closed even when it got unexpected exceptions. - self.log_debug("(%s:%s) Finish handling request, closing " - "socket." % (addr[0], addr[1])) - client.close() + # If rate limit is exceeded, the request from client will be processed later. + # So, let's not close the client connection in this case + if not request_postponed: + # Close the connection to make sure the thread socket got + # closed even when it got unexpected exceptions. + self.log_debug("(%s:%s) Finish handling request, closing " + "socket." % (addr[0], addr[1])) + client.close() def worker_loop(self): # The worker thread would continuously fetch request from queue @@ -277,3 +418,11 @@ def start_daemon(): if server.server_socket is not None: server.log_info("Closing the server socket.") server.server_socket.close() + + +class _RateLimitReached(RuntimeError): + pass + + +class _MaxOutstandingLimitReached(RuntimeError): + pass