
Qwen3 Instruct-2507 Models don't terminate in vllm when generating structured output #1700

@jako-vie

Description

Prompting the Qwen3 Instruct-2507 models (e.g. Qwen3-4B-Instruct-2507-FP8 or Qwen3-30B-A3B-Instruct-2507) served via vllm-openai to produce structured output results in the model generating tokens endlessly until it runs into the context length limit, even for very simple prompts like "What is 4+4?":

LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=4081, prompt_tokens=15, total_tokens=4096, completion_tokens_details=None, prompt_tokens_details=None)

Note that prompt_tokens + completion_tokens add up to exactly 4096, i.e. the configured --max-model-len, so generation only stops because the context is exhausted. This was first observed on a server running Qwen3-30B-A3B-Instruct-2507 on an H100 GPU, but it can be reproduced locally with Qwen3-4B-Instruct-2507-FP8 on an RTX 2000.

The issue could not be reproduced using regular Qwen3 models (e.g. Qwen3-0.6B), so it seems to be a problem with the instruct models in particular.

Reproduction

Run Qwen3 model in vllm-openai docker container:

docker run --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model Qwen/Qwen3-4B-Instruct-2507-FP8 --port=8000 --served-model-name=qwen --max-model-len=4096 --gpu-memory-utilization=0.8 --max-num-seqs=1
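
Optionally, verify the server is up before sending requests. A quick sanity check against two of the routes listed in the startup log:

curl -s http://localhost:8000/health
curl -s http://localhost:8000/v1/models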

Prompt the model to produce structured output, e.g. via openai-python (a curl equivalent is shown after the snippet):

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")
model = "qwen"
messages = [
    {
        "role": "user",
        "content": "What is 4+4?",
    }
]

# normal completion works fine
response = client.chat.completions.create(
    model=model,
    messages=messages,
)
print(response.choices[0].message.content)


# structured output never terminates and eventually raises LengthFinishReasonError
class CalculationResult(BaseModel):
    answer: int


response = client.beta.chat.completions.parse(
    model=model,
    messages=messages,
    response_format=CalculationResult,
)
print(response.choices[0].message.content)
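
The same failure can be triggered without the Python client. A minimal curl sketch, assuming vLLM's OpenAI-compatible json_schema handling for response_format (the schema below mirrors the CalculationResult model above):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen",
    "messages": [{"role": "user", "content": "What is 4+4?"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "CalculationResult",
        "schema": {
          "type": "object",
          "properties": {"answer": {"type": "integer"}},
          "required": ["answer"]
        }
      }
    }
  }'

Given the behaviour described above, the response is expected to come back with finish_reason "length" and a completion that fills the remaining context instead of a short JSON object.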

Logs

INFO 10-20 02:49:53 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1) INFO 10-20 02:49:58 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=1) INFO 10-20 02:49:58 [utils.py:233] non-default args: {'model': 'Qwen/Qwen3-4B-Instruct-2507-FP8', 'max_model_len': 4096, 'served_model_name': ['qwen'], 'gpu_memory_utilization': 0.8, 'max_num_seqs': 1}
(APIServer pid=1) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=1) INFO 10-20 02:50:06 [model.py:547] Resolved architecture: Qwen3ForCausalLM
(APIServer pid=1) INFO 10-20 02:50:06 [model.py:1510] Using max model len 4096
(APIServer pid=1) INFO 10-20 02:50:07 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 10-20 02:50:10 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=92) INFO 10-20 02:50:12 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=92) INFO 10-20 02:50:12 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='Qwen/Qwen3-4B-Instruct-2507-FP8', speculative_config=None, tokenizer='Qwen/Qwen3-4B-Instruct-2507-FP8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=qwen, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":4,"local_cache_dir":null}
(EngineCore_DP0 pid=92) W1020 02:50:13.172000 92 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
(EngineCore_DP0 pid=92) W1020 02:50:13.172000 92 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=92) INFO 10-20 02:50:14 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=92) INFO 10-20 02:50:14 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling.
(EngineCore_DP0 pid=92) INFO 10-20 02:50:14 [gpu_model_runner.py:2602] Starting to load model Qwen/Qwen3-4B-Instruct-2507-FP8...
(EngineCore_DP0 pid=92) INFO 10-20 02:50:14 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=92) INFO 10-20 02:50:14 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=92) INFO 10-20 02:50:15 [weight_utils.py:392] Using model weights format ['*.safetensors']
(EngineCore_DP0 pid=92) INFO 10-20 02:50:15 [weight_utils.py:450] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.02it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  1.01it/s]
(EngineCore_DP0 pid=92) 
(EngineCore_DP0 pid=92) INFO 10-20 02:50:16 [default_loader.py:267] Loading weights took 1.03 seconds
(EngineCore_DP0 pid=92) INFO 10-20 02:50:17 [gpu_model_runner.py:2653] Model loading took 4.2299 GiB and 2.154120 seconds
(EngineCore_DP0 pid=92) INFO 10-20 02:50:22 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/6e7e71e247/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=92) INFO 10-20 02:50:22 [backends.py:559] Dynamo bytecode transform time: 4.76 s
(EngineCore_DP0 pid=92) INFO 10-20 02:50:25 [backends.py:197] Cache the graph for dynamic shape for later use
(EngineCore_DP0 pid=92) INFO 10-20 02:50:40 [backends.py:218] Compiling a graph for dynamic shape takes 17.97 s
(EngineCore_DP0 pid=92) WARNING 10-20 02:50:41 [fp8_utils.py:576] Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/configs/N=6144,K=2560,device_name=NVIDIA_RTX_2000_Ada_Generation_Laptop_GPU,dtype=fp8_w8a8,block_shape=[128,128].json
(EngineCore_DP0 pid=92) WARNING 10-20 02:50:42 [fp8_utils.py:576] Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/configs/N=2560,K=4096,device_name=NVIDIA_RTX_2000_Ada_Generation_Laptop_GPU,dtype=fp8_w8a8,block_shape=[128,128].json
(EngineCore_DP0 pid=92) WARNING 10-20 02:50:42 [fp8_utils.py:576] Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/configs/N=19456,K=2560,device_name=NVIDIA_RTX_2000_Ada_Generation_Laptop_GPU,dtype=fp8_w8a8,block_shape=[128,128].json
(EngineCore_DP0 pid=92) WARNING 10-20 02:50:42 [fp8_utils.py:576] Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/configs/N=2560,K=9728,device_name=NVIDIA_RTX_2000_Ada_Generation_Laptop_GPU,dtype=fp8_w8a8,block_shape=[128,128].json
(EngineCore_DP0 pid=92) INFO 10-20 02:51:08 [monitor.py:34] torch.compile takes 22.73 s in total
(EngineCore_DP0 pid=92) INFO 10-20 02:51:09 [gpu_worker.py:298] Available KV cache memory: 1.72 GiB
(EngineCore_DP0 pid=92) INFO 10-20 02:51:09 [kv_cache_utils.py:1087] GPU KV cache size: 12,544 tokens
(EngineCore_DP0 pid=92) INFO 10-20 02:51:09 [kv_cache_utils.py:1091] Maximum concurrency for 4,096 tokens per request: 3.06x
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 3/3 [00:01<00:00,  2.00it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 1/1 [00:00<00:00, 19.97it/s]
(EngineCore_DP0 pid=92) INFO 10-20 02:51:11 [gpu_model_runner.py:3480] Graph capturing finished in 2 secs, took 0.04 GiB
(EngineCore_DP0 pid=92) INFO 10-20 02:51:11 [core.py:210] init engine (profile, create kv cache, warmup model) took 54.83 seconds
(APIServer pid=1) INFO 10-20 02:51:12 [loggers.py:147] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 784
(APIServer pid=1) INFO 10-20 02:51:12 [api_server.py:1634] Supported_tasks: ['generate']
(APIServer pid=1) WARNING 10-20 02:51:13 [model.py:1389] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=1) INFO 10-20 02:51:13 [serving_responses.py:137] Using default chat sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=1) INFO 10-20 02:51:13 [serving_chat.py:139] Using default chat sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=1) INFO 10-20 02:51:13 [serving_completion.py:76] Using default completion sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=1) INFO 10-20 02:51:13 [api_server.py:1912] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:34] Available routes are:
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /health, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /load, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /ping, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /ping, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /version, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /pooling, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /classify, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /score, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 10-20 02:51:13 [launcher.py:42] Route: /metrics, Methods: GET
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) INFO 10-20 02:51:20 [chat_utils.py:560] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.

(APIServer pid=1) INFO 10-20 03:31:03 [loggers.py:127] Engine 000: Avg prompt throughput: 1.5 tokens/s, Avg generation throughput: 0.9 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:31:13 [loggers.py:127] Engine 000: Avg prompt throughput: 1.5 tokens/s, Avg generation throughput: 11.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:31:23 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.9%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:31:33 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.7%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:31:43 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.5%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:31:53 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.3%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:03 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:13 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.9%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:23 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 13.7%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:33 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 15.3%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:43 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.1%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:32:53 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.8%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:03 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.6%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:13 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 22.2%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:23 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.9%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:33 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.7%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:43 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 27.3%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:33:53 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.0%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:34:03 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 20.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 30.7%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO 10-20 03:34:13 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 20.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 32.3%, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     172.17.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Environment Information

Local:

  • GPU: RTX 2000
  • CUDA Version: 12.4
  • Driver Version: 550.163.01
  • vllm: tested with 0.9.0.1 and 0.11.0 (latest)
  • model: Qwen3-4B-Instruct-2507-FP8
  • openai-python: 1.68.2

Server:

  • GPU: H100
  • model: Qwen3-30B-A3B-Instruct-2507

Structured output parameters for vllm were left at default values:

structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='')

Known Issue

  • The issue has not already been addressed in the Documentation, Issues, or Discussions.
