Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions providers/mlx5/verbs.c
Original file line number Diff line number Diff line change
Expand Up @@ -1770,6 +1770,12 @@ static int mlx5_get_max_recv_wr(struct mlx5_context *ctx,
struct mlx5dv_qp_init_attr *mlx5_qp_attr,
uint32_t *max_recv_wr)
{
/* Ignore max_recv_wr when SRQ is used. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the issue that you hit without that change ? not clear from the code and the commit log.

What are the values of the below in your case ?

ctx->max_recv_wr
attr->cap.max_recv_wr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my case attr->cap.max_recv_wr is way greater than ctx->max_recv_wr because it contains random bytes on stack. However, since the manual notes that "The attributes max_recv_wr and max_recv_sge are ignored by ibv_create_qp() if the QP is to be associated with an SRQ" (https://man7.org/linux/man-pages/man3/ibv_create_qp.3.html), I believe that any value of attr->cap.max_recv_wr is acceptable when QP is associated with SRQ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change I will get EINVAL from ibv_create_qp() if attr->cap.max_recv_wr > cap.max_recv_wr even if the QP is to be associated with SRQ.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around at other drivers, is that case handled at all ?

Why should an application use SRQ and pass attr->cap.max_recv_wr which is different than 0 at all ?

If the issue is really important to handle, we may consider setting attr->cap.max_recv_wr to 0 in the verbs layer to let it work for all drivers around.

Please check and let's get other feedback here.

Copy link
Contributor Author

@FujiZ FujiZ Jul 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around at other drivers, is that case handled at all ?

I'm not sure if other drivers handle this issue correctly, since I only have mellanox NIC at hand.

Why should an application use SRQ and pass attr->cap.max_recv_wr which is different than 0 at all ?

Because as I mentioned before, the manual said that "The attributes max_recv_wr and max_recv_sge are ignored by ibv_create_qp() if the QP is to be associated with an SRQ" (https://man7.org/linux/man-pages/man3/ibv_create_qp.3.html), which means that assigning different values to max_recv_wr should produce the same result when SRQ is used. Besides, the manual didn't mention that "The attributes max_recv_wr and max_recv_sge should be set to 0 if the QP is to be associated with an SRQ", so it is valid for an application to use SRQ and pass attr->cap.max_recv_wr which is different than 0. The value of attr->cap.max_recv_wr may come from uninitialized stack variables, since the application may construct the ibv_qp_init_attr on stack and not explicitly set the max_recv_wr when using SRQ.

If the issue is really important to handle, we may consider setting attr->cap.max_recv_wr to 0 in the verbs layer to let it work for all drivers around.

Yeah I think that is viable choice, too.

if (attr->srq) {
*max_recv_wr = 0;
return 0;
}

if (mlx5_qp_attr && (mlx5_qp_attr->create_flags & MLX5DV_QP_CREATE_OOO_DP) &&
attr->cap.max_recv_wr > 1) {
uint32_t max_recv_wr_cap = 0;
Expand Down