Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions scheds/rust/scx_lavd/src/bpf/balance.bpf.c
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,12 @@ static bool try_to_steal_task(struct cpdom_ctx *cpdomc)
if (!cpdomc->nr_active_cpus)
return false;

/*
* No stealee, nothing to steal.
*/
if (!sys_stat.nr_stealee)
return false;

Comment on lines +260 to +265
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I feel this is a bit redundant since calling try_to_steal_task() is anyway gated by cpdomc->is_stealer. Is it possible that there is a stealer but no stealee? Logically, that does not make sense. Can we somehow guarantee that there is no such case triggered in `plan_x_cpdom_migration()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that logically, a “stealer but no stealee” state shouldn’t exist. As we focus on resolving the overloading domains. However, there are still several scenarios:

  1. It's valid in a big system. Some domains are under-loaded, but no domains are stealees.
  2. In a dynamic system, the domain could become a stealer due to the loading
    changes. Particularly, the cur_util_sum fluctuates quickly.
  3. Even the system continues to migrate tasks and set both is_stealee and
    is_stealer to false, however, if the number of stealers is more than stealees,
    the system would continue to migrate tasks.
  4. The asynchronous update by the timer is also possible that when
    consume_task() is called, the roles are not updated completely yet.
  5. The stale issue described above. Despite the stealer increases the loading
    (become neutral) and the all the stealees has been balanced by other stealers.
    But, the recalculation is not conducted and the stale stealers are still there.

Even though there exists a stealer, it's not harmful. In pick_idle_cpu(), the stealer
would be selected as a sticky domain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though there exists a stealer, it's not harmful. In pick_idle_cpu(), the stealer would be selected as a sticky domain.

When there is no stealee, why not turn off all stealer flags? Since plan_x_cpdom_migration() runs once in a while, imposing a bit more overhead (2-pass scanning: first for stealee then for stealer) there would be better. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed again the sticky domain logic in pick_idle_cpu(), and found the
stealer is chosen only when there exists stealee, and not based on is_stealer alone.

        if (!cpdc || !READ_ONCE(cpdc->is_stealee))
                goto skip_fully_idle_neighbor;

This means that the stealer is not used without a stealee. So, the last statement

Even though there exists a stealer, it's not harmful. In pick_idle_cpu(), the stealer
would be selected as a sticky domain.

of my previous comment is weak.

Given above justification, we can safely set the stealer to false when there is
no stealee. How about the following patch?

	/*
	 * If there is no overloaded domain (no stealees), skip load balancing.
	 * Clear all stealer/stealee roles to prevent stale state from previous
	 * balancing rounds from triggering incorrect migrations. While some
	 * domains may be underloaded (stealers), migration is unnecessary
	 * without overloaded domains (stealees) to steal from.
	 *  <~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~>
	 * [stealer_threshold ... avg_sc_load ... max_sc_load ... stealee_threshold]
	 *            -------------------------------------->
	 */
	if ((stealee_threshold > max_sc_load) && !overflow_running) {
		bpf_for(cpdom_id, 0, nr_cpdoms) {
			if (cpdom_id >= LAVD_CPDOM_MAX_NR)
				break;
			
			cpdomc = MEMBER_VPTR(cpdom_ctxs, [cpdom_id]);
			WRITE_ONCE(cpdomc->is_stealer, false);
			WRITE_ONCE(cpdomc->is_stealee, false);
		}
		sys_stat.nr_stealee = 0;
		return 0;
	}

Maybe separate these into another patch to not mix the logic. Also, I put the
logic in comment to make it clearer.

	/*
	 * At this point, there is at least one overloaded domain (stealee),
	 * indicated by the following condition, which is the negation of the
	 * previous if-statement:
	 *    stealee_threshold <= max_sc_load || overflow_running
	 *
	 * Adjust the stealer threshold to minimum scaled load to ensure that
	 * there exists at least one stealer.
	 */
	if (stealer_threshold < min_sc_load) {

However, we still need to bear in mind that this is to avoid the stale state and
stealer still could be observed while there is no stealee when the number of
stealers is more than stealees, and the asynchronous update situation. But, this
is harmless in this case. The impact is for the try_to_steal_task() to run more
loops to select the non-existing stealee domain to steal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both changes look good to me. Please force-push another version.

/*
* Probabilistically make a go or no go decision to avoid the
* thundering herd problem. In other words, one out of nr_cpus
Expand Down