Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched: Fix nx_waitid(), nx_waitpid(), nxtask_exithook() for SMP #2298

Merged
merged 1 commit into from
Nov 15, 2020

Conversation

masayuki2009
Copy link
Contributor

Summary

  • I noticed waitpid_test stops with lc823450-xgevk:rndis
  • The condition was CONFIG_DEBUG_ASSERTION=y
  • Actually, the child task sent SIGCHILD but the parent couldn't catch the signal
  • Then, I found that nx_waitid(), nx_waitpid() use sched_lock()
  • However, a parent task and a child task are running on different CPUs
  • So, sched_lock() is not enough and need to use a critical section
  • Also, signal handling in nxtask_exithook() must be done in a critical section

Impact

  • SMP only

Testing

  • Tested with ostest with the following configurations
  • lc823450-xgevk:rndis (CONFIG_DEBUG_ASSERTION=y and n)
  • spresense:smp
  • spresense:wifi_smp (NCPUS=2 and 4)
  • sabre-6quad:smp (QEMU)
  • esp32-core:smp (QEMU)
  • maix-bit:smp (QEMU)
  • sim:smp

Summary:
- I noticed waitpid_test stops with lc823450-xgevk:rndis
- The condition was CONFIG_DEBUG_ASSERTION=y
- Actually, the child task sent SIGCHILD but the parent couldn't catch the signal
- Then, I found that nx_waitid(), nx_waitpid() use sched_lock()
- However, a parent task and a child task are running on different CPUs
- So, sched_lock() is not enough and need to use a critical section
- Also, signal handling in nxtask_exithook() must be done in a critical section

Impact:
- SMP only

Testing:
- Tested with ostest with the following configurations
- lc823450-xgevk:rndis (CONFIG_DEBUG_ASSERTION=y and n)
- spresense:smp
- spresense:wifi_smp (NCPUS=2 and 4)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp

Signed-off-by: Masayuki Ishikawa <[email protected]>
*/

#ifdef CONFIG_SMP
irqstate_t flags = enter_critical_section();
Copy link
Member

@Ouss4 Ouss4 Nov 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the call to sched_(un)lock in line 156 not needed then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ouss4
Thanks for your comment. I will modify the code and do the regression test again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ouss4
I tried the following change in sched_waitid.c and sched_waitpid.c
The ostest passed for the all SMP configurations.

However, Wi-Fi streaming plus stress testing with spresense:wifi_smp (NCPUS=4, DEBUG_ASSERTIONS=y)
failed with DEBUGASSERTION after 3.5 hours. (FYI, without the following changes, the test continued over 9 hours)

I think replacing sched_lock() and sched_unlock() with a critical section should be a future task.
What do you think?

#ifdef CONFIG_SMP                                                                                                                                                                   
  /* NOTE: sched_lock() is not enough for SMP                                                                                                                                       
   * because the child task is running on another CPU                                                                                                                               
   */                                                                                                                                                                               
                                                                                                                                                                                    
  irqstate_t flags = enter_critical_section();                                                                                                                                      
#else                                                                                                                                                                               
  /* Disable pre-emption so that nothing changes while the loop executes */                                                                                                         
                                                                                                                                                                                    
  sched_lock();                                                                                                                                                                     
#endif 

...

#ifdef CONFIG_SMP                                                                                                                                                                   
  leave_critical_section(flags);                                                                                                                                                    
#else                                                                                                                                                                               
  sched_unlock();                                                                                                                                                                   
#endif   

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think replacing sched_lock() and sched_unlock() with a critical section should be a future task.

Discussed a little in issue #1138, Related to issue #1137

@patacongo
Copy link
Contributor

See issue #1137

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants