Kworker threads and the workqueue tasks which they perform are a special case.  While it is possible to rely on //taskset// and //sched_setaffinity()// to manage kworkers, doing so is of little utility since the threads are often short-lived and, at any rate, often perform a wide variety of work.  The paradigm with workqueues is instead to associate an affinity setting with the task itself.  "Unbound" is the name for workqueues which are not per-CPU.  These workqueues consume a lot of CPU time on many systems and tend to present the greatest management challenge for latency control.  Those unbound workqueues which appear in ///sys/devices/virtual/workqueue// are configurable from userspace.  The parameters //affinity_scope//, //affinity_strict// and //cpumask// together determine on which cores the kworker which executes the work function will run.  Many unbound workqueues are not configurable via sysfs; making their properties visible there requires adding the //WQ_SYSFS// flag to the workqueue's creation in the kernel source.
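The commands below are a minimal sketch of this tuning; //writeback// serves here as an example of a workqueue already exported with //WQ_SYSFS//, and the mask value assumes a machine whose housekeeping set is CPUs 0-1:

<code bash>
# List the unbound workqueues visible in sysfs (i.e. those with WQ_SYSFS):
ls /sys/devices/virtual/workqueue/
# Inspect the current affinity settings of the "writeback" workqueue:
cat /sys/devices/virtual/workqueue/writeback/affinity_scope
cat /sys/devices/virtual/workqueue/writeback/cpumask
# Confine its kworkers to CPUs 0-1 (mask 0x3, an example housekeeping set):
echo 3 > /sys/devices/virtual/workqueue/writeback/cpumask
# Enforce the mask strictly rather than treating it as a preference:
echo 1 > /sys/devices/virtual/workqueue/writeback/affinity_strict
</code>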
  
Since kernel 6.5, the //tools/workqueue/wq_monitor.py// Python script is available in-tree, and since 6.6, //wq_dump.py// has joined it.  These Python scripts require the //drgn// debugger, which is packaged by major Linux distributions.  Another recent addition of particular interest for the realtime project is [[https://github.com/iovisor/bcc/blob/master/tools/wqlat.py|wqlat.py]], which is part of the //bcc/tools// suite.  Both sets of tools may require special kernel configuration settings.
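Assuming a kernel source tree and the usual privileges (root is typically needed for //drgn// to read the running kernel), invocation looks roughly like the following; //events// is just one well-known system workqueue:

<code bash>
# Show per-workqueue statistics for the "events" workqueue:
./tools/workqueue/wq_monitor.py events
# Dump worker pools and affinity scopes for all workqueues:
./tools/workqueue/wq_dump.py
# Measure workqueue latency with the bcc tool (requires eBPF support):
sudo ./wqlat.py
</code>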
  
==== IRQ affinity ====
  
Kernel configuration allows system managers to move the NET_RX and RCU callbacks out of softirqs and into their own kthreads.  Since kernel 5.12, moving NET_RX into its own kthread is possible by //echo//-ing '1' into the //threaded// sysfs attribute associated with a network device.  The process table will afterwards include a new kthread called //napi/xxx//, where xxx is the interface name.  [Read more about the [[https://wiki.linuxfoundation.org/networking/napi?s[]=napi|NAPI]] mechanism in the networking wiki.]  Userspace may employ //taskset// to pin this kthread on any core.  Moving the softirq into its own kthread incurs a context-switch penalty, but even so may be worthwhile on systems where bursts of network traffic unacceptably delay applications.  [[https://wiki.linuxfoundation.org/realtime/documentation/technical_details/rcu?s[]=rcu#rcu_callback_offloading|RCU Callback Offloading]] produces a new set of kthreads, and can be accomplished via a combination of compile-time configuration and boot-time command-line parameters.
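Both steps are sketched below; //eth0// is a placeholder interface name, and the CPU numbers and //rcu_nocbs// range are illustrative rather than recommendations:

<code bash>
# Move eth0's NET_RX processing into "napi/eth0-..." kthreads (kernel >= 5.12):
echo 1 > /sys/class/net/eth0/threaded
# Pin the resulting NAPI kthreads to housekeeping CPU 0:
for pid in $(pgrep 'napi/eth0'); do taskset -pc 0 "$pid"; done
# RCU callback offloading is configured at boot instead: build with
# CONFIG_RCU_NOCB_CPU=y and boot with e.g. "rcu_nocbs=2-7", which spawns
# rcuo* kthreads that can likewise be pinned with taskset.
</code>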
===== Realtime application best practices =====

Multithreaded applications which rely on glibc's libpthread are prone to unexpected latency delays, since pthread condition variables do not honor priority inheritance and pthread mutexes do so only when explicitly configured to.  [[https://github.com/dvhart/librtpi|librtpi]] is an alternative LGPL-licensed pthread implementation which supports priority inheritance and whose API is as close to glibc's as possible.
  