Kworker threads and the workqueue tasks which they perform are a special case.  While it is possible to rely on //taskset// and //sched_setaffinity()// to manage kworkers, doing so is of little utility since the threads are often short-lived and, at any rate, often perform a wide variety of work.  The paradigm with workqueues is instead to associate an affinity setting with the task itself.  "Unbound" is the name for workqueues which are not per-CPU.  These workqueues consume a lot of CPU time on many systems and tend to present the greatest management challenge for latency control.  Those unbound workqueues which appear in ///sys/devices/virtual/workqueue// are configurable from userspace.  The parameters //affinity_scope//, //affinity_strict// and //cpumask// together determine on which cores the kworker which executes the work function will run.  Many unbound workqueues are not configurable via sysfs.  Making their properties visible there requires an additional //WQ_SYSFS// flag in the kernel source.
  
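As an illustrative sketch (//writeback// is one of the few workqueues exported with //WQ_SYSFS// by default; the mask value is arbitrary, the affinity attributes need a 6.5+ kernel, and the writes require root):

```shell
# Unbound workqueues exported to userspace (WQ_SYSFS only) appear here.
ls /sys/devices/virtual/workqueue/

# Inspect the affinity controls of, for example, the writeback workqueue.
cat /sys/devices/virtual/workqueue/writeback/affinity_scope
cat /sys/devices/virtual/workqueue/writeback/affinity_strict
cat /sys/devices/virtual/workqueue/writeback/cpumask

# Confine its kworkers to cores 0-1 (hex mask 0x3) and forbid them from
# running outside that mask.
echo 3 > /sys/devices/virtual/workqueue/writeback/cpumask
echo 1 > /sys/devices/virtual/workqueue/writeback/affinity_strict
```

With //affinity_strict// set to 0, the mask is only a preference and the scheduler may still run the kworker elsewhere under load.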
Since kernel 6.5, the //tools/workqueue/wq_monitor.py// Python script is available in-tree, and since 6.6, //wq_dump.py// has joined it.  These Python scripts require the //drgn// debugger, which is packaged by major Linux distributions.  Another recent addition of particular interest for the realtime project is [[https://github.com/iovisor/bcc/blob/master/tools/wqlat.py|wqlat.py]], which is part of the //bcc/tools// suite.  Both sets of tools may require special kernel configuration settings.
  
==== IRQ affinity ====
  * **Tools**: [[realtime:documentation:howto:tools:cpu-partitioning:irqbalanced|irqbalanced]]
  
  * **Tools**: [[realtime:documentation:howto:tools:cpu-partitioning:taskset|taskset]]
  
==== Housekeeping cores ====
  
A common paradigm with realtime systems is to pin latency-insensitive kernel and userspace tasks on a designated "housekeeping" core.  For example, //taskset// can pin kernel threads like //kswapd// and //kauditd//.  Applications whose network traffic latency is not critical may wish to pin network IRQs there as well.  Userspace threads which are sometimes CPU-intensive, like systemd and rsyslog, may also be pinned on the housekeeping core.  Pinning userspace threads will not have the desired effect if much of their work is performed by [[https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cpu-partitioning/start#cpu_affinity_and_kworkers|unbound workqueues]], which may migrate to any core.

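For example, assuming core 0 is the designated housekeeping core (the thread names below are illustrative, and pinning other users' tasks requires root):

```shell
# Pin the kswapd0 kernel thread to housekeeping core 0.
taskset -pc 0 "$(pgrep -x kswapd0)"

# Pin the rsyslog daemon there as well.
taskset -pc 0 "$(pgrep -x rsyslogd | head -n 1)"

# Verify: prints the thread's current affinity list.
taskset -pc "$(pgrep -x kswapd0)"
```
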
==== Softirqs and kthreads ====

Softirqs are a kernel deferred-work mechanism which is often challenging to manage on realtime systems.  Softirqs may run in atomic context immediately following a hard IRQ which "raises" them, or they may be executed in process context by per-CPU kernel threads called //ksoftirqd/n//, where //n// is the core number.  There are 10 kinds of softirqs, which perform diverse tasks for the networking, block, scheduling, timer and [[https://wiki.linuxfoundation.org/realtime/documentation/technical_details/rcu|RCU]] subsystems, as well as executing callbacks for a large number of device drivers via the tasklet mechanism.  Only one softirq of any kind may be active at any given time on a core.  Thus, if ksoftirqd is preempted by a hard IRQ, the associated soft interrupt cannot follow the hard IRQ immediately, and must wait for ksoftirqd.  This unfortunate situation has been called "the new Big Kernel Lock" by realtime Linux maintainers.

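The ten softirq types and their per-core activity can be inspected from userspace:

```shell
# Per-CPU counts of each softirq type (HI, TIMER, NET_TX, NET_RX, BLOCK,
# IRQ_POLL, TASKLET, SCHED, HRTIMER, RCU) since boot.
cat /proc/softirqs

# The per-CPU ksoftirqd threads that run softirqs in process context,
# with the core (psr) each is bound to.
ps -e -o pid,psr,comm | awk '/ksoftirqd/'
```
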
Kernel configuration allows system managers to move the NET_RX and RCU callbacks out of softirqs and into their own kthreads.  Since kernel 5.12, moving NET_RX into its own kthread is possible by //echo//-ing '1' into the //threaded// sysfs attribute associated with a network device.  The process table will afterwards include a new kthread called //napi/xxx//, where xxx is the interface name.  (Read more about the [[https://wiki.linuxfoundation.org/networking/napi?s[]=napi|NAPI]] mechanism in the networking wiki.)  Userspace may employ //taskset// to pin this kthread on any core.  Moving the softirq into its own kthread incurs a context-switch penalty, but even so may be worthwhile on systems where bursts of network traffic unacceptably delay applications.  [[https://wiki.linuxfoundation.org/realtime/documentation/technical_details/rcu?s[]=rcu#rcu_callback_offloading|RCU Callback Offloading]] produces a new set of kthreads, and can be accomplished via a combination of compile-time configuration and boot-time command-line parameters.
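As a sketch, assuming an interface named //eth0// (substitute the actual device; root and a 5.12+ kernel are required):

```shell
# Switch eth0's NAPI processing from softirq to a dedicated kthread.
echo 1 > /sys/class/net/eth0/threaded

# A napi/eth0 kthread now appears in the process table.
ps -e -o pid,comm | grep napi

# Pin it to core 2, keeping network processing off the realtime cores.
taskset -pc 2 "$(pgrep -f 'napi/eth0' | head -n 1)"
```
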
===== Realtime application best practices =====

Multithreaded applications which rely on glibc's libpthread are prone to unexpected latency delays, since pthread condition variables do not honor priority inheritance ([[https://sourceware.org/bugzilla/show_bug.cgi?id=11588|bugzilla]]).  [[https://github.com/dvhart/librtpi|librtpi]] is an alternative LGPL-licensed pthread implementation which supports priority inheritance, and whose API is as close to glibc's as possible.  The alternative [[https://www.musl-libc.org/|musl libc]] has a pthread condition variable implementation similar to glibc's.
  
  
realtime/documentation/howto/tools/cpu-partitioning/start.txt · Last modified: 2024/05/27 23:33 by alison