This page describes latencies caused by system management interrupts (SMIs) and how they can be detected and resolved.
System management interrupts are high-priority, unmaskable hardware interrupts that cause the CPU to immediately suspend all other activity, including the operating system, and enter a special execution mode called system management mode (SMM). Once the system is in SMM, the interrupt is handled by firmware code.
System management mode is an execution mode in x86 processors that can only be entered via an SMI.
SMIs can cause problems for real-time systems because the operating system controls neither when an SMI happens nor how long the CPU spends in SMM handling it. As a result, an SMI can prevent the CPU from servicing a time-critical interrupt before its deadline. Because SMI handlers are sometimes poorly written and can take milliseconds to execute, the resulting latencies can be severe for a real-time system. Unfortunately, these latencies also tend to be difficult to resolve.
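From the operating system's point of view, time spent in SMM simply disappears. A common way to make such stalls visible is to spin on a clock and flag unusually large gaps between consecutive readings. The Python sketch below illustrates the idea; the function `detect_gaps` and its parameters are hypothetical, not part of any existing tool. Because it runs in user space with interrupts enabled, ordinary interrupts, scheduler preemption and interpreter overhead also produce gaps, so every gap it reports is only a candidate SMI, not proof of one.

```python
import time
from typing import Callable, List, Tuple


def detect_gaps(duration_ns: int, threshold_ns: int,
                clock: Callable[[], int] = time.monotonic_ns) -> List[Tuple[int, int]]:
    """Spin on `clock` for about duration_ns and return every gap longer than
    threshold_ns between consecutive readings as (offset_ns, gap_ns)."""
    gaps: List[Tuple[int, int]] = []
    start = prev = clock()
    while prev - start < duration_ns:
        now = clock()
        # A large jump means this thread did not run for a while: possibly an
        # SMI, but also an ordinary interrupt, preemption, or interpreter overhead.
        if now - prev > threshold_ns:
            gaps.append((prev - start, now - prev))
        prev = now
    return gaps
```

For example, `detect_gaps(5_000_000_000, 50_000)` spins for about five seconds and returns each stall longer than 50 µs as an (offset, length) pair in nanoseconds. Pinning the process to an otherwise idle CPU reduces, but does not remove, false positives; the kernel's hwlat tracer applies the same idea in kernel context with interrupts disabled.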
Below are some tools for confirming and characterizing SMIs, presented in a suggested order of use. Counting SMIs with Cyclictest is a good starting point, because it is worth confirming that SMIs are happening on a system at all before investigating whether they are causing latencies.
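On many Intel processors, the number of SMIs since reset is also exposed in the model-specific register `MSR_SMI_COUNT` (address `0x34`, count in bits 31:0), which can be read directly on Linux through the `msr` driver. The sketch below assumes an Intel CPU that implements this register, root privileges, and a loaded `msr` module (`modprobe msr`); the helper names `parse_smi_count`, `read_smi_count` and `smi_delta` are hypothetical.

```python
import os
import struct

MSR_SMI_COUNT = 0x34  # Intel MSR: SMIs since reset, count in bits 31:0


def parse_smi_count(raw: bytes) -> int:
    """Decode the 8-byte little-endian MSR value; only the low 32 bits count SMIs."""
    (value,) = struct.unpack("<Q", raw)
    return value & 0xFFFFFFFF


def read_smi_count(cpu: int = 0) -> int:
    """Read MSR_SMI_COUNT on one CPU. The msr driver uses the file offset as
    the register address. Needs root and the msr kernel module."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, MSR_SMI_COUNT)
    finally:
        os.close(fd)
    return parse_smi_count(raw)


def smi_delta(before: int, after: int) -> int:
    """Number of SMIs between two readings, tolerating 32-bit counter wrap."""
    return (after - before) & 0xFFFFFFFF
```

A typical use is to call `read_smi_count(0)` before and after a latency test and pass both values to `smi_delta`; a non-zero result confirms that SMIs occurred during the measurement. On a processor that does not implement this MSR, the read will typically fail with an `OSError`.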
The concepts of SMI and SMM are specific to x86, but some of the techniques and tools mentioned above can be used to debug similar firmware or hardware issues on other architectures.
Eliminating SMI-related latencies can be hard unless the excessive SMI frequency or long SMI handling time is caused by code that can be modified. Because SMIs are unmaskable, fixing such a latency without modifying code requires disabling the particular SMI that is causing the problem by changing the relevant firmware option (e.g. in the BIOS).
Disabling the correct SMI via the firmware is difficult for two reasons. First, there is very little documentation describing which SMIs occur in which situations, so it is hard to identify the specific SMI that is causing the problem. Second, even if the problematic SMI can be identified, there is also very little documentation about which firmware options disable a particular SMI. If neither the problematic SMI nor the firmware option that disables it can be identified, the remaining option is to raise the problem with the hardware vendor.
Disabling all SMIs so that the system never enters SMM is not a recommended solution: important tasks such as temperature management are performed in SMM, so doing so can result in a melted processor.