Perf's stat tool has an option that can measure the cost of SMIs (--smi-cost) if the necessary MSRs exist on the CPU. This option calculates the percentage of cycles spent handling SMIs. If the --no-metric-only option is also used, the tool prints the underlying counter values, which can be used to determine the average number of cycles taken to handle an SMI during the measurement period. These Perf options are available as of the 4.13 Linux kernel and only work on Intel x86 processors starting with the Nehalem microarchitecture (around 2008).
Disclaimer: The results from this tool may not be reliable. If the system enters an idle state while the tool is running and no SMIs occur, the counter values in the results appear to be inconsistent with their documentation in the Intel Software Development Manual.
On Debian, Perf can be installed using the package manager (linux-perf package). The Perf version does not always have to match the system's kernel version, because Perf will still try to run on a different kernel if the features it needs are available.
Add instructions (or link to instructions) for building perf from source?
To measure the percentage of cycles spent handling SMIs, use:
# perf stat --smi-cost
To see the number of cycles that were not spent handling SMIs, the number of SMIs, and the total number of cycles, use:
# perf stat --smi-cost --no-metric-only
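The average number of handling cycles per SMI can be calculated from these values: subtract the cycles not spent handling SMIs from the total number of cycles, then divide the result by the number of SMIs.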
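The calculation can be sketched in a few lines of Python. This is only an illustration: the function name `smi_cost` and its parameter names are hypothetical, the numbers are made up, and the three counter values have to be copied by hand from the Perf output.

```python
def smi_cost(total_cycles, non_smi_cycles, smi_count):
    """Return (percentage of cycles spent in SMIs, average cycles per SMI).

    All three arguments are counter values read manually from the output of
    `perf stat --smi-cost --no-metric-only`. The names are hypothetical and
    are not the labels Perf prints.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    # Cycles spent handling SMIs = total cycles minus cycles outside SMIs.
    smi_cycles = total_cycles - non_smi_cycles
    percent = 100.0 * smi_cycles / total_cycles
    # With no SMIs in the measurement period there is no average to report.
    average = smi_cycles / smi_count if smi_count > 0 else None
    return percent, average


# Made-up example values, not real measurements.
percent, average = smi_cost(total_cycles=1_000_000,
                            non_smi_cycles=990_000,
                            smi_count=5)
print(f"{percent:.2f}% of cycles in SMIs, {average:.0f} cycles per SMI")
```

If the counters are inconsistent (see the disclaimer above), the difference can come out negative, in which case the result should not be trusted.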
The test can be stopped by pressing Ctrl+C. Another option is to run a program of finite duration from within the tool:
# perf stat --smi-cost --no-metric-only <command>
and then Perf will stop when the program ends.
When running a Perf version that is different from the kernel version, the versioned Perf binary must be called explicitly.
For example, if Perf for the 4.16 kernel (linux-perf-4.16 Debian package) is installed on a system running the 4.9 kernel, then an example of a call using this version would be:
# perf_4.16 --help
Give an example of results and explain how to read them because the perf documentation is confusing?
Discuss the maximum suggested percentage?
If the average time per SMI (the average number of cycles per SMI divided by the CPU frequency) is above a few microseconds, then SMI handling could be taking more time than it should.
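Comparing a cycle count against a threshold in microseconds requires converting cycles to time first. A minimal sketch, assuming the CPU runs at a known, constant frequency during the measurement (frequency scaling makes this only an approximation); the function name `cycles_to_us` and the example values are hypothetical:

```python
def cycles_to_us(cycles, freq_mhz):
    """Convert a cycle count to microseconds.

    A clock of freq_mhz MHz ticks freq_mhz times per microsecond, so the
    duration in microseconds is simply cycles / freq_mhz.
    Assumes a constant frequency, which is an approximation.
    """
    if freq_mhz <= 0:
        raise ValueError("freq_mhz must be positive")
    return cycles / freq_mhz


# Made-up example: an average of 6000 cycles per SMI on a 3000 MHz CPU.
print(cycles_to_us(6000, 3000), "us per SMI")
```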
Because the tool only provides the average time taken to resolve SMIs during a certain period, the results should be interpreted carefully. A low average does not necessarily mean that there are no SMI-related latency problems. For example, an average duration of 2 us per SMI for one hundred SMIs could mean that there were ninety-nine 1 us SMIs and one 101 us SMI. This 101 us SMI could be a problem for some systems.
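The arithmetic of this example can be checked with a short Python snippet. The list of durations is the hypothetical distribution described above, not measured data; Perf itself does not report individual SMI durations.

```python
from statistics import mean

# Hypothetical per-SMI durations in microseconds:
# ninety-nine SMIs of 1 us each and a single SMI of 101 us.
durations_us = [1.0] * 99 + [101.0]

average_us = mean(durations_us)  # (99 * 1 + 101) / 100
worst_us = max(durations_us)     # the outlier that the average hides

print(f"SMIs: {len(durations_us)}")
print(f"average: {average_us} us, worst case: {worst_us} us")
```

The average alone gives no hint of the worst case, which is why a tool that records individual latencies is needed to rule out long SMIs.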