
Perf stat - SMI cost

Perf's stat tool has an option (--smi-cost) that measures the cost of System Management Interrupts (SMIs), provided the necessary MSRs exist on the CPU. This option calculates the percentage of cycles spent handling SMIs. If the --no-metric-only option is also used, the tool reports the raw counter values, from which the average number of cycles taken to handle an SMI during the measurement period can be determined. These Perf options are available as of the 4.13 Linux kernel and only work on Intel x86 processors starting with the Nehalem microarchitecture (around 2008).

Disclaimer: The results from this tool may not be reliable. If the system enters an idle state while the tool is running and no SMIs occur, the counter values in the results appear to be inconsistent with their documentation in the Intel Software Development Manual.


Installation

On Debian, Perf can be installed using the package manager (linux-perf package). The Perf version does not always have to match the system's kernel version, because Perf will still try to run on a different kernel as long as that kernel provides the features being used.

Add instructions (or link to instructions) for building perf from source?

Test

To measure the percentage of cycles spent handling SMIs use:

# perf stat --smi-cost

To see the number of cycles which were not spent handling SMIs, the number of SMIs, and the total number of cycles use:

# perf stat --smi-cost --no-metric-only

The average number of handling cycles per SMI can be calculated using these values.
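
The calculation can be sketched as follows. This is a minimal illustration, not part of Perf: the function name and parameter names are hypothetical, and the numbers in the usage line are made up for the example rather than measured. It assumes the three values have already been read from the Perf output by hand.

```python
def smi_cost(total_cycles, non_smi_cycles, smi_count):
    """Return (percent of cycles spent in SMI handling, average cycles per SMI).

    total_cycles   -- total number of cycles in the measurement period
    non_smi_cycles -- cycles that were not spent handling SMIs
    smi_count      -- number of SMIs in the measurement period
    (All three names are illustrative, not Perf identifiers.)
    """
    smi_cycles = total_cycles - non_smi_cycles
    percent = 100.0 * smi_cycles / total_cycles if total_cycles else 0.0
    # With no SMIs there is no meaningful average, so report None.
    average = smi_cycles / smi_count if smi_count else None
    return percent, average


# Made-up values, for illustration only.
percent, average = smi_cost(total_cycles=1_000_000,
                            non_smi_cycles=990_000,
                            smi_count=4)
print(f"{percent:.2f}% of cycles in SMI handling, {average:.0f} cycles per SMI")
```

If no SMIs occurred, the average is undefined, which is why the sketch returns None in that case instead of dividing by zero.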

The test can be stopped by pressing Ctrl+C. Another option is to run a program of a finite duration from within the tool:

# perf stat --smi-cost --no-metric-only <command>

and then Perf will stop when the program ends.

When trying to run a Perf version that is different from the kernel version, the Perf version must be called explicitly.

For example, if Perf for the 4.16 kernel (linux-perf-4.16 Debian package) is installed on a system running the 4.9 kernel, then an example of a call using this version would be:

# perf_4.16 --help

Results

Give an example of results and explain how to read them because the perf documentation is confusing?

Analysis

Discuss the maximum suggested percentage?

If the average SMI handling time (the average number of cycles per SMI converted to time using the CPU frequency) is above a few microseconds, then SMI handling could be taking more time than it should.
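
Comparing a cycle count against a threshold in microseconds requires converting cycles to time. The sketch below shows one way to do that. The function name is hypothetical and the 2.5 GHz frequency is an assumption chosen for the example. Because the actual clock frequency can vary during a measurement, the result should be treated as an estimate.

```python
def cycles_to_microseconds(cycles, cpu_hz):
    """Convert a cycle count to microseconds for an assumed CPU frequency in Hz."""
    return cycles * 1e6 / cpu_hz


# Example: 2500 cycles per SMI on a CPU assumed to run at 2.5 GHz.
average_us = cycles_to_microseconds(2500, 2.5e9)
print(f"{average_us:.2f} us per SMI")
```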

Because the tool only provides the average time taken to resolve SMIs during a certain period, the results should be interpreted carefully. A low average does not necessarily mean that there are no SMI-related latency problems. For example, an average duration of 2 us per SMI for one hundred SMIs could mean that there were ninety-nine 1 us SMIs and one 101 us SMI. This 101 us SMI could be a problem for some systems.
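
The example above can be checked with a few lines of arithmetic. The per-SMI durations below are the hypothetical ones from the example. Perf itself does not report individual SMI durations, so this only illustrates how an average can hide an outlier.

```python
from statistics import mean

# Hypothetical per-SMI durations in microseconds:
# ninety-nine short SMIs and one long one.
durations_us = [1] * 99 + [101]

average_us = mean(durations_us)  # the only figure an averaged result reveals
worst_us = max(durations_us)     # the outlier hidden behind that average

print(f"average: {average_us} us, worst case: {worst_us} us")
```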